Risk Measurement & Pricing

This page brings together three strands of quantitative work that share a common goal: turning the delegation-risk framework’s qualitative apparatus into something you can measure, compose, and price.

Compositional risk measures — the mathematical foundations (coherent risk measures, Euler allocation, Markov categories, linear logic, spectral measures) for risk measures that compose properly across AI system components.
Complexity pricing — how to price the epistemic uncertainty that comes from structural complexity rather than from the underlying risk itself.
Alignment tax quantification — empirical measurement of the performance costs of AI safety mechanisms, and frameworks for deciding when those costs are justified.

Part I — Compositional Risk Measures

Mathematical foundations for risk measures that compose properly across AI system components.

Coherent Risk Measures (Artzner et al. 1999)

The Four Axioms

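A risk measure ρ is coherent if it satisfies the four axioms below. Throughout this page X denotes a loss (positive values are losses). Artzner et al. state the axioms for future net worth (gains) instead, for which translation invariance reads ρ(X + α) = ρ(X) − α.

Axiom	Formula	Meaning
Monotonicity	X ≤ Y a.s. ⟹ ρ(X) ≤ ρ(Y)	Smaller losses mean lower risk
Translation Invariance	ρ(X + c) = ρ(X) + c	Adding a sure loss c raises risk by c; adding cash (c < 0) lowers it by the same amount
Positive Homogeneity	ρ(λX) = λρ(X) for λ ≥ 0	Double the position, double the risk
Subadditivity	ρ(X + Y) ≤ ρ(X) + ρ(Y)	Diversification doesn’t increase risk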

Subadditivity is the most critical axiom - it captures “a merger does not create extra risk.”

Why VaR Fails Subadditivity

VaR can violate subadditivity when losses are skewed, discrete, or heavy-tailed; the classic example involves default risk:

Bond portfolio example:

Concentrated portfolio (100 units of 1 bond): VaR₀.₉₅ = −500 (a gain!)
Diversified portfolio (1 unit each of 100 bonds): VaR₀.₉₅ = 25

VaR rates the diversified portfolio as riskier, contradicting the intuition that diversification reduces risk.

General result: VaR is subadditive when losses are jointly elliptical (for example, multivariate normal), but it can fail for skewed, discrete, or heavy-tailed losses.
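The figures above can be reproduced under assumed parameters that the text does not state: each bond costs 100, repays 105 unless it defaults, and defaults independently with probability 2%. A minimal sketch, computing VaR as the α-quantile of the loss distribution (losses positive, gains negative):

```python
from math import comb

# Assumed parameters (not stated in the text): each bond costs 100 and either
# repays 105 or defaults (repays 0) independently with probability 2%.
P_DEFAULT = 0.02
N_BONDS = 100
GAIN_IF_REPAID = 5.0      # a surviving unit is a loss of -5
LOSS_IF_DEFAULT = 100.0   # a defaulted unit loses its full price


def value_at_risk(dist, alpha):
    """Smallest loss l with P(L <= l) >= alpha; dist is a list of (loss, prob)."""
    cumulative = 0.0
    for loss, prob in sorted(dist):
        cumulative += prob
        if cumulative >= alpha:
            return loss
    return max(loss for loss, _ in dist)


def one_bond(units):
    """Loss distribution of `units` units of a single bond."""
    return [(-GAIN_IF_REPAID * units, 1 - P_DEFAULT), (LOSS_IF_DEFAULT * units, P_DEFAULT)]


def diversified():
    """Loss distribution of one unit in each of N_BONDS independent bonds."""
    return [
        (LOSS_IF_DEFAULT * k - GAIN_IF_REPAID * (N_BONDS - k),
         comb(N_BONDS, k) * P_DEFAULT**k * (1 - P_DEFAULT) ** (N_BONDS - k))
        for k in range(N_BONDS + 1)
    ]


var_concentrated = value_at_risk(one_bond(N_BONDS), 0.95)
var_diversified = value_at_risk(diversified(), 0.95)
# The diversified book is the sum of 100 one-unit positions; add their standalone VaRs.
var_standalone_sum = N_BONDS * value_at_risk(one_bond(1), 0.95)
print(var_concentrated, var_diversified, var_standalone_sum)
```

Subadditivity would require var_diversified ≤ var_standalone_sum, but 25 > −500: pooling the 100 positions raises VaR.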

Why Expected Shortfall is Coherent

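Expected Shortfall (ES/CVaR): Average loss in the worst 100(1−α)% of cases (the worst 5% for α = 0.95).

ES_α(X) = (1/(1−α)) ∫_α¹ VaR_u(X) du

For continuous loss distributions this equals E[X | X ≥ VaR_α(X)]. For distributions with atoms the conditional expectation can violate subadditivity, so the integral is the general definition.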

ES satisfies all four axioms. It overcomes VaR’s limitations by:

Providing information about losses beyond VaR threshold
Averaging tail losses rather than focusing on single quantile
Encouraging diversification through subadditivity

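Regulatory adoption: the Basel Committee’s Fundamental Review of the Trading Book (part of the Basel III reforms) replaced 99% VaR with 97.5% ES for market-risk capital.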
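A sketch of the general ES definition for discrete distributions: average the worst (1 − α) probability mass, splitting an atom that straddles the boundary. It reuses the bond example under assumed parameters that reproduce the VaR figures above (each bond costs 100, repays 105, and defaults independently with probability 2%). Unlike VaR, ES respects subadditivity here: the diversified book's ES stays far below 100 × the ES of a single unit (3,700).

```python
from math import comb

# Assumed parameters (not stated in the text).
P_DEFAULT, N_BONDS = 0.02, 100
GAIN_IF_REPAID, LOSS_IF_DEFAULT = 5.0, 100.0


def expected_shortfall(dist, alpha):
    """ES_alpha = average loss over the worst (1 - alpha) probability mass.

    dist is a list of (loss, prob). An atom straddling the alpha-quantile is
    split, which gives the coherent ES rather than E[X | X >= VaR].
    """
    tail = 1.0 - alpha
    remaining, total = tail, 0.0
    for loss, prob in sorted(dist, reverse=True):  # largest losses first
        take = min(prob, remaining)
        total += take * loss
        remaining -= take
        if remaining <= 0.0:
            break
    return total / tail


single_unit = [(-GAIN_IF_REPAID, 1 - P_DEFAULT), (LOSS_IF_DEFAULT, P_DEFAULT)]
diversified = [
    (LOSS_IF_DEFAULT * k - GAIN_IF_REPAID * (N_BONDS - k),
     comb(N_BONDS, k) * P_DEFAULT**k * (1 - P_DEFAULT) ** (N_BONDS - k))
    for k in range(N_BONDS + 1)
]

es_single = expected_shortfall(single_unit, 0.95)
es_diversified = expected_shortfall(diversified, 0.95)
print(round(es_single, 6), round(es_diversified, 2), round(N_BONDS * es_single, 2))
```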

The Representation Theorem

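Any coherent risk measure on a finite probability space (and on general spaces under the Fatou continuity property) can be written as:

ρ(X) = sup_{Q ∈ P} E_Q[X]

where P is a set of probability measures (the “risk envelope”) and X is a loss. For a net-worth variable, as in Artzner et al., the integrand is −X instead.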

Interpretation: Coherent risk = worst-case expected loss across multiple probability models.
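On a finite set of scenarios the theorem can be used constructively: choose a few probability models and report the worst expected loss. A minimal sketch with three hypothetical models over four scenarios (losses positive); because the two positions below hit their worst case under different models, the combined risk is strictly below the sum.

```python
# Hypothetical risk envelope: three probability models over four scenarios.
ENVELOPE = [
    [0.70, 0.20, 0.08, 0.02],  # baseline
    [0.50, 0.30, 0.15, 0.05],  # pessimistic
    [0.40, 0.30, 0.20, 0.10],  # stress
]


def envelope_risk(losses):
    """rho(X) = max over Q in the envelope of E_Q[X], where X is a loss per scenario."""
    return max(sum(q * x for q, x in zip(model, losses)) for model in ENVELOPE)


X = [0.0, 10.0, 50.0, 200.0]  # loss concentrated in bad scenarios
Y = [30.0, 0.0, 0.0, 0.0]     # loss concentrated in the benign scenario
XY = [x + y for x, y in zip(X, Y)]

rho_x, rho_y, rho_xy = envelope_risk(X), envelope_risk(Y), envelope_risk(XY)
print(rho_x, rho_y, rho_xy)  # different worst-case models, so rho_xy < rho_x + rho_y
```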

Euler’s Theorem and Risk Allocation

Homogeneous Functions

A function f is homogeneous of degree k if:

f(cx) = c^k · f(x) for any c > 0

For degree 1: f(cx) = c·f(x) (linear scaling).

Euler’s Theorem

For continuously differentiable functions that are homogeneous of degree k:

k·f(x) = Σᵢ xᵢ·(∂f/∂xᵢ)

For degree 1 risk measures:

RM(x) = Σᵢ xᵢ·(∂RM/∂xᵢ)

Why This Enables Full Allocation

The risk contribution of component i is:

RC_i = x_i · (∂RM/∂x_i)

Key property: Σᵢ RC_i = RM (full allocation)

This means component risk contributions sum exactly to total risk - no gaps, no waste.
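For portfolio standard deviation σ_p(x) = √(xᵀΣx) the gradient has a closed form, ∂σ_p/∂x_i = (Σx)_i / σ_p, so the Euler contributions can be computed directly. A minimal sketch with a hypothetical three-component covariance matrix; the checks confirm full allocation and the gradient (by finite differences).

```python
import math

# Hypothetical covariance matrix of per-unit losses for three components.
COV = [
    [4.0, 1.2, 0.6],
    [1.2, 9.0, 2.4],
    [0.6, 2.4, 16.0],
]


def portfolio_sd(x):
    """sigma_p(x) = sqrt(x^T COV x), homogeneous of degree 1 in x."""
    n = len(x)
    return math.sqrt(sum(x[i] * COV[i][j] * x[j] for i in range(n) for j in range(n)))


def euler_contributions(x):
    """RC_i = x_i * d(sigma_p)/dx_i, using the closed form (COV x)_i / sigma_p."""
    n = len(x)
    sigma = portfolio_sd(x)
    cov_x = [sum(COV[i][j] * x[j] for j in range(n)) for i in range(n)]
    return [x[i] * cov_x[i] / sigma for i in range(n)]


exposures = [10.0, 5.0, 2.0]
rc = euler_contributions(exposures)
print([round(c, 3) for c in rc], round(sum(rc), 6), round(portfolio_sd(exposures), 6))
```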

Key Equivalences

Risk measure property	Allocation property
Positive homogeneity	Full allocation
Subadditivity	“No undercut”
Translation invariance	Riskless allocation

Critical result: For a differentiable risk measure, full allocation and RORAC (return on risk-adjusted capital) compatibility hold simultaneously if and only if the risk measure is homogeneous of degree 1.

Risk Measures with This Property

Portfolio standard deviation σ_p(x)
Value-at-Risk VaR
Expected Shortfall ES
All spectral risk measures
All coherent risk measures

Compositional Properties

Types of Composition

Type	Formula	When It Applies
Additive	R_total = Σ R_i	Independent linear risks
Multiplicative	R_total = Π R_i	Series reliability (R_i = probability that component i works)
Sub-additive	R_total < Σ R_i	Diversification benefit
Super-additive	R_total > Σ R_i	Common-cause failures

Fault Tree Logic

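OR Gate: Output fails if ANY input fails

P(failure) = 1 − Π(1 − p_i) for independent inputs, ≈ Σp_i for small probabilities (the sum is an upper bound)
Corresponds to a series system: every component must work

AND Gate: Output fails only if ALL inputs fail

P(failure) = Πp_i for independent inputs
Models redundancy: the system survives as long as any redundant component survives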
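A minimal sketch of both gates, assuming independent basic events (exactly the assumption that common-cause failures break). It also shows that the rare-event sum Σp_i slightly overstates the exact OR probability, and that gates nest into larger trees.

```python
import math


def or_gate(probs):
    """Exact P(at least one input fails), assuming independent inputs."""
    return 1.0 - math.prod(1.0 - p for p in probs)


def or_gate_rare_event(probs):
    """Rare-event approximation: an upper bound on the exact OR probability."""
    return sum(probs)


def and_gate(probs):
    """P(all inputs fail), assuming independent inputs."""
    return math.prod(probs)


p = [0.01, 0.02, 0.03]
print(or_gate(p), or_gate_rare_event(p), and_gate(p))

# Gates nest: a redundant pair (AND) in series with a single component (OR).
system = or_gate([and_gate([0.05, 0.05]), 0.001])
print(system)
```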

Assume-Guarantee Contracts

Contract structure: (Assumptions, Guarantees)

Assumptions: Properties the environment must satisfy
Guarantees: Properties the component delivers when the assumptions hold

Compositional verification: Verify component contracts individually, compose to prove system properties.

Recent tools: Pacti for efficient contract computation, AGREE for architectural verification.
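Contract tools such as Pacti support much richer specifications; the sketch below is a deliberately simplified, hypothetical version in which each contract constrains one scalar interface variable with intervals. Serial composition is valid when the upstream guarantee lies inside the downstream assumption, and the composite keeps the upstream assumption and the downstream guarantee.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def within(self, other: "Interval") -> bool:
        """True if this interval is contained in `other`."""
        return other.lo <= self.lo and self.hi <= other.hi


@dataclass(frozen=True)
class Contract:
    assumption: Interval  # inputs the component is promised
    guarantee: Interval   # outputs it promises when the assumption holds


def compose_serial(upstream: Contract, downstream: Contract) -> Contract:
    """Feed upstream's output into downstream's input (simplified A/G rule)."""
    if not upstream.guarantee.within(downstream.assumption):
        raise ValueError("upstream guarantee does not discharge downstream assumption")
    return Contract(upstream.assumption, downstream.guarantee)


# Hypothetical components: a filter bounds a risk score that a planner consumes.
filter_c = Contract(Interval(0.0, 100.0), Interval(0.0, 0.2))
planner_c = Contract(Interval(0.0, 0.5), Interval(0.0, 0.05))
system = compose_serial(filter_c, planner_c)
print(system)
```

Each component is checked once against its own contract; the system-level property (output in [0, 0.05] for any input in [0, 100]) then follows from composition alone.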

Markov Categories

Fritz (2020) Framework

Markov categories provide categorical foundations for compositional probability.

Core idea: A symmetric monoidal category where morphisms behave like “random functions.”

Morphisms as Probabilistic Processes

Morphism	Interpretation
p : I → X	Probability distribution on X
k : X → Y	Markov kernel / channel
Δ_X : X → X ⊗ X	Copy information
!_X : X → I	Delete information

Composition Operations

Sequential: f : X → Y and g : Y → Z compose to g ∘ f : X → Z

Parallel: f : X → Y and g : W → Z compose to f ⊗ g : X ⊗ W → Y ⊗ Z
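In FinStoch, the Markov category of finite sets and stochastic matrices, both operations are concrete: sequential composition is matrix multiplication and parallel composition is the Kronecker product. A sketch with kernels as row-stochastic nested lists (row x holds P(y | x)); the kernels themselves are hypothetical.

```python
def compose(f, g):
    """Sequential g ∘ f: (g ∘ f)[x][z] = sum over y of f[x][y] * g[y][z]."""
    return [[sum(f[x][y] * g[y][z] for y in range(len(g))) for z in range(len(g[0]))]
            for x in range(len(f))]


def tensor(f, g):
    """Parallel f ⊗ g on paired inputs (x, w), pairs flattened row-major."""
    return [[f[x][y] * g[w][z] for y in range(len(f[0])) for z in range(len(g[0]))]
            for x in range(len(f)) for w in range(len(g))]


def copy_map(n):
    """Copy map Δ: sends x to the pair (x, x) with probability 1."""
    return [[1.0 if (y, z) == (x, x) else 0.0 for y in range(n) for z in range(n)]
            for x in range(n)]


def delete_map(n):
    """Delete map !: sends every x to the single point of the unit object I."""
    return [[1.0] for _ in range(n)]


# Hypothetical kernels on two-element sets: f : X -> Y and g : Y -> Z.
f = [[0.9, 0.1], [0.2, 0.8]]
g = [[0.7, 0.3], [0.0, 1.0]]

print(compose(f, g))  # g ∘ f : X -> Z
print(tensor(f, g))   # f ⊗ g : X ⊗ Y -> Y ⊗ Z
# Discarding a kernel's output is the same as discarding its input.
print(compose(f, delete_map(2)))
```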

Relevance to AI Safety

Markov categories provide:

Rigorous foundations for composing probabilistic AI components
Formal treatment of conditional independence
Comparison of statistical experiments
Foundation for probabilistic contracts between components

Linear Logic and Resource Tracking

Girard (1987) Linear Logic

Key distinction: Disallows contraction (duplicating an assumption) and weakening (discarding one) for formulas not marked with the exponential modality !.

Resource interpretation:

Proposition = resource
Proof = process consuming resources
Each assumption used exactly once

Linear Types

Values of linear type must be used exactly once:

No implicit copying (contraction forbidden)
No implicit deletion (weakening forbidden)

Applications: Memory management, file handles, cryptographic keys, protocol states.

Bounded Linear Types

Extension: Instead of “use exactly once,” allow “use at most k times.”

QBAL (Quantified Bounded Affine Logic): Quantification over resource variables, preserving polynomial time soundness.

Object Capabilities

Capability: Unforgeable token granting authority to perform operations.

Security: Objects interact only via messages on references. Security relies on not being able to forge references.

Languages: E, Joe-E, Pony, Cadence, Austral.

Enforcing Risk Budgets

Combining linear types with capabilities:

type RiskBudget[R: Real] = LinearCapability[R]

fn risky_operation(budget: RiskBudget[0.1]) -> Result

Type system ensures:

Budget can’t be duplicated (linear)
Total risk ≤ sum of allocated budgets (conservation)
Risk-taking operations require explicit budget allocation
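Python cannot enforce linearity at compile time, so the sketch below only approximates the idea at run time: a hypothetical RiskBudget token can be spent or split exactly once, and splitting never creates more budget than it consumes. The class, method names, and float-valued budget are illustrative assumptions, not an existing library.

```python
from __future__ import annotations


class BudgetAlreadyUsed(RuntimeError):
    """Raised when a linear budget token is used a second time."""


class RiskBudget:
    """Run-time stand-in for a linear capability carrying `amount` units of risk."""

    def __init__(self, amount: float):
        if amount < 0:
            raise ValueError("budget must be non-negative")
        self.amount = amount
        self._used = False

    def _take(self) -> float:
        # Linearity: each token may be consumed once (no contraction).
        if self._used:
            raise BudgetAlreadyUsed("risk budget token already consumed")
        self._used = True
        return self.amount

    def split(self, *shares: float) -> list[RiskBudget]:
        """Consume this token; return new tokens whose amounts sum to at most it."""
        if any(s < 0 for s in shares) or sum(shares) > self.amount + 1e-12:
            raise ValueError("split would create budget (conservation violated)")
        self._take()
        return [RiskBudget(s) for s in shares]

    def spend(self, risk: float) -> RiskBudget:
        """Consume this token for an operation of estimated `risk`; return the change."""
        if risk < 0 or risk > self.amount:
            raise ValueError("operation exceeds its risk budget")
        return RiskBudget(self._take() - risk)


def risky_operation(budget: RiskBudget, estimated_risk: float) -> RiskBudget:
    """Hypothetical operation that can only run by spending an explicit budget token."""
    change = budget.spend(estimated_risk)
    # ... the actual operation would run here ...
    return change


root = RiskBudget(0.1)
agent_a, agent_b = root.split(0.05, 0.05)
change = risky_operation(agent_a, 0.03)
print(round(change.amount, 10))
```

Unlike a real capability system, nothing here stops code from minting a fresh RiskBudget; that unforgeability is what the capability languages listed above provide.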

Spectral Risk Measures

Definition

Spectral risk measure: Weighted average of loss quantiles.

M_φ(X) = ∫₀¹ φ(p) · q(p) dp

where φ(p) is the risk aversion function and q(p) is the quantile function.

Coherence Conditions

φ must satisfy:

Positivity: φ(p) ≥ 0
Normalization: ∫₀¹ φ(p) dp = 1
Increasingness: φ is non-decreasing in p

Condition 3 encodes risk aversion: larger losses receive at least as much weight as smaller ones.

Relationship to Expected Shortfall

ES_α is a spectral measure with:

φ(p) = 1/(1-α) if p ≥ α, else 0

Uniform weight on the worst 100(1−α)% of outcomes.

General result: Any spectral measure is a mixture of ES at different levels, M_φ = ∫₀¹ ES_α dμ(α) for a probability measure μ determined by φ.
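For n equally likely loss samples sorted in ascending order, the integral becomes a weighted sum: the i-th smallest loss gets weight Φ(i/n) − Φ((i−1)/n), where Φ(p) = ∫₀ᵖ φ(u) du. A sketch with two choices of φ: ES, and an exponential risk-aversion function φ(p) = γe^(−γ(1−p)) / (1 − e^(−γ)), used here as an assumed example of a steeper tail weighting.

```python
import math


def spectral_risk(samples, cum_phi):
    """M_phi for equally weighted samples; cum_phi(p) = integral of phi from 0 to p."""
    xs = sorted(samples)
    n = len(xs)
    return sum(x * (cum_phi((i + 1) / n) - cum_phi(i / n)) for i, x in enumerate(xs))


def es_weights(alpha):
    """Phi for ES_alpha: phi = 1/(1 - alpha) on [alpha, 1] and 0 below."""
    return lambda p: max(0.0, p - alpha) / (1.0 - alpha)


def exponential_weights(gamma):
    """Phi for phi(p) = gamma * exp(-gamma * (1 - p)) / (1 - exp(-gamma)), gamma > 0."""
    norm = 1.0 - math.exp(-gamma)
    return lambda p: (math.exp(-gamma * (1.0 - p)) - math.exp(-gamma)) / norm


losses = [float(k) for k in range(1, 101)]  # hypothetical losses 1, 2, ..., 100
print(spectral_risk(losses, es_weights(0.95)), spectral_risk(losses, exponential_weights(10.0)))
```

With ES at α = 0.95 the result is the mean of the five largest losses; the exponential weighting spreads weight over the whole upper tail instead of cutting it off at α.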

Kusuoka Representation

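Any law-invariant coherent risk measure on an atomless probability space (with the Fatou property) can be written as:

ρ(X) = sup_{μ ∈ M} ∫₀¹ ES_α(X) dμ(α)

where M is a set of probability measures over confidence levels α.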

Practical Questions for AI Safety

Can We Define Coherent AI Harm Measures?

Challenges:

Measurement difficulties for AI catastrophic risk
Collective risk: multiple “safe” models may collectively exceed thresholds
Discontinuous behavior near critical parameters

Promising directions:

Harm as loss distribution: Define harm H, use ρ(H) for coherent measure
Spectral measures: Heavily weight catastrophic tail events
ES for AI: ES_α(Harm) = average harm in the worst 100(1−α)% of scenarios

Critical requirement: Subadditivity must reflect reality. If combining AI systems creates emergent super-additive risks, coherent measures need modification.

Systematic vs Random Failures

Systematic risk (common cause):

All AI systems fail for same reason
Shared training data, common vulnerabilities
Does not diversify away

Random failures (idiosyncratic):

Independent across systems
Diversification helps
Subadditivity applies

Handling approaches:

Separate risk components (systematic + idiosyncratic)
Use convex risk measures for the systematic component (convexity replaces the subadditivity and positive-homogeneity axioms)
Worst-case scenario sets including correlated failures
Fault tree analysis with explicit common-cause modeling

When Does Diversification Help?

Helps when:

Failures are independent
Risk measure is subadditive
Normal or light-tailed distributions
No common-cause vulnerabilities

Hurts when:

Systematic risk dominates
Heavy-tailed distributions
Risk monoculture (all systems use same approach)
Increased attack surface from complexity
Series reliability (all must succeed)

AI-specific:

Helps: Diverse training data, multiple architectures, ensemble methods
Hurts: Oligopolistic AI vendors create illusory diversification, sophisticated adversaries exploit weakest component
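A standard variance calculation shows why systematic risk does not diversify away. If each of n systems has loss variance σ² and every pair has correlation ρ (a simplifying stand-in for a shared common cause), the variance of the average loss is σ²(ρ + (1 − ρ)/n), which falls toward σ²ρ rather than zero as n grows. A sketch, with a brute-force check of the formula:

```python
def variance_of_average(n, sigma2, rho):
    """Closed form: Var of the mean of n equicorrelated losses."""
    return sigma2 * (rho + (1.0 - rho) / n)


def brute_force_variance(n, sigma2, rho):
    """Same quantity by summing the full covariance matrix, as a check."""
    total = sum(sigma2 if i == j else rho * sigma2 for i in range(n) for j in range(n))
    return total / n**2


for rho in (0.0, 0.3):
    row = [round(variance_of_average(n, 1.0, rho), 4) for n in (1, 5, 25, 1000)]
    print(f"rho={rho}: {row}")
```

With ρ = 0 the variance falls as 1/n; with ρ = 0.3 it never drops below 0.3 however many systems are added.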

The compositional machinery above assumes you can decompose a system’s risk into components whose contributions sum cleanly. Part II turns to what happens when that assumption is shaky — when structural complexity makes the decomposition itself uncertain — and how to price that uncertainty.

Part II — Pricing Structural Complexity

How do you price uncertainty that comes from structural complexity rather than from the underlying risk itself?

The Core Problem

Standard risk pricing assumes you can estimate probability distributions for losses. But complex organizational structures create epistemic uncertainty—uncertainty about your uncertainty.

Risk Type	What You Know	How to Price
Simple risk	Distribution of outcomes	Expected loss + margin
Parameter uncertainty	Distribution family, uncertain parameters	Bayesian methods, wider intervals
Structural complexity	Unknown interactions, hidden dependencies	???

Complexity doesn’t change the expected value of losses—it changes how confident you can be in that estimate.

Approaches from Other Fields

Insurance: Loading Factors

Traditional insurance uses loading factors to account for uncertainty:

Premium = Expected Loss × (1 + Loading Factor)

Source of Loading	Typical Factor
Administrative costs	10-20%
Profit margin	5-15%
Parameter uncertainty	10-30%
Model uncertainty	20-50%
Structural complexity	50-300%+

The problem: loading factors are often set by judgment rather than systematic analysis.

Reinsurance: Credibility Theory

Credibility theory balances individual experience against class statistics:

Credibility-weighted estimate = Z × (individual estimate) + (1-Z) × (class estimate)

where Z (credibility factor) depends on:

Volume of individual data
Variance in individual experience
Variance in class experience

Application to complexity: High-complexity organizations have low credibility (Z → 0) because their structure makes historical data less predictive. They are therefore priced closer to the class estimate than to their own loss history.
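One standard way to make Z concrete is Bühlmann credibility, Z = n / (n + k) with k = (expected process variance) / (variance of hypothetical means), which maps onto the three drivers listed above. The sketch uses hypothetical inputs; modeling structural complexity as a larger process variance (hence a larger k) is this page's assumption, not part of the Bühlmann model itself.

```python
def buhlmann_z(n_periods, expected_process_variance, variance_of_hypothetical_means):
    """Buhlmann credibility factor Z = n / (n + k), with k = EPV / VHM."""
    k = expected_process_variance / variance_of_hypothetical_means
    return n_periods / (n_periods + k)


def credibility_estimate(z, individual_mean, class_mean):
    """Z * individual estimate + (1 - Z) * class estimate."""
    return z * individual_mean + (1.0 - z) * class_mean


# Hypothetical inputs: 5 years of own history, same class statistics,
# but a much noisier loss process for the complex organization.
simple_org = buhlmann_z(5, expected_process_variance=100.0, variance_of_hypothetical_means=25.0)
complex_org = buhlmann_z(5, expected_process_variance=900.0, variance_of_hypothetical_means=25.0)
print(round(simple_org, 3), round(complex_org, 3))
print(credibility_estimate(complex_org, individual_mean=40.0, class_mean=100.0))
```

With the same five years of data, the complex organization's own favorable history (40 against a class mean of 100) moves its estimate only slightly below the class figure.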

Operations Research: Robust Optimization

Robust optimization handles uncertainty sets rather than point estimates:

Minimize worst-case cost over all scenarios in uncertainty set U

For complexity pricing:

Larger uncertainty sets for complex structures
Premium reflects worst case within the set
Complexity score determines set size

Key insight: You’re not pricing expected loss—you’re pricing the worst plausible loss given structural uncertainty.

Software Engineering: Cyclomatic Complexity

McCabe (1976) defined cyclomatic complexity as the number of linearly independent paths through code:

V(G) = E - N + 2P

where E = edges, N = nodes, P = connected components.

Higher complexity correlates with:

More defects
Harder to test
Higher maintenance costs

Analogy for organizations: Delegation structures have “paths” through authority chains. More paths = harder to predict behavior.
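V(G) can be computed from an edge list once connected components are known. A minimal sketch using union-find, applied to a hypothetical delegation graph (nodes are roles, edges are authority links); applying McCabe's metric to organizations is the analogy above, not an established measure.

```python
def cyclomatic_complexity(nodes, edges):
    """V(G) = E - N + 2P, with P counted by union-find over undirected connectivity."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for a, b in edges:
        parent[find(a)] = find(b)
    components = len({find(v) for v in nodes})
    return len(edges) - len(nodes) + 2 * components


# Hypothetical structure: a strict tree, then the same tree plus two informal links.
nodes = ["principal", "manager_a", "manager_b", "agent_1", "agent_2", "agent_3"]
tree = [("principal", "manager_a"), ("principal", "manager_b"),
        ("manager_a", "agent_1"), ("manager_a", "agent_2"), ("manager_b", "agent_3")]
with_informal = tree + [("agent_1", "manager_b"), ("agent_3", "principal")]
print(cyclomatic_complexity(nodes, tree), cyclomatic_complexity(nodes, with_informal))
```

A pure hierarchy has a single path structure (V = 1); each informal link adds one independent cycle.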

A Framework for Complexity Scoring

Structural Factors

Factor	Contribution	Rationale
Delegation depth	+2 per layer	Each layer adds interpretation variance
Parallel delegates	+1 per delegate	Coordination uncertainty
Shared resources	+2 per shared resource	Contention and priority conflicts
Informal relationships	+3 per relationship	Undocumented authority creates surprises
Cross-layer dependencies	+2 per dependency	Bypasses normal authority flow
Ambiguous boundaries	+3 per boundary	Unclear who’s responsible
External dependencies	+2 per dependency	Less control, less visibility

Dynamic Factors

Static structure isn’t everything. Some complexity emerges from dynamics:

Factor	Contribution	Rationale
High turnover	+2	Institutional knowledge loss
Rapid growth	+3	Structure lags reality
Recent reorganization	+2	Transition period uncertainty
Multi-jurisdictional	+2 per jurisdiction	Different rules, enforcement

Information Factors

Factor	Contribution	Rationale
Poor documentation	+3	Can’t verify claims about structure
Audit findings	+1 per finding	Evidence of hidden issues
Information asymmetry	+2	Principal can’t observe agent
Opacity of decision-making	+3	Can’t attribute outcomes to causes
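The three tables combine naturally into one additive score. A sketch with weights copied from the tables; the factor keys are hypothetical labels, yes/no factors count as 0 or 1, and the organization profile is invented for illustration.

```python
# Per-unit contributions copied from the structural, dynamic, and information tables.
WEIGHTS = {
    "delegation_layers": 2, "parallel_delegates": 1, "shared_resources": 2,
    "informal_relationships": 3, "cross_layer_dependencies": 2,
    "ambiguous_boundaries": 3, "external_dependencies": 2,
    "high_turnover": 2, "rapid_growth": 3, "recent_reorganization": 2,
    "jurisdictions": 2,
    "poor_documentation": 3, "audit_findings": 1,
    "information_asymmetry": 2, "opaque_decision_making": 3,
}


def complexity_score(profile):
    """Sum of weight * count over the profile; booleans count as 0 or 1."""
    unknown = set(profile) - set(WEIGHTS)
    if unknown:
        raise KeyError(f"unknown factors: {sorted(unknown)}")
    return sum(WEIGHTS[name] * int(count) for name, count in profile.items())


# Hypothetical organization.
profile = {"delegation_layers": 2, "parallel_delegates": 3, "informal_relationships": 1,
           "audit_findings": 1, "recent_reorganization": True}
print(complexity_score(profile))
```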

From Complexity Score to Uncertainty Multiplier

Empirical Calibration Approach

If we had data on organizational failures vs. complexity scores, we could fit:

Uncertainty Multiplier = f(Complexity Score)

Proposed functional form (requires empirical validation):

σ_multiplier = e^(0.15 × complexity_score)

Complexity Score	Uncertainty Multiplier
2	1.35× (±35%)
5	2.1× (±110%)
10	4.5× (±350%)
15	9.5× (±850%)
20	20× (±1900%)
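The table follows directly from the proposed (still unvalidated) form; a sketch that regenerates it:

```python
import math

RATE = 0.15  # proposed coefficient from the text; requires empirical validation


def uncertainty_multiplier(score):
    """sigma_multiplier = e^(RATE * score)."""
    return math.exp(RATE * score)


for score in (2, 5, 10, 15, 20):
    m = uncertainty_multiplier(score)
    print(f"{score:>2}  {m:5.2f}x  (±{(m - 1) * 100:.0f}%)")
```

Under this form each additional point multiplies the uncertainty by e^0.15 ≈ 1.16, so the multiplier doubles roughly every 4.6 points.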

Theoretical Justification

Why exponential? Each complexity factor potentially:

Introduces new failure modes (additive)
Creates interactions with existing factors (multiplicative)

If interactions dominate, we expect multiplicative (exponential) growth in uncertainty.

Confidence Interval Pricing

Given uncertainty multiplier σ_m:

Premium = Expected Loss × σ_m × Safety Factor

The safety factor depends on insurer risk tolerance and capital requirements.

Pricing Models

Model 1: Upper Bound Pricing

Price at the upper end of the confidence interval:

Premium = E[Loss] × (1 + k × σ_multiplier)

where k reflects how many standard deviations to cover (typically 1-3).

Advantage: Simple, conservative. Disadvantage: May be too expensive for high complexity.

Model 2: Variance Loading

Load premium proportional to variance:

Premium = E[Loss] + λ × Var[Loss]

For complexity: Var[Loss] = Base_Var × σ_multiplier²

Advantage: Standard actuarial practice. Disadvantage: Requires variance estimate.

Model 3: Expected Shortfall

Use Expected Shortfall (CVaR) at a high confidence level:

Premium = ES_α(Loss) where α depends on complexity

Complexity Score	α Level	Interpretation
Low (< 5)	95%	Standard tail risk
Medium (5-10)	99%	More conservative
High (10-15)	99.9%	Near worst-case
Very High (> 15)	99.99%	Extreme caution

Advantage: Coherent risk measure, tail-focused. Disadvantage: Requires distribution assumptions.

(See Part I for why Expected Shortfall is a coherent risk measure and how it relates to spectral measures.)

Model 4: Ambiguity Pricing

Use robust optimization over uncertainty sets:

Premium = max_{θ ∈ Θ(complexity)} E_θ[Loss]

where Θ(complexity) is the set of plausible models given complexity level.

Advantage: No distribution assumptions. Disadvantage: Defining Θ is challenging.
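The four models side by side for one hypothetical risk. Every input here (score, loss distribution, base variance, λ, k, and the candidate models in Θ) is an illustrative assumption; the multiplier uses the proposed exponential form, and the α banding from the table above is read as half-open intervals (<5, [5, 10), [10, 15], >15).

```python
import math


def multiplier(score):
    """Proposed (unvalidated) uncertainty multiplier e^(0.15 * score)."""
    return math.exp(0.15 * score)


def upper_bound_premium(expected_loss, score, k=2.0):
    """Model 1: E[Loss] * (1 + k * multiplier)."""
    return expected_loss * (1.0 + k * multiplier(score))


def variance_loading_premium(expected_loss, base_var, score, lam):
    """Model 2: E[Loss] + lambda * Base_Var * multiplier^2."""
    return expected_loss + lam * base_var * multiplier(score) ** 2


def alpha_for_score(score):
    """Model 3 banding from the table."""
    if score < 5:
        return 0.95
    if score < 10:
        return 0.99
    if score <= 15:
        return 0.999
    return 0.9999


def expected_shortfall(dist, alpha):
    """Average loss over the worst (1 - alpha) probability mass of [(loss, prob)]."""
    tail = 1.0 - alpha
    remaining, total = tail, 0.0
    for loss, prob in sorted(dist, reverse=True):
        take = min(prob, remaining)
        total += take * loss
        remaining -= take
        if remaining <= 0.0:
            break
    return total / tail


def ambiguity_premium(candidate_models):
    """Model 4: max over plausible models theta of E_theta[Loss]."""
    return max(sum(loss * prob for loss, prob in model) for model in candidate_models)


# Hypothetical inputs.
score = 8
dist = [(0.0, 0.97), (50.0, 0.025), (1000.0, 0.005)]
stressed = [(0.0, 0.95), (50.0, 0.04), (1000.0, 0.01)]
e_loss = sum(loss * prob for loss, prob in dist)

print(round(upper_bound_premium(e_loss, score), 2),
      round(variance_loading_premium(e_loss, base_var=100.0, score=score, lam=0.01), 2),
      round(expected_shortfall(dist, alpha_for_score(score)), 2),
      round(ambiguity_premium([dist, stressed]), 2))
```

Even on this toy distribution the four premiums range from about 12 to 525, so the choice of model can matter as much as the complexity score.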

Practical Implementation

Step 1: Assess Complexity Score

Structured questionnaire covering:

Organization chart analysis
Process documentation review
Interview key personnel
Audit report review
External dependency mapping

Step 2: Map to Uncertainty

Use calibrated function (initially judgment-based, refined with data):

def map_to_uncertainty(complexity_score):
    """Initial judgment-based mapping; refine as data accumulates."""
    if complexity_score < 5:
        return "low", 1.5
    elif complexity_score < 10:
        return "medium", 3.0
    elif complexity_score < 15:
        return "high", 7.0
    else:
        # "15.0+": treat 15.0 as a floor and price higher case by case.
        return "very high", 15.0

Step 3: Price with Multiplier

Premium = Base_Premium × multiplier

where Base_Premium is the rate for a simple, well-understood structure.

Step 4: Offer Complexity Reduction Incentives

Show client how premium changes with structural changes:

Change	Complexity Δ	Premium Δ
Document informal relationships	-3	-15%
Eliminate shared resources	-2	-10%
Add clear authority boundaries	-3	-15%
Reduce to 2 layers	-2	-10%
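If the premium scales with the proposed multiplier e^(0.15 × score), a change of Δ points changes the premium by the factor e^(0.15 × Δ) regardless of the starting score. A sketch computing the implied changes for the table's rows: about −36% for −3 points and −26% for −2, so the table's illustrative figures correspond to a gentler schedule (about 5% per point) than the exponential form.

```python
import math

RATE = 0.15  # proposed coefficient; unvalidated


def premium_change_pct(delta, rate=RATE):
    """Percentage premium change for a score change `delta` under exp(rate * score)."""
    return (math.exp(rate * delta) - 1.0) * 100.0


changes = {
    "Document informal relationships": -3,
    "Eliminate shared resources": -2,
    "Add clear authority boundaries": -3,
    "Reduce to 2 layers": -2,
}

for change, delta in changes.items():
    print(f"{change:<34} {delta:+d}  {premium_change_pct(delta):+.1f}%")
```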

Validation Challenges

Limited Data

Organizational failures linked to complexity are rare and idiosyncratic. Building a training dataset is difficult.

Survivorship Bias

We observe organizations that haven’t catastrophically failed. High-complexity survivors may have unobserved mitigating factors.

Confounders

Complex organizations may differ from simple ones in ways that affect risk independently of complexity.

Proposed Validation Approaches

Historical case studies: Analyze past organizational failures for complexity factors
Simulation: Agent-based models of delegation chains under stress
Expert elicitation: Structured surveys of risk professionals
Regulatory data: Insurance claim data by organizational characteristics
Near-miss analysis: Study incidents that almost became failures

Connection to Other Frameworks

Delegation Accounting

Complexity pricing extends the delegation accounting framework by quantifying the uncertainty term:

Net Delegation Value = Receivable - (Exposure × σ_multiplier) - Costs

The complexity multiplier captures how much to discount the exposure estimate based on structural uncertainty.

Pattern Interconnection

Pattern interconnection creates complexity when defense patterns share failure modes. Complexity pricing provides a method to quantify this “correlation tax.”

Compositional Risk Measures

The compositional risk measures in Part I assume you can decompose system risk. Complexity represents the degree to which this decomposition fails—interactions that can’t be captured by analyzing components separately.

Open Questions (Complexity Pricing)

Optimal complexity scoring: Which factors matter most? How should they be weighted?
Functional form: Is the uncertainty multiplier truly exponential? Could it be polynomial, logistic, or stepped?
Industry variation: Do complexity factors have different impacts in different domains?
Dynamic complexity: How do you price complexity that changes over time?
Complexity reduction ROI: What’s the most cost-effective way to reduce complexity for insurance purposes?
AI-specific factors: What additional complexity factors apply to AI systems (emergent behavior, capability uncertainty, alignment uncertainty)?

Part III — Alignment Tax Quantification

Measuring the performance costs of AI safety mechanisms.

Parts I and II priced the risk of delegation and the uncertainty surrounding that risk. But the safety mechanisms that reduce risk are not free: they impose their own performance, compute, and capability costs. Part III turns to measuring that cost directly — the “alignment tax” — which is what determines whether a given mitigation is worth deploying.

The “alignment tax” represents the performance degradation, computational overhead, and capability limitations imposed by safety mechanisms in AI systems. As AI safety becomes increasingly critical for deployment, understanding and quantifying these costs is essential for making informed trade-offs between safety and performance. This part examines empirical measurements, cost categories, and frameworks for systematic alignment tax quantification.

1. Defining Alignment Tax

1.1 Core Concept

The alignment tax is the performance penalty incurred when implementing safety constraints in AI systems. Unlike traditional software where security measures have clear overhead metrics (encryption latency, firewall throughput), AI alignment tax manifests across multiple dimensions:

Performance degradation: Reduced task performance from safety-oriented training
Latency overhead: Additional inference time from monitoring and verification
Compute costs: Redundant models, consensus mechanisms, and safety monitors
Capability reduction: Conservative behavior and valid request refusals
Development costs: Safety testing, red teaming, and verification infrastructure

The term originated in the AI safety community to describe the organizational and technical costs of making systems safer rather than merely more capable. At the organizational level, capability-safety trade-offs (“alignment tax”) imply that post-training for human preference and safety can lower measured performance, creating incentives to relax alignment under market pressure.

1.2 RLHF Alignment Tax

Reinforcement Learning from Human Feedback (RLHF) is the primary technique for aligning large language models, but it introduces measurable performance costs:

Definition: LLMs acquire a wide range of abilities during pre-training, but aligning LLMs under RLHF can lead to forgetting pretrained abilities, also known as the alignment tax.

Key findings (Lin et al., EMNLP 2024):

Experiments with OpenLLaMA-3B revealed a pronounced alignment tax in NLP tasks
Despite various techniques to mitigate forgetting, they are often at odds with RLHF performance
This creates a fundamental trade-off between alignment performance and forgetting mitigation

Capability changes:

RLHF can bring GPT-3.5-level models to roughly GPT-4 equivalent performance on instruction following
Llama 3 Instruct improves MMLU by almost 2.5 points through post-training
However, RLHF substantially decreases the diversity of outputs sampled for a given input
Even when sampling outputs for different inputs, RLHF produces less diverse text on some metrics

1.3 Safety Tax in Reasoning Models

Recent research (Huang et al., 2025) introduces “Safety Tax” as a distinct phenomenon in Large Reasoning Models (LRMs):

Definition: Safety alignment can restore safety capability in LRMs, but it leads to degradation of reasoning capability. This trade-off between reasoning and safety capability is termed “Safety Tax.”

Mechanism: With the sequential LRM production pipeline (pre-training → reasoning enhancement → safety alignment), safety capability is acquired by “taxing” the reasoning capability.

Implication: Safety Tax presents a critical challenge for future safety research of LRMs, requiring new approaches beyond sequential training pipelines.

1.4 Monitoring and Verification Costs

Runtime monitoring and verification add computational overhead to AI systems:

Inference overhead sources:

Instrumentation overhead: Cost of collecting observations (logging, tracing, sensor reading)
Monitor execution overhead: Computational cost of monitor logic
Intervention overhead: Cost of triggering interventions when violations detected
Communication overhead: Transmitting observations to monitors, monitor results to actuators

Measured overhead (from runtime verification literature):

ROSRV (robotics monitoring): 2-8% CPU overhead, <10MB memory, <1ms latency
Copilot (embedded real-time monitoring): Microsecond latency, <100KB memory
TeSSLa software monitors: 5-15% CPU overhead, <50MB memory
Neural network monitoring: 10-30% inference latency increase for internal representation monitoring
Ensemble uncertainty: 2-5× inference latency (running multiple models)

2. Empirical Measurements

2.1 Factor(T,U): Factored Cognition Trade-offs

The Factor(T,U) protocol (Redwood Research, December 2024) provides the most rigorous empirical data on decomposition-based safety mechanisms:

Safety improvements:

41% → 70% safety rate when adding factored cognition to trusted monitoring
Attack success rate reduced from 26% to 14%
Tested on APPS coding benchmark (competitive programming problems)

How it works:

Tasks decomposed into self-contained child tasks
Each child task solved in isolated context by control protocols
Child solutions reassembled to solve original task
Blue team uses o3-mini as powerful decomposer

Performance cost metric:

Decomposition tax = 1 - (accuracy of Factor(T,U) / accuracy of T)

This metric directly measures the performance cost of adding decomposition to a protocol. The Factor(T,U) research acknowledges: “Attacks on simpler child tasks are more obvious to monitors, but there’s no evidence that honest solutions are more recognizable.”

Critical limitation: Uncertainty exists about how much safety improvement derives from genuine detectability versus “security through obscurity.” The researchers note this as a key open question. (For a deeper treatment of the empirical scheming-reduction evidence, see Empirical Scheming Reduction.)

2.2 Deliberative Alignment Performance

OpenAI and Apollo Research’s 2025 anti-scheming study, which applies deliberative alignment, demonstrates dramatic scheming reduction but with performance implications:

Scheming reduction:

o4-mini: 8.7% → 0.3% covert action rate (~29× reduction)
o3: 13% → 0.4% covert action rate (~33× reduction)
Tested across 26 agentic evaluations (180+ environments)

What is deliberative alignment?

Models trained to read and reason about anti-scheming specifications before acting
Models explicitly consider safety spec in chain-of-thought
Training on narrow distribution, testing on diverse out-of-distribution environments

Performance costs and caveats:

Additional inference latency from chain-of-thought reasoning about safety specifications
Computational overhead of explicit safety specification reasoning
Rare egregious failures still occur (models occasionally make up new principles or misrepresent spec)
Situational awareness confound: Models demonstrate awareness of being evaluated, which may drive compliance rather than genuine alignment

Critical limitation: “The intervention as-studied is not sufficient for future models, and more work needs to be done.” The reduction may result from evaluation awareness rather than genuine safety improvement.

2.3 RLHF vs Base Models: Capability Differences

Performance patterns (from ICLR 2024-2025 research):

Best-of-N (BoN) outperforms RLHF, which outperforms SFT at largest model sizes
This ordering holds out-of-distribution for different training dataset splits
RLHF generalizes better for difficult distribution shifts in instruction following
For less difficult shifts, RLHF generalizes similarly or slightly worse than SFT
Overall, how much RLHF’s out-of-distribution advantage matters depends on how difficult the distribution shift is

Diversity costs:

RLHF substantially decreases output diversity compared to other methods
Less diverse text produced even when sampling different inputs
This reveals inherent tension between generalization and diversity

Diminishing returns:

RLHF gives large gains initially from first few thousand feedback examples
Diminishing returns as more examples added
Performance may plateau at level determined by human labeling consistency
Extremely fine-grained distinctions may exceed human ability to label reliably

2.4 Constitutional AI Trade-offs

Anthropic’s Constitutional AI (CAI) demonstrates how self-supervision can reduce costs while maintaining safety:

Efficiency gains:

Cuts need for costly human labeling by up to 70% according to Anthropic’s internal benchmarks
Model generates candidate responses and critiques them against constitutional rules
Enables precise behavior control with far fewer human labels

Constitutional Classifiers (2025):

Input and output classifiers trained on synthetically generated data
Filter overwhelming majority of jailbreaks with minimal over-refusals
0.38% increase in refusal rates (very low over-refusal cost)
~23.7% inference overhead on production traffic (disclosed in arXiv 2501.18837)
Dramatically improved robustness against jailbreaking at moderate additional cost

Prototype version limitations:

Robust to thousands of hours of human red teaming for universal jailbreaks
However, exhibited high overrefusal rates and compute overhead
Updated version achieved similar robustness with dramatically lower costs

2.5 AI Control Protocol Costs

The AI Control framework (Greenblatt et al., 2023-2024) provides empirical data on trusted/untrusted monitoring costs:

Trusted Editing protocol:

GPT-4 writes solution, GPT-3.5 edits suspicious code
92% safety rate when editing 20% of problems
Best safety-usefulness trade-off among tested protocols

Performance implications:

Requires running both untrusted model (GPT-4) and trusted model (GPT-3.5)
Editing overhead on 20% of problems
Limited by capability gap between trusted and untrusted models

Untrusted Monitoring:

GPT-4 writes code, different GPT-4 instance monitors
Anti-collusion techniques prevent coordination
Weaker than trusted editing due to collusion risk
Lower overhead than trusted editing but reduced safety

Audit requirements:

Minimum 1% audit budget empirically tested
Higher audit rates for critical applications
At 2% false positive rate, produces many false negatives
GPT-3.5 qualitatively notices most backdoors but unreliable at distinguishing unnecessary vs necessary-but-complex logic

2.6 Monitoring Overhead in Production Systems

Empirical overhead measurements:

LLM inference benchmarking:

In single concurrency scenarios, monitoring overheads can account for 33% of entire benchmark duration
GenAI-Perf includes storage and response handling overhead in metrics
These overheads become non-negligible for edge applications

Runtime verification overhead:

ROSRV (robotics): 2-8% CPU, <10MB memory, <1ms latency
TeSSLa (general-purpose): 5-15% CPU overhead for software monitors, <1% for FPGA monitors
Neural network OOD detection: 5-20% inference latency increase (density-based methods)
Internal representation monitoring: 10-30% inference latency increase

Monitoring cost reduction techniques:

DIAMOND system: ~30% latency reduction, 38% communication overhead reduction
Adaptive sampling: Reduced overhead but potential to miss violations between samples
Hardware acceleration (FPGA): Near-zero overhead on main system through parallel execution

3. Cost Categories

3.1 Latency Costs

Additional inference calls:

Consensus mechanisms requiring multiple model queries
Deliberative alignment chain-of-thought reasoning overhead
Monitoring and verification checks during inference
Factor(T,U) decomposition and reassembly steps

Illustrative example (hypothetical scenario):

Median latency: 2.3s (monolithic) → 4.1s (decomposed) = 1.8× overhead
P99 latency: 8.2s (monolithic) → 12.1s (decomposed) = 1.5× overhead

Why overhead is bounded:

Narrow components use smaller, faster models
Parallel execution where possible
Verification is mostly fast code, not neural inference

Context-dependent impact:

Real-time chat: 1.8× may be noticeable to users
Batch processing: Negligible impact on overall throughput
Async workflows: No user-facing impact

3.2 Compute Costs

Redundant models:

Running multiple models for consensus or validation
Trusted and untrusted model pairs in AI Control
Ensemble methods for uncertainty quantification (2-5× inference cost)

Monitors and verifiers:

Runtime verification monitors: 2-15% CPU overhead
OOD detection systems: 5-30% additional inference time
Behavioral anomaly detection: Continuous overhead on all operations

Illustrative cost comparison (hypothetical scenario):

Component	Model	Cost per 1K calls
Monolithic	GPT-4	$120
Decomposed
- Coordinator	GPT-3.5	$2
- Summarizer	Fine-tuned 7B	$5
- Analyzer	Fine-tuned 7B	$5
- Verifier	Code + small LLM	$3
Decomposed Total		$15

Counterintuitive implication: In this illustration, decomposition cuts API cost per call by 87.5% ($120 → $15 per 1K calls) by using appropriately sized models per component. A safety-oriented architecture lets expensive frontier models be used only where needed, with most components running on fine-tuned smaller models.

Total runtime costs (100,000 tasks/month):

Monolithic: $12,200/month
Decomposed with monitoring: $3,000/month
Net savings: $9,200/month despite safety overhead
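The illustrative figures can be cross-checked. Assuming each of the 100,000 monthly tasks makes one call per component (an assumption; the text does not say), the table's per-1K prices give $12,000 and $1,500 of API spend, so the monthly totals above imply roughly $200 and $1,500 of non-API overhead respectively, which the text does not itemize. A sketch:

```python
# Illustrative per-1K-call prices and monthly totals from the text (hypothetical figures).
MONOLITHIC_PER_1K = 120.0
DECOMPOSED_PER_1K = {"coordinator": 2.0, "summarizer": 5.0, "analyzer": 5.0, "verifier": 3.0}
MONTHLY_TOTALS = {"monolithic": 12_200.0, "decomposed": 3_000.0}
TASKS_PER_MONTH = 100_000  # assumption: one call per component per task

thousands_of_calls = TASKS_PER_MONTH / 1_000
decomposed_per_1k = sum(DECOMPOSED_PER_1K.values())
api_cost = {
    "monolithic": MONOLITHIC_PER_1K * thousands_of_calls,
    "decomposed": decomposed_per_1k * thousands_of_calls,
}
# Whatever the monthly totals include beyond API calls (monitoring, etc.).
implied_overhead = {name: MONTHLY_TOTALS[name] - api_cost[name] for name in api_cost}
per_call_saving = 1.0 - decomposed_per_1k / MONOLITHIC_PER_1K
monthly_saving = MONTHLY_TOTALS["monolithic"] - MONTHLY_TOTALS["decomposed"]

print(f"per-call saving {per_call_saving:.1%}")
print("API cost", api_cost, "implied overhead", implied_overhead)
print("monthly saving", monthly_saving)
```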

3.3 Capability Costs

Refusing valid requests (over-refusal):

Constitutional Classifiers: 0.38% increase in refusal rates (2024 updated version)
Earlier prototype: High overrefusal rates (specific figures not disclosed)
RLHF alignment: Reduced diversity in outputs even for valid inputs

Conservative behavior:

Simplex architecture: engaging the baseline controller (BC) reduces performance during safety interventions
More aggressive advanced controller (AC) behavior → operation closer to the safety boundary → more frequent BC engagement
Design trade-off: How much BC engagement is acceptable?

Capability forgetting:

RLHF alignment tax: Forgetting pretrained abilities on NLP tasks
Safety alignment in large reasoning models (LRMs): Reasoning capability degradation
Measured via performance on standard benchmarks (MMLU, etc.)

Risk-adjusted capability:

Factor(T,U): Unknown decomposition tax (accuracy reduction metric defined but specific values not disclosed)
Trusted editing: 92% safety rate when editing 20% of problems implies 8% failure rate
Deliberative alignment: 0.3-0.4% residual scheming rate may be unacceptable for critical applications

3.4 Development Costs

Safety testing:

Comprehensive evaluation across 180+ environments (deliberative alignment study)
Testing on APPS coding benchmark for Factor(T,U)
Multiple benchmark evaluations (MMLU, AIR-Bench, TrustLLM, etc.)

Red teaming:

Thousands of hours of human red teaming (Constitutional Classifiers)
Adversarial testing against monitors to find evasion strategies
Fault injection to verify monitor detection

Engineering overhead (illustrative estimate):

Task	Monolithic Agent	Decomposed System	Delta
Architecture design	1-2 days	3-5 days	+2-3 days
Component implementation	3-5 days	5-10 days	+2-5 days
Verification layers	0 days	2-4 days	+2-4 days
Integration testing	2-3 days	4-6 days	+2-3 days
Total	6-10 days	14-25 days	+8-15 days

Initial implementation cost: $10,000-18,000 at a $150/hr engineering rate (+8-15 days; assuming 8-hour days, 64-120 hr ≈ $9,600-18,000)

Ongoing maintenance:

Monitoring review: +2 hr/week
Component updates: +1 hr/week
Incident investigation: -20% time (easier with isolated components)
Monthly overhead: +12 hr/month = ~$1,800/month
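The engineering-cost figures follow from the table's extra days and the stated $150/hr rate. This sketch assumes 8-hour days (the text does not state a day length) and takes the 12 hr/month maintenance estimate as given rather than re-deriving it from the weekly items.

```python
HOURLY_RATE = 150.0   # $/hr, from the text
HOURS_PER_DAY = 8.0   # assumption: 8-hour engineering days

def initial_cost(extra_days_low: float, extra_days_high: float) -> tuple[float, float]:
    """Dollar range for the extra engineering days a decomposed system needs."""
    return (extra_days_low * HOURS_PER_DAY * HOURLY_RATE,
            extra_days_high * HOURS_PER_DAY * HOURLY_RATE)

def monthly_maintenance(extra_hours_per_month: float) -> float:
    """Dollar cost of the extra monthly maintenance hours."""
    return extra_hours_per_month * HOURLY_RATE

print(initial_cost(8, 15))      # (9600.0, 18000.0)
print(monthly_maintenance(12))  # 1800.0
```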

4. When Is the Tax Worth It?

4.1 Risk-Adjusted Return on Safety Investment

The fundamental question is whether safety costs are justified by risk reduction:

Expected value calculation:

EV(Safety) = Risk_Reduction_Value - Safety_Costs
Risk_Reduction_Value = (P_unsafe_baseline - P_unsafe_safe) × Incident_Cost
Safety_Costs = Development_Costs + Ongoing_Costs + Performance_Penalties

Illustrative incident cost estimation (hypothetical scenario for illustration):

Incident Type	Probability (Mono)	Probability (Decomp)	Damage	Expected Cost Mono	Expected Cost Decomp
Data leak	0.5%/mo	0.1%/mo	$50,000	$250	$50
Harmful output (public)	2%/mo	0.3%/mo	$20,000	$400	$60
Service disruption	1%/mo	0.5%/mo	$10,000	$100	$50
Compliance violation	0.3%/mo	0.05%/mo	$100,000	$300	$50
Total expected cost				$1,050/mo	$210/mo

Risk reduction value: $840/month in expected incident cost avoided
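The expected-cost columns and the risk reduction value are the formula above applied per incident type and summed. This sketch reuses the table's hypothetical probabilities and damages.

```python
# Hypothetical illustrative figures from the table above:
# incident -> (monthly probability monolithic, monthly probability decomposed, damage in $)
INCIDENTS = {
    "data_leak": (0.005, 0.001, 50_000),
    "harmful_output_public": (0.02, 0.003, 20_000),
    "service_disruption": (0.01, 0.005, 10_000),
    "compliance_violation": (0.003, 0.0005, 100_000),
}

def expected_costs(incidents: dict[str, tuple[float, float, float]]) -> tuple[float, float, float]:
    """Return (expected monthly cost monolithic, decomposed, risk reduction value)."""
    mono = sum(p_mono * damage for p_mono, _, damage in incidents.values())
    decomp = sum(p_dec * damage for _, p_dec, damage in incidents.values())
    # Risk_Reduction_Value = sum over incident types of (P_unsafe_baseline - P_unsafe_safe) x cost
    return mono, decomp, mono - decomp

mono, decomp, reduction = expected_costs(INCIDENTS)
print(f"${mono:,.0f}/mo -> ${decomp:,.0f}/mo, reduction ${reduction:,.0f}/mo")
# $1,050/mo -> $210/mo, reduction $840/mo
```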

Break-even analysis:

Monthly cost change: +$2,000 (safety overhead) - $9,200 (API savings) - $840 (risk reduction) = -$8,040
Net monthly benefit: $8,040 in savings
Initial investment: $10,000-18,000
Break-even time: roughly 1.2-2.2 months ($10,000 / $8,040 to $18,000 / $8,040) for high-volume scenarios
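Break-even is the initial investment divided by the net monthly benefit. The sketch below reproduces that arithmetic with the section's illustrative figures. It ignores discounting and assumes the monthly figures hold from the first month.

```python
import math

def break_even_months(initial_investment: float, net_monthly_benefit: float) -> float:
    """Months until cumulative net monthly benefit repays the initial investment."""
    if net_monthly_benefit <= 0:
        return math.inf  # costs exceed benefits: never breaks even
    return initial_investment / net_monthly_benefit

# Illustrative figures from this section (hypothetical scenario, 100,000 tasks/month)
safety_overhead = 2_000   # $/month
api_savings = 9_200       # $/month
risk_reduction = 840      # $/month, expected incident cost avoided
net_benefit = api_savings + risk_reduction - safety_overhead  # 8,040 $/month

low = break_even_months(10_000, net_benefit)
high = break_even_months(18_000, net_benefit)
print(f"break-even: {low:.2f}-{high:.2f} months")  # break-even: 1.24-2.24 months
```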

4.2 Threshold Effects and Phase Transitions

Safety performance scaling:

RLHF shows diminishing returns after a few thousand feedback examples
Deliberative alignment: 30× scheming reduction but residual 0.3-0.4% rate
Factor(T,U): 41% → 70% safety (approaching but not reaching perfect safety)

Critical thresholds:

Aviation: Backup controller engagement acceptable for up to 0.1% of flight time
Autonomous vehicles: Higher engagement frequency tolerable during initial deployment
Critical infrastructure: Even 0.3% failure rate may be unacceptable

Phase transition phenomena:

Constitutional Classifiers: Prototype had high overrefusal → Updated version achieved dramatic improvement
Suggests potential for step-function improvements in alignment tax reduction
But also indicates substantial R&D investment required to achieve transitions

4.3 Application-Specific Trade-offs

Risk budget allocation guidance:

High-assurance applications (nuclear command, critical infrastructure):

Multiple layers required: Factored cognition + deliberative alignment + trusted monitoring + human oversight
Accept high alignment tax (latency overhead, compute costs)
Even 0.3-0.4% residual risk unacceptable
Initial investment: $50,000-100,000+
Break-even: Worth it regardless of cost if catastrophic risk is prevented

Medium-assurance applications (code review, content moderation):

Factored cognition + trusted monitoring likely sufficient
Monitor for scaling issues as models improve
Moderate alignment tax acceptable (1.5-2× latency, moderate compute overhead)
Initial investment: $10,000-25,000
Break-even: 2-6 months with reasonable volume

Low-assurance applications (creative writing, brainstorming):

Simple monitoring may suffice
Remain vigilant for emergent strategic behavior
Minimal alignment tax acceptable (prefer performance)
Initial investment: $5,000-10,000
Break-even: May not be worth it; use simpler safety measures

Volume-based thresholds (illustrative estimates):

Use Case	Monthly Volume	API Savings (/mo)	Risk Reduction (/mo)	Break-Even
Internal tool	1,000 tasks	-$90	-$100	5-8 months
Customer-facing	50,000 tasks	-$4,600	-$500	2-3 months
High-volume SaaS	500,000 tasks	-$46,000	-$800	< 1 month
Autonomous agent	10,000 tasks	-$900	-$2,000	3-4 months

Key takeaway (under these illustrative estimates): For systems with either (a) >10,000 tasks/month or (b) significant incident cost potential, decomposition reaches positive ROI within about 6 months.

5. Reducing the Alignment Tax

5.1 Efficient Safety Mechanisms

Model averaging for RLHF:

Simple interpolation between pre-RLHF and post-RLHF model weights
Achieves the strongest alignment-forgetting Pareto front among the competing methods
Minimal computational overhead compared to other mitigation techniques

Heterogeneous Model Averaging (HMA) (Lin et al., EMNLP 2024):

Different combination ratios for different transformer layers
Maximizes alignment performance while minimizing alignment tax
Validated on OpenLLaMA-3B and Mistral-7B
Observation: Averaging different layers leads to significantly different alignment-forgetting trade-offs
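Both techniques come down to weight interpolation. The sketch below assumes the pre- and post-RLHF checkpoints share an architecture. It represents each parameter group as a plain list of floats, where real code would operate on framework tensors. The per-group ratios passed to `heterogeneous_average` are hypothetical inputs; how HMA actually selects them is not reproduced here.

```python
Weights = dict[str, list[float]]  # parameter-group name -> flattened weights (toy stand-in for tensors)

def _mix(a: list[float], b: list[float], alpha: float) -> list[float]:
    """Linear interpolation: alpha = 0 keeps pre-RLHF weights, alpha = 1 keeps post-RLHF weights."""
    if len(a) != len(b):
        raise ValueError("parameter shapes differ")
    return [(1.0 - alpha) * x + alpha * y for x, y in zip(a, b)]

def interpolate(pre: Weights, post: Weights, alpha: float) -> Weights:
    """Uniform model averaging: one mixing ratio for every parameter group."""
    return {name: _mix(pre[name], post[name], alpha) for name in pre}

def heterogeneous_average(pre: Weights, post: Weights, alphas: dict[str, float]) -> Weights:
    """Layer-wise averaging: each parameter group (e.g. a block of layers) gets its own ratio."""
    return {name: _mix(pre[name], post[name], alphas[name]) for name in pre}
```

Uniform averaging exposes a single knob along the alignment-forgetting curve. The layer-wise variant trades that for one knob per group, which is what lets different layers land at different points on the trade-off.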

Direct Preference Optimization (DPO):

Li et al. (2024) finds DPO induces less alignment tax compared to other RLHF algorithms
Avoids need for separate reward model training
More efficient optimization directly on preference data
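DPO needs no separate reward model because its per-example objective uses only log-probabilities of a preferred response (y_w) and a dispreferred response (y_l), under both the policy being trained and a frozen reference model. A minimal sketch of that loss follows. `beta = 0.1` is an assumed value, not one taken from the studies cited here.

```python
import math

def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return x + math.log1p(math.exp(-x)) if x > 0 else math.log1p(math.exp(x))

def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """Per-example DPO loss from sequence log-probabilities:

    -log sigmoid(beta * [(log pi(y_w|x) - log pi_ref(y_w|x)) - (log pi(y_l|x) - log pi_ref(y_l|x))])
    """
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    return softplus(-margin)  # -log(sigmoid(m)) == softplus(-m)
```

When the policy matches the reference, the margin is zero and the loss equals log 2. The loss falls as the policy raises the preferred response's likelihood relative to the reference.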

Constitutional AI efficiency:

Reduces human labeling requirements by 70%
Self-supervision through constitutional principles
Synthetic data generation for classifiers
Enables scaling safety without proportional scaling of human labor costs

5.2 Amortized Verification

Offline verification where possible:

Pre-compute safety properties during development
Cache verification results for common patterns
Use static analysis to eliminate runtime checks where provable

Incremental monitoring:

Update monitor state incrementally rather than recomputing from scratch
Stream processing: Process events one at a time, maintaining minimal state
Memoization: Cache intermediate results to avoid recomputation
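One way to illustrate incremental monitoring with minimal state is an exponentially weighted moving average (EWMA) of the violation rate. Each event updates a single float, so nothing is recomputed from scratch. EWMA is one choice among several (a sliding window is another), and the smoothing factor and alert threshold below are assumed values.

```python
class EwmaViolationMonitor:
    """Incrementally tracks a smoothed violation rate with O(1) state."""

    def __init__(self, smoothing: float = 0.05, alert_threshold: float = 0.1):
        if not 0.0 < smoothing <= 1.0:
            raise ValueError("smoothing must be in (0, 1]")
        self.smoothing = smoothing
        self.alert_threshold = alert_threshold
        self.rate = 0.0

    def update(self, is_violation: bool) -> bool:
        """Fold one event into the running rate; return True if the rate reaches the threshold."""
        # Equivalent to rate = (1 - smoothing) * rate + smoothing * x
        self.rate += self.smoothing * (float(is_violation) - self.rate)
        return self.rate >= self.alert_threshold
```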

Sampling strategies:

Periodic sampling: Monitor every Nth event (reduced overhead, potential to miss violations)
Statistical sampling: Random sampling with statistical guarantees
Adaptive sampling: Increase frequency when violations suspected
Trade-off: Reduced overhead vs. detection probability
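For statistical sampling, the overhead-versus-detection trade-off has a closed form if each event is sampled independently at rate s: the probability of catching at least one of k violations is 1 - (1 - s)^k. Periodic every-Nth sampling does not satisfy this independence assumption, and an adversary who knows the period could evade it. The sketch below computes both directions of the trade-off.

```python
def detection_probability(sample_rate: float, violations: int) -> float:
    """P(at least one of `violations` events is sampled), each sampled independently."""
    return 1.0 - (1.0 - sample_rate) ** violations

def required_sample_rate(target: float, violations: int) -> float:
    """Smallest independent sampling rate that catches >= 1 of `violations` with prob. >= target."""
    return 1.0 - (1.0 - target) ** (1.0 / violations)
```

For example, sampling 10% of events catches at least one of 20 violations with probability about 0.88. Reaching 0.95 for the same 20 violations requires sampling roughly 14% of events.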

Hardware acceleration:

FPGA monitors: <1% CPU overhead (parallel execution)
GPU monitoring: Parallelize across cores for high-throughput systems
Custom ASICs for highest-volume systems
Near-zero overhead on the main system when monitoring is offloaded to hardware

5.3 Risk-Based Tiering

Adaptive safety levels:

High-risk operations: Full monitoring + verification + human oversight
Medium-risk: Automated monitoring + periodic audits
Low-risk: Lightweight checks + post-hoc review
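The three levels map naturally onto a tier-assignment function. This sketch assumes a risk score in [0, 1] from some upstream risk classifier, which is not specified here. The 0.7 and 0.3 thresholds are hypothetical.

```python
from enum import Enum

class Tier(Enum):
    HIGH = "full monitoring + verification + human oversight"
    MEDIUM = "automated monitoring + periodic audits"
    LOW = "lightweight checks + post-hoc review"

def assign_tier(risk_score: float, high: float = 0.7, medium: float = 0.3) -> Tier:
    """Map a risk score in [0, 1] to a safety tier (thresholds are illustrative)."""
    if not medium <= high:
        raise ValueError("medium threshold must not exceed high threshold")
    if risk_score >= high:
        return Tier.HIGH
    if risk_score >= medium:
        return Tier.MEDIUM
    return Tier.LOW
```

In the content-moderation case, `Tier.HIGH` would correspond to flagging for human review and `Tier.LOW` to auto-approval.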

Application to deployment:

Real-time chat in low-stakes context: Minimal monitoring
Code generation for critical systems: Full verification pipeline
Content moderation: Risk-based routing (flag high-risk for human review, auto-approve low-risk)

Dynamic adjustment:

Risk budget tracking: Increase monitoring when budget depleted
Confidence-based: Reduce monitoring when confidence in safety is high
Resource-based: Adjust monitoring based on available computational resources
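The first two adjustment rules can be combined into a single audit-rate function. The scaling rules are hypothetical choices made for illustration: up to 2× monitoring when the risk budget is exhausted, and down to half at full confidence. The 1% floor echoes the AI Control audit-budget minimum listed in this section. Resource-based adjustment is left out.

```python
def audit_rate(base_rate: float, budget_remaining: float, confidence: float,
               min_rate: float = 0.01, max_rate: float = 1.0) -> float:
    """Fraction of operations to audit.

    budget_remaining: share of the risk budget still unspent, in [0, 1]
    confidence: estimated confidence that the system is behaving safely, in [0, 1]
    """
    # Depleting the risk budget scales monitoring up (2x when nothing remains)...
    rate = base_rate * (2.0 - budget_remaining)
    # ...while high confidence scales it down (to half at full confidence).
    rate *= 1.0 - 0.5 * confidence
    return min(max_rate, max(min_rate, rate))
```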

Empirical support:

Factor(T,U): Different decomposition levels for different task complexities
AI Control: 1% audit budget minimum, higher for critical applications
Simplex: Engage baseline controller only when approaching safety boundaries

5.4 Safety-Capability Pareto Frontier

Defining the frontier:

X-axis: Safety performance (attack success rate, scheming rate, incident probability)
Y-axis: Capability performance (accuracy, latency, throughput)
Pareto-optimal points: Cannot improve safety without sacrificing capability or vice versa
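Identifying Pareto-optimal configurations from measured (risk, capability) pairs is a small dominance check. In this sketch the risk axis uses a lower-is-better metric such as attack success rate, matching the X-axis examples above, and capability is higher-is-better. The configuration names and numbers are hypothetical.

```python
Point = tuple[float, float]  # (risk: lower is better, capability: higher is better)

def dominates(a: Point, b: Point) -> bool:
    """a dominates b if it is no worse on both axes and differs from b (so strictly better on one)."""
    return a[0] <= b[0] and a[1] >= b[1] and a != b

def pareto_front(points: dict[str, Point]) -> list[str]:
    """Names of configurations that no other configuration dominates."""
    return [name for name, p in points.items()
            if not any(dominates(q, p) for q in points.values())]

# Hypothetical configurations: (attack success rate, benchmark accuracy)
configs = {
    "A": (0.10, 0.80),
    "B": (0.05, 0.70),
    "C": (0.12, 0.75),  # dominated by A
    "D": (0.05, 0.65),  # dominated by B
}
print(pareto_front(configs))  # ['A', 'B']
```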

Empirical Pareto frontiers:

RLHF alignment-forgetting:

Model averaging achieves best Pareto front
HMA further optimizes by layer-wise averaging
Trade-off is fundamental but position on frontier is improvable

Factor(T,U):

41% safety baseline → 70% with factored cognition (unknown capability cost)
Decomposition tax metric quantifies capability cost
Open question: Is 70% near Pareto frontier or is further improvement possible?

Deliberative alignment:

13% scheming → 0.4% (o3) with additional chain-of-thought reasoning overhead
8.7% scheming → 0.3% (o4-mini)
Rare egregious failures suggest not yet at frontier (room for improvement)

Constitutional Classifiers evolution:

Prototype: High safety, high overrefusal, high compute overhead (suboptimal point)
Updated: High safety, 0.38% increase in refusal rate, moderate compute overhead (closer to frontier)
Demonstrates that research advances can shift the entire frontier outward

Strategic implications:

Early-stage systems: Far from Pareto frontier, improvements possible in both dimensions
Mature systems: Approaching frontier, must choose point based on application requirements
Continuous R&D: Can shift frontier outward over time

6. Measurement Frameworks

6.1 How to Measure Alignment Tax Systematically

Multi-dimensional measurement:

Alignment Tax = {
  Latency_Overhead: (T_safe - T_baseline) / T_baseline,
  Compute_Overhead: (C_safe - C_baseline) / C_baseline,
  Accuracy_Degradation: (A_baseline - A_safe) / A_baseline,
  Overrefusal_Rate: P(refuse | valid_request),
  Development_Cost: Engineering_Hours × Hourly_Rate,
  Maintenance_Cost: Monthly_Hours × Hourly_Rate
}
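The multi-dimensional definition above translates directly into code. In this sketch compute is measured as cost per 1K calls (an assumption). The accuracy figures and overrefusal counts in the example are hypothetical, while the latency and cost inputs reuse the illustrative values from Section 3. A negative `compute_overhead` means the safe configuration is cheaper.

```python
from dataclasses import dataclass

@dataclass
class Measurement:
    latency_s: float     # e.g. median latency in seconds
    cost_per_1k: float   # compute measured as $ per 1K calls
    accuracy: float      # task accuracy in [0, 1]

def alignment_tax(baseline: Measurement, safe: Measurement,
                  refused_valid: int, valid_requests: int,
                  engineering_hours: float, monthly_hours: float,
                  hourly_rate: float) -> dict[str, float]:
    """Each dimension of the alignment tax, following the definitions above."""
    return {
        "latency_overhead": (safe.latency_s - baseline.latency_s) / baseline.latency_s,
        "compute_overhead": (safe.cost_per_1k - baseline.cost_per_1k) / baseline.cost_per_1k,
        "accuracy_degradation": (baseline.accuracy - safe.accuracy) / baseline.accuracy,
        "overrefusal_rate": refused_valid / valid_requests,
        "development_cost": engineering_hours * hourly_rate,
        "maintenance_cost": monthly_hours * hourly_rate,
    }

# Hypothetical example: accuracies and refusal counts are made up for illustration.
tax = alignment_tax(Measurement(2.3, 120.0, 0.80), Measurement(4.1, 15.0, 0.78),
                    refused_valid=38, valid_requests=10_000,
                    engineering_hours=96, monthly_hours=12, hourly_rate=150.0)
```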

Baseline establishment:

Measure baseline system performance without safety mechanisms
Establish performance distributions (not just means) to capture variance
Test across representative task distribution, not just average cases
Include edge cases and distribution shifts in baseline measurement

Controlled comparison:

Add safety mechanisms incrementally
Measure each dimension independently
Control for confounding factors (model size, training data, etc.)
Use A/B testing where possible for production systems

Longitudinal tracking:

Monitor alignment tax over time as systems evolve
Track how tax changes with model scaling
Measure impact of safety R&D on reducing tax
Identify phase transitions and threshold effects

6.2 Benchmarks and Metrics

Safety benchmarks:

AIR-Bench 2024 (Stanford):

5,694 tests spanning 314 granular risk categories
Four major domains: System & Operational Risks, Content Safety, Societal Risks, Legal & Rights
Measures percentage of appropriately refused requests
Provides granular safety performance measurement

AILuminate (MLCommons, December 2024):

First third-party LLM trust and safety benchmark
Collaboration between industry experts and AI researchers
Plans to expand to French, Chinese, and Hindi in 2025

TrustLLM:

Six dimensions: Truthfulness, safety, fairness, robustness, privacy, machine ethics
Over 30 datasets across 18+ subcategories
Assesses hallucination, jailbreak resistance, privacy leakage

HarmBench (Center for AI Safety, ICML 2024):

Standardized evaluation framework for automated red teaming
Used by US and UK AI Safety Institutes for pre-deployment testing
Provides consistent adversarial evaluation methodology

Capability benchmarks:

MMLU: Massive Multitask Language Understanding
APPS: Competitive programming problems
Math benchmarks: GSM8K, MATH
Code generation: HumanEval, MBPP

Performance benchmarks:

Inference latency:

MLPerf Inference: Industry-standard, focuses on 90th percentile latency for single-stream
STAC-ML: Financial services inference benchmarks (35.2-640 microseconds for LSTM models on A100)
GenAI-Perf: LLM-specific, includes TPS (tokens per second) with overhead measurement

Monitoring overhead:

CPU overhead: Additional CPU utilization due to monitoring (1-15% typical)
Memory overhead: Additional memory for monitor state (KB to MB typical)
Latency overhead: Additional delay in system response (μs to ms)
Throughput impact: Reduction in system throughput (5-20% for complex monitors)

6.3 Reporting Standards

Comprehensive reporting framework:

1. Safety performance:

Attack success rate before/after safety mechanisms
Scheming/deception rate in relevant evaluations
Overrefusal rate on valid requests
Coverage: What percentage of risks are addressed?

2. Capability performance:

Benchmark scores (MMLU, APPS, etc.) before/after
Task-specific accuracy/quality metrics
Output diversity measurements
Generalization to out-of-distribution inputs

3. Performance overhead:

Median and P99 latency with/without safety
Throughput (requests/second) comparison
Compute costs (API costs, GPU hours, etc.)
Memory and storage requirements

4. Development costs:

Engineering hours for implementation
Testing and validation effort
Red teaming and adversarial testing investment
Ongoing maintenance requirements

5. Confidence and limitations:

Statistical confidence intervals on all measurements
Known failure modes and edge cases
Limitations of evaluation methodology
Assumptions and conditions under which results hold
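Confidence intervals for rates such as overrefusal can be reported with the Wilson score interval, which behaves better than the normal approximation when the rate is close to 0. This is one standard choice among several binomial intervals, not a method the text prescribes.

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95% coverage)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)
```

For example, 38 overrefusals in 10,000 valid requests (hypothetical counts) gives an interval of roughly 0.28% to 0.52%.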

Standardized format proposal:

## Alignment Tax Report

### System Description
- Base model: [model name and size]
- Safety mechanisms: [list of mechanisms]
- Application domain: [domain]

### Safety Performance
- Attack Success Rate: [baseline]% → [after safety]% ([X]× reduction)
- Scheming Rate: [baseline]% → [after safety]% ([X]× reduction)
- Overrefusal Rate: [rate]% (95% CI: [lower]-[upper]%)
- Evaluation Dataset: [benchmark name], [N] samples

### Capability Performance
- Benchmark: [name], Score: [baseline] → [after safety] ([change]% degradation)
- Output Diversity: [metric], [baseline] → [after safety]
- Task Success Rate: [baseline]% → [after safety]%

### Latency Overhead
- Median: [baseline]s → [safe]s ([X]× overhead)
- P99: [baseline]s → [safe]s ([X]× overhead)

### Compute Costs
- Inference Cost: $[baseline]/1K → $[safe]/1K ([change]%)
- Monthly Cost (at [volume] tasks): $[baseline] → $[safe]

### Development Investment
- Initial: [hours] hours ($[cost])
- Monthly Maintenance: [hours] hours ($[cost])

### Break-Even Analysis
- Monthly Net Impact: $[savings or cost]
- Break-Even Time: [months] months
- ROI at 12 months: [percentage]%

### Limitations
- [List key limitations, assumptions, and caveats]

Best practices:

Report full distributions, not just means
Include statistical significance tests
Provide reproducibility information (datasets, evaluation code)
Disclose conflicts of interest and funding sources
Update reports as systems evolve

Comparative reporting:

Compare against industry baselines where available
Position on safety-capability Pareto frontier
Show trajectory over time (improving tax or stable?)
Benchmark against alternative safety approaches

7. Synthesis and Recommendations

7.1 Key Empirical Findings

Quantified alignment taxes:

Factor(T,U): 41% → 70% safety, 26% → 14% attack success, unknown capability cost
Deliberative alignment: ~30× scheming reduction, chain-of-thought latency overhead, 0.3-0.4% residual risk
RLHF: Substantial capability forgetting in NLP tasks, reduced output diversity, diminishing returns
Constitutional AI: 70% reduction in human labeling, 0.38% overrefusal increase, moderate compute overhead
Decomposed architectures (illustrative estimates): 1.5-2× latency, but 50-80% API cost savings through model right-sizing
Runtime monitoring: 2-30% performance overhead depending on mechanism

Non-obvious insights:

Safety mechanisms can reduce total costs (decomposition enables cheaper models per component)
Alignment tax is often reducible through research (Constitutional Classifiers evolution)
Different safety mechanisms have different tax structures (compute vs. capability vs. latency)
Taxes are application-dependent (high-volume benefits from decomposition, low-volume may not)

7.2 Decision Framework

When to accept high alignment tax:

Catastrophic risk potential (nuclear, critical infrastructure, autonomous weapons)
Regulatory requirements mandate safety regardless of cost
Reputational risk from incidents exceeds performance benefits
Long-term deployment where upfront investment amortizes

When to minimize alignment tax:

Competitive markets where performance is differentiator
Research and development phases (iterate fast, add safety later)
Low-stakes applications (creative tools, entertainment)
Resource-constrained settings (embedded systems, edge deployment)

Optimization strategies:

Risk-based tiering: High tax for critical operations, low tax for routine
Amortized verification: Invest in infrastructure that reduces ongoing costs
Pareto frontier research: Invest in R&D to improve both safety and capability
Right-sized mechanisms: Match safety investment to actual risk level
Continuous measurement: Track tax over time and optimize

7.3 Open Research Questions

Measurement challenges:

How to measure rare but catastrophic failures (long-tail risk)?
How to distinguish genuine safety from “security through obscurity”?
How to account for situational awareness confounds in evaluations?
How to measure alignment tax for emergent capabilities?

Optimization opportunities:

Can we achieve Factor(T,U)’s 70% safety with lower decomposition tax?
What is the theoretical limit of the safety-capability Pareto frontier?
How to reduce RLHF alignment tax while maintaining safety benefits?
Can hardware acceleration eliminate monitoring overhead entirely?

Scaling questions:

How does alignment tax scale with model capability?
Will trusted models remain available as untrusted models improve?
Do alignment taxes decrease or increase with model size?
What happens to alignment tax at superhuman performance levels?

Economic and organizational:

How to create market incentives for accepting alignment tax?
What regulatory frameworks support alignment tax investment?
How to measure and communicate alignment tax to stakeholders?
What are competitive dynamics when some organizations accept tax and others don’t?

7.4 Practical Guidance

For practitioners:

Measure your baseline: Establish performance metrics before adding safety
Start simple: Implement lightweight safety, measure tax, then add complexity if justified
Monitor continuously: Track alignment tax over time as systems evolve
Optimize for your context: High-volume systems benefit from different optimizations than low-volume
Report transparently: Document safety-performance trade-offs for stakeholders

For researchers:

Report full tax profiles: Not just safety improvements, but all cost dimensions
Seek Pareto improvements: Research that improves both safety and capability
Study reduction techniques: How to achieve same safety with lower tax
Develop better benchmarks: More comprehensive measurement frameworks
Consider adversarial settings: How does tax change under adaptive adversaries?

For policymakers:

Recognize tax legitimacy: Alignment tax is a real cost, not an inefficiency
Create incentives: Reward organizations that accept tax for safety
Require transparency: Mandate reporting of safety-performance trade-offs
Support research: Fund work on reducing alignment tax
Set standards: Establish acceptable tax levels for different risk categories

7.5 Conclusion

Alignment tax is a fundamental feature of AI safety, not a bug to be eliminated. The empirical evidence demonstrates that meaningful safety improvements require sacrificing some combination of performance, latency, cost, or capability. However, the magnitude of this tax is not fixed—research advances have repeatedly demonstrated that better mechanisms can achieve comparable safety with substantially lower tax.

The key findings:

Alignment tax is measurable: It ranges from a 0.38% increase in refusal rate (Constitutional Classifiers) to roughly 2× latency (decomposed architectures) to substantial capability forgetting (RLHF)
Tax varies by mechanism: Different safety approaches impose different cost structures. Choose mechanisms matching your constraints (latency-critical vs. cost-critical vs. capability-critical)
Tax is often worth it: In the illustrative scenarios above, risk reduction value plus API savings exceed the tax cost for most production volumes, with break-even typically within 2-6 months.
Tax is reducible: Research advances (HMA, DPO, Constitutional Classifiers v2) demonstrate that alignment tax can decrease over time while maintaining or improving safety.
Context matters: The same alignment tax may be negligible for one application and prohibitive for another. Systematic measurement and application-specific evaluation are essential.

As AI systems become more capable and deployed in higher-stakes contexts, understanding and managing alignment tax becomes increasingly critical. Organizations must measure these costs systematically, optimize them where possible, and make informed decisions about when to accept them in exchange for safety. The future of AI safety depends not on eliminating alignment tax, but on ensuring it remains acceptable relative to the risks being mitigated.

Key Citations

Coherent Risk Measures

Artzner, Delbaen, Eber, Heath (1999) - “Coherent Measures of Risk” - Mathematical Finance 9(3)
Delbaen (2002) - Coherent risk measures on general probability spaces
Föllmer & Schied (2002, 2016) - Stochastic Finance

Expected Shortfall

Acerbi & Tasche (2002) - “Expected Shortfall: A Natural Coherent Alternative to Value at Risk”
Basel III FRTB - Shift from VaR to ES

Euler Allocation

Tasche (2007) - “Capital Allocation to Business Units: the Euler Principle” - arXiv:0708.2542
McNeil, Frey, Embrechts (2005) - Quantitative Risk Management

Spectral Measures

Acerbi (2002) - “Spectral Measures of Risk”
Kusuoka (2001) - Representation theorem

Markov Categories

Fritz (2020) - “A Synthetic Approach to Markov Kernels” - Advances in Mathematics 370
Cho & Jacobs (2019) - Disintegration and Bayesian inversion

Linear Logic

Girard (1987) - “Linear Logic” - Theoretical Computer Science 50
Girard, Scedrov, Scott (1992) - Bounded Linear Logic

Object Capabilities

Miller (2006) - Robust Composition: Towards a Unified Approach to Access Control and Concurrency Control
Dennis & Van Horn (1966) - Original capability concept

Compositional Verification

Benveniste et al. (2018) - Contracts for System Design
Incer et al. (2022) - Pacti tool

Insurance and Actuarial (Complexity Pricing)

Bühlmann & Gisler (2005) - A Course in Credibility Theory
Klugman, Panjer, Willmot (2012) - Loss Models
Daykin, Pentikäinen, Pesonen (1994) - Practical Risk Theory for Actuaries

Robust Optimization

Ben-Tal, El Ghaoui, Nemirovski (2009) - Robust Optimization
Bertsimas & Brown (2009) - “Constructing Uncertainty Sets for Robust Linear Optimization”

Software Complexity

McCabe (1976) - “A Complexity Measure” - IEEE Transactions on Software Engineering
Halstead (1977) - Elements of Software Science

Organizational Complexity

Perrow (1984) - Normal Accidents
Reason (1997) - Managing the Risks of Organizational Accidents
Leveson (2011) - Engineering a Safer World

Ambiguity and Model Uncertainty

Ellsberg (1961) - “Risk, Ambiguity, and the Savage Axioms”
Gilboa & Schmeidler (1989) - “Maxmin Expected Utility”
Hansen & Sargent (2001) - “Robust Control and Model Uncertainty”

Sources (Alignment Tax)