Risk Measurement & Pricing

This page brings together three strands of quantitative work that share a common goal: turning the delegation-risk framework’s qualitative apparatus into something you can measure, compose, and price.

  1. Compositional risk measures — the mathematical foundations (coherent risk measures, Euler allocation, Markov categories, linear logic, spectral measures) for risk measures that compose properly across AI system components.
  2. Complexity pricing — how to price the epistemic uncertainty that comes from structural complexity rather than from the underlying risk itself.
  3. Alignment tax quantification — empirical measurement of the performance costs of AI safety mechanisms, and frameworks for deciding when those costs are justified.

Mathematical foundations for risk measures that compose properly across AI system components.

Coherent Risk Measures (Artzner et al. 1999)

A risk measure ρ, applied to the future value X of a position (so the loss is −X), is coherent if it satisfies:

| Axiom | Formula | Meaning |
| --- | --- | --- |
| Monotonicity | X ≤ Y a.s. ⟹ ρ(X) ≥ ρ(Y) | Higher payoff = lower risk |
| Translation Invariance | ρ(X + α) = ρ(X) − α | Adding cash reduces risk by that amount |
| Positive Homogeneity | ρ(λX) = λρ(X) for λ ≥ 0 | Double position = double risk |
| Subadditivity | ρ(X + Y) ≤ ρ(X) + ρ(Y) | Diversification doesn’t increase risk |

Subadditivity is the most critical axiom - it captures “a merger does not create extra risk.”

VaR can violate subadditivity with heavy-tailed distributions:

Bond portfolio example:

  • Concentrated portfolio (100 units of 1 bond): VaR₀.₉₅ = −500 (a gain!)
  • Diversified portfolio (1 unit each of 100 bonds): VaR₀.₉₅ = 25

VaR says the diversified portfolio is riskier, which contradicts financial intuition.

General result: VaR is subadditive for jointly normal (more generally, elliptical) distributions but can fail for heavy-tailed or strongly skewed ones.

Expected Shortfall (ES/CVaR): Average loss in the worst 100(1−α)% of cases.

ES_α(L) = E[L | L ≥ VaR_α(L)], where L = −X is the loss (exact for continuous loss distributions)

ES satisfies all four axioms. It overcomes VaR’s limitations by:

  • Providing information about losses beyond VaR threshold
  • Averaging tail losses rather than focusing on single quantile
  • Encouraging diversification through subadditivity
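The contrast between VaR and ES can be checked on a small discrete example. The sketch below is illustrative only: the two-loan portfolio (each loan losing 100 with probability 4%) is a hypothetical assumption, not the bond example above, and ES is computed as the average over the worst (1 − α) probability mass, which is the form that stays subadditive for discrete distributions.

```python
from collections import defaultdict


def var(dist, alpha):
    """VaR_alpha: smallest loss l with P(L <= l) >= alpha.

    dist is a list of (loss, probability) pairs.
    """
    cumulative = 0.0
    for loss, prob in sorted(dist):
        cumulative += prob
        if cumulative >= alpha - 1e-12:  # tolerance for float round-off
            return loss
    return max(loss for loss, _ in dist)


def es(dist, alpha):
    """ES_alpha: average loss over the worst (1 - alpha) probability mass."""
    tail = 1.0 - alpha
    remaining, total = tail, 0.0
    for loss, prob in sorted(dist, reverse=True):
        take = min(prob, remaining)
        total += take * loss
        remaining -= take
        if remaining <= 1e-12:
            break
    return total / tail


def add_independent(a, b):
    """Distribution of the sum of two independent discrete losses."""
    out = defaultdict(float)
    for loss_a, prob_a in a:
        for loss_b, prob_b in b:
            out[loss_a + loss_b] += prob_a * prob_b
    return list(out.items())


# Hypothetical loan: loses 100 with probability 4%, otherwise loses nothing.
loan = [(0, 0.96), (100, 0.04)]
both = add_independent(loan, loan)

var_single, var_both = var(loan, 0.95), var(both, 0.95)
es_single, es_both = es(loan, 0.95), es(both, 0.95)

print(var_single, var_both)  # VaR of the pair exceeds the sum of the parts
print(es_single, es_both)    # ES of the pair stays below the sum of the parts
```

Under these assumptions each loan alone has a 95% VaR of 0, while the pair has a 95% VaR of 100, so VaR reports the diversified position as riskier. ES does not: the combined ES is below twice the single-loan ES.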

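Regulatory adoption: The Basel Committee's revised market risk framework (the Fundamental Review of the Trading Book, part of the Basel III reforms) replaced 99% VaR with ES at 97.5%.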

Any coherent risk measure can be written as:

ρ(X) = sup_{Q ∈ P} E_Q[−X]

where P is a set of probability measures (the “risk envelope”).

Interpretation: Coherent risk = worst-case expected loss across multiple probability models.


A function f is homogeneous of degree k if:

f(cx) = c^k · f(x) for any c > 0

For degree 1: f(cx) = c·f(x) (linear scaling).

For homogeneous functions of degree k:

k·f(x) = Σᵢ xᵢ·(∂f/∂xᵢ)

For degree 1 risk measures:

RM(x) = Σᵢ xᵢ·(∂RM/∂xᵢ)

The risk contribution of component i is:

RC_i = x_i · (∂RM/∂x_i)

Key property: Σᵢ RC_i = RM (full allocation)

This means component risk contributions sum exactly to total risk - no gaps, no waste.
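As a worked instance of full allocation, the sketch below applies the Euler formula to portfolio standard deviation σ_p(x) = sqrt(xᵀΣx), where the partial derivative is (Σx)_i / σ_p. The positions and covariance matrix are hypothetical values chosen only for illustration.

```python
import math


def euler_contributions(x, cov):
    """Return (total risk, per-component Euler contributions) for sigma_p.

    RC_i = x_i * d(sigma_p)/d(x_i) = x_i * (cov @ x)_i / sigma_p
    """
    n = len(x)
    cov_x = [sum(cov[i][j] * x[j] for j in range(n)) for i in range(n)]
    total = math.sqrt(sum(x[i] * cov_x[i] for i in range(n)))
    contributions = [x[i] * cov_x[i] / total for i in range(n)]
    return total, contributions


# Hypothetical two-component system: exposures and covariance of losses.
positions = [2.0, 1.0]
covariance = [[0.04, 0.01],
              [0.01, 0.09]]

total_risk, risk_contributions = euler_contributions(positions, covariance)
print(total_risk, risk_contributions, sum(risk_contributions))
```

The contributions sum to the total up to floating-point error, which is the full-allocation property stated above.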

| Risk Measure Property | Allocation Property |
| --- | --- |
| Positive homogeneity | Full allocation |
| Subadditivity | “No undercut” |
| Translation invariance | Riskless allocation |

Critical result: Full allocation and RORAC compatibility hold simultaneously if and only if the risk measure is homogeneous of degree 1.

Risk measures that are homogeneous of degree 1 include:

  • Portfolio standard deviation σ_p(x)
  • Value-at-Risk VaR
  • Expected Shortfall ES
  • All spectral risk measures
  • All coherent risk measures

| Type | Formula | When It Applies |
| --- | --- | --- |
| Additive | R_total = Σ R_i | Independent linear risks |
| Multiplicative | R_total = Π R_i | Series reliability |
| Sub-additive | R_total < Σ R_i | Diversification benefit |
| Super-additive | R_total > Σ R_i | Common-cause failures |

OR Gate: Output fails if ANY input fails

  • P(failure) = 1 − Π(1 − p_i) for independent inputs, ≈ Σp_i for small probabilities
  • Models “parallel in failure space”: any single failure is enough, as in a series system

AND Gate: Output fails only if ALL inputs fail

  • P(failure) = Πp_i for independent inputs
  • Models redundancy: the output survives as long as one input survives
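The two gates can be sketched in a few lines, assuming independent input failures (an assumption that common-cause failures would break). The component probabilities and the small fault tree are hypothetical.

```python
import math


def or_gate(probs):
    """Output fails if ANY input fails (independent inputs)."""
    return 1.0 - math.prod(1.0 - p for p in probs)


def and_gate(probs):
    """Output fails only if ALL inputs fail (independent inputs)."""
    return math.prod(probs)


# Hypothetical tree: the system fails if the monitor fails OR both
# redundant models fail together.
p_monitor = 0.01
p_models = and_gate([0.05, 0.05])
p_system = or_gate([p_monitor, p_models])

print(or_gate([0.01, 0.02]))  # close to the small-probability sum 0.03
print(p_models, p_system)
```

For small probabilities the OR gate is close to the sum of the inputs, and the AND gate shows how redundancy drives the joint failure probability down.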

Contract structure: (Assumptions, Guarantees)

  • Assumptions: Properties environment must satisfy
  • Guarantees: Properties component delivers under assumptions

Compositional verification: Verify component contracts individually, compose to prove system properties.

Recent tools: Pacti for efficient contract computation, AGREE for architectural verification.


Markov categories provide categorical foundations for compositional probability.

Core idea: A symmetric monoidal category where morphisms behave like “random functions.”

| Morphism | Interpretation |
| --- | --- |
| p : I → X | Probability distribution on X (I is the monoidal unit) |
| k : X → Y | Markov kernel / channel |
| Δ_X : X → X ⊗ X | Copy information |
| !_X : X → I | Delete information |

Sequential: f : X → Y and g : Y → Z compose to g ∘ f : X → Z

Parallel: f : X → Y and g : W → Z compose to f ⊗ g : X ⊗ W → Y ⊗ Z
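For finite state spaces, the two compositions can be made concrete by representing a Markov kernel as a table of conditional probabilities. This is a minimal sketch, not a categorical library: the nested-dict representation and the example kernels (`check`, `decide`) are assumptions made for illustration.

```python
def compose(g, f):
    """Sequential composition g after f: apply kernel f, then kernel g."""
    out = {}
    for x, row in f.items():
        out[x] = {}
        for y, p_xy in row.items():
            for z, p_yz in g[y].items():
                out[x][z] = out[x].get(z, 0.0) + p_xy * p_yz
    return out


def tensor(f, g):
    """Parallel composition: run f and g independently on a pair of inputs."""
    return {
        (x, w): {(y, z): p * q for y, p in f[x].items() for z, q in g[w].items()}
        for x in f
        for w in g
    }


# Hypothetical kernels: a noisy check followed by a release decision.
check = {"ok": {"pass": 0.9, "flag": 0.1}, "bad": {"pass": 0.2, "flag": 0.8}}
decide = {"pass": {"ship": 1.0, "hold": 0.0}, "flag": {"ship": 0.3, "hold": 0.7}}

pipeline = compose(decide, check)  # kernel from {ok, bad} to {ship, hold}
pair = tensor(check, check)        # kernel on pairs of inputs

print(pipeline)
print(sum(pair[("ok", "bad")].values()))
```

Every row of a composed kernel still sums to 1, which is the finite-case reason composition stays inside the category.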

Markov categories provide:

  1. Rigorous foundations for composing probabilistic AI components
  2. Formal treatment of conditional independence
  3. Comparison of statistical experiments
  4. Foundation for probabilistic contracts between components

Key distinction: Linear logic disallows contraction and weakening for formulas that are not marked with the exponential modality (!).

Resource interpretation:

  • Proposition = resource
  • Proof = process consuming resources
  • Each assumption used exactly once

Values of linear type must be used exactly once:

  • No implicit copying (contraction forbidden)
  • No implicit deletion (weakening forbidden)

Applications: Memory management, file handles, cryptographic keys, protocol states.

Extension: Instead of “use exactly once,” allow “use at most k times.”

QBAL (Quantified Bounded Affine Logic): Quantification over resource variables, preserving polynomial time soundness.

Capability: Unforgeable token granting authority to perform operations.

Security: Objects interact only via messages on references. Security relies on not being able to forge references.

Languages: E, Joe-E, Pony, Cadence, Austral.

Combining linear types with capabilities:

type RiskBudget[R: Real] = LinearCapability[R]
fn risky_operation(budget: RiskBudget[0.1]) -> Result

Type system ensures:

  • Budget can’t be duplicated (linear)
  • Total risk ≤ sum of allocated budgets (conservation)
  • Risk-taking operations require explicit budget allocation
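Python has no linear types, so the guarantees above cannot be enforced at compile time there. The sketch below approximates them with runtime checks instead: a budget object can be split or consumed only once, and reuse raises an error. The class and method names are hypothetical and are not taken from any of the languages listed above.

```python
class RiskBudget:
    """Runtime stand-in for a linear capability carrying a risk allowance."""

    def __init__(self, amount):
        self.amount = amount
        self._used = False

    def _check_unused(self):
        if self._used:
            raise RuntimeError("risk budget already consumed")

    def split(self, part):
        """Consume this budget and return two budgets summing to it."""
        self._check_unused()
        if not 0 <= part <= self.amount:
            raise ValueError("cannot split off more than the budget holds")
        self._used = True
        return RiskBudget(part), RiskBudget(self.amount - part)

    def consume(self, required):
        """Spend the whole budget on an operation needing `required` risk."""
        self._check_unused()
        if self.amount < required:
            raise ValueError("insufficient risk budget")
        self._used = True


def risky_operation(budget, required=0.1):
    budget.consume(required)  # no budget, no operation
    return "done"


root = RiskBudget(0.3)
task_budget, reserve = root.split(0.1)
result = risky_operation(task_budget)

try:
    risky_operation(task_budget)  # reusing a spent budget is rejected
    reused = True
except RuntimeError:
    reused = False
```

A language with real linear types would reject the second call before the program runs; here the violation is only caught when it happens.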

Spectral risk measure: Weighted average of loss quantiles.

M_φ(X) = ∫₀¹ φ(p) · q(p) dp

where φ(p) is the risk aversion function and q(p) is the quantile function.

φ must satisfy:

  1. Positivity: φ(p) ≥ 0
  2. Normalization: ∫₀¹ φ(p) dp = 1
  3. Increasingness: φ non-decreasing (φ’(p) ≥ 0)

Condition 3 reflects risk aversion - weight higher losses at least as much as lower losses.

ES_α is a spectral measure with:

φ(p) = 1/(1-α) if p ≥ α, else 0

Uniform weight on the worst 100(1−α)% of outcomes.

General result: Any spectral measure = positively weighted average of ES at different levels.
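On a finite sample, a spectral measure can be estimated by sorting the losses and weighting the i-th smallest by the mass that φ assigns to the interval ((i−1)/n, i/n]. The sketch below takes the integral of φ as input so that no numerical integration is needed. The sample and the exponential spectrum with parameter k are assumptions for illustration.

```python
import math


def spectral_risk(losses, weight_cdf):
    """Empirical spectral risk measure.

    weight_cdf(p) is the integral of the risk spectrum phi from 0 to p,
    so the i-th smallest of n losses gets weight cdf(i/n) - cdf((i-1)/n).
    """
    ordered = sorted(losses)
    n = len(ordered)
    return sum(
        (weight_cdf((i + 1) / n) - weight_cdf(i / n)) * loss
        for i, loss in enumerate(ordered)
    )


def es_cdf(alpha):
    """Integrated spectrum of ES_alpha: phi = 1/(1-alpha) on [alpha, 1]."""
    return lambda p: max(0.0, p - alpha) / (1.0 - alpha)


def exponential_cdf(k):
    """Integrated exponential spectrum phi(p) = k*exp(-k(1-p)) / (1-exp(-k))."""
    return lambda p: (math.exp(-k * (1.0 - p)) - math.exp(-k)) / (1.0 - math.exp(-k))


sample = list(range(1, 11))  # hypothetical losses 1..10

mean_loss = spectral_risk(sample, lambda p: p)        # flat spectrum = mean
es_80 = spectral_risk(sample, es_cdf(0.8))            # average of worst 20%
exp_risk = spectral_risk(sample, exponential_cdf(5))  # tail-weighted average

print(mean_loss, es_80, exp_risk)
```

With a flat spectrum the measure reduces to the sample mean, and with the ES spectrum it averages the two largest losses, matching the definition above.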

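Any law-invariant coherent risk measure can be written (Kusuoka representation) as:

ρ(X) = sup_{μ ∈ M} ∫₀¹ ES_α(X) dμ(α)

where M is a set of probability measures over the confidence levels α ∈ [0, 1].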


Challenges:

  • Measurement difficulties for AI catastrophic risk
  • Collective risk: multiple “safe” models may collectively exceed thresholds
  • Discontinuous behavior near critical parameters

Promising directions:

  1. Harm as loss distribution: Define harm H, use ρ(H) for coherent measure
  2. Spectral measures: Heavily weight catastrophic tail events
  3. ES for AI: ES_α(Harm) = average harm in the worst 100(1−α)% of scenarios

Critical requirement: Subadditivity must reflect reality. If combining AI systems creates emergent super-additive risks, coherent measures need modification.

Systematic risk (common cause):

  • All AI systems fail for same reason
  • Shared training data, common vulnerabilities
  • Does not diversify away

Random failures (idiosyncratic):

  • Independent across systems
  • Diversification helps
  • Subadditivity applies

Handling approaches:

  1. Separate risk components (systematic + idiosyncratic)
  2. Use convex measures for systematic (relax subadditivity)
  3. Worst-case scenario sets including correlated failures
  4. Fault tree analysis with explicit common-cause modeling

Helps when:

  • Failures are independent
  • Risk measure is subadditive
  • Normal or light-tailed distributions
  • No common-cause vulnerabilities

Hurts when:

  • Systematic risk dominates
  • Heavy-tailed distributions
  • Risk monoculture (all systems use same approach)
  • Increased attack surface from complexity
  • Series reliability (all must succeed)

AI-specific:

  • Helps: Diverse training data, multiple architectures, ensemble methods
  • Hurts: Oligopolistic AI vendors create illusory diversification, sophisticated adversaries exploit weakest component

The compositional machinery above assumes you can decompose a system’s risk into components whose contributions sum cleanly. Part II turns to what happens when that assumption is shaky — when structural complexity makes the decomposition itself uncertain — and how to price that uncertainty.


How do you price uncertainty that comes from structural complexity rather than from the underlying risk itself?

Standard risk pricing assumes you can estimate probability distributions for losses. But complex organizational structures create epistemic uncertainty—uncertainty about your uncertainty.

| Risk Type | What You Know | How to Price |
| --- | --- | --- |
| Simple risk | Distribution of outcomes | Expected loss + margin |
| Parameter uncertainty | Distribution family, uncertain parameters | Bayesian methods, wider intervals |
| Structural complexity | Unknown interactions, hidden dependencies | ??? |

Complexity doesn’t change the expected value of losses—it changes how confident you can be in that estimate.


Traditional insurance uses loading factors to account for uncertainty:

Premium = Expected Loss × (1 + Loading Factor)

| Source of Loading | Typical Factor |
| --- | --- |
| Administrative costs | 10-20% |
| Profit margin | 5-15% |
| Parameter uncertainty | 10-30% |
| Model uncertainty | 20-50% |
| Structural complexity | 50-300%+ |

The problem: loading factors are often set by judgment rather than systematic analysis.

Credibility theory balances individual experience against class statistics:

Credibility-weighted estimate = Z × (individual estimate) + (1-Z) × (class estimate)

where Z (credibility factor) depends on:

  • Volume of individual data
  • Variance in individual experience
  • Variance in class experience

Application to complexity: High-complexity organizations have low credibility (Z → 0) because their structure makes historical data less predictive. They’re priced closer to worst-case class estimates.
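One standard way to compute Z is the Bühlmann formula Z = n / (n + k), with k the ratio of the expected process variance to the variance of the hypothetical means. The sketch below uses that formula as an assumption (the text above does not commit to a specific one), and treating structural complexity as a higher process variance is likewise a modelling assumption. All numbers are hypothetical.

```python
def credibility_factor(n_obs, process_variance, variance_of_means):
    """Buhlmann credibility: Z = n / (n + k), k = process var / var of means."""
    k = process_variance / variance_of_means
    return n_obs / (n_obs + k)


def credibility_estimate(z, individual_estimate, class_estimate):
    """Blend individual experience with the class estimate."""
    return z * individual_estimate + (1.0 - z) * class_estimate


# Hypothetical: 5 years of data, individual mean loss 120, class mean 200.
z_simple = credibility_factor(n_obs=5, process_variance=20.0, variance_of_means=4.0)
z_complex = credibility_factor(n_obs=5, process_variance=100.0, variance_of_means=4.0)

print(z_simple, credibility_estimate(z_simple, 120.0, 200.0))
print(z_complex, credibility_estimate(z_complex, 120.0, 200.0))
```

With the same amount of data, the noisier (more complex) organization gets a lower Z, so its estimate sits closer to the class figure.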

Robust optimization handles uncertainty sets rather than point estimates:

Minimize worst-case cost over all scenarios in uncertainty set U

For complexity pricing:

  • Larger uncertainty sets for complex structures
  • Premium reflects worst case within the set
  • Complexity score determines set size

Key insight: You’re not pricing expected loss—you’re pricing the worst plausible loss given structural uncertainty.

Software Engineering: Cyclomatic Complexity

McCabe (1976) defined cyclomatic complexity as the number of linearly independent paths through code:

V(G) = E - N + 2P

where E = edges, N = nodes, P = connected components.
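The formula is easy to evaluate on any graph given as node and edge lists. The sketch below counts connected components itself, treating edges as undirected for that purpose. The two example graphs, including the small delegation structure, are hypothetical.

```python
def connected_components(nodes, edges):
    """Count connected components, treating edges as undirected."""
    neighbours = {n: set() for n in nodes}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, count = set(), 0
    for start in nodes:
        if start in seen:
            continue
        count += 1
        stack = [start]
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(neighbours[node] - seen)
    return count


def cyclomatic_complexity(nodes, edges):
    """McCabe's V(G) = E - N + 2P."""
    return len(edges) - len(nodes) + 2 * connected_components(nodes, edges)


# Control-flow graph of a single if/else.
if_else_nodes = ["entry", "cond", "then", "else", "exit"]
if_else_edges = [("entry", "cond"), ("cond", "then"), ("cond", "else"),
                 ("then", "exit"), ("else", "exit")]

# Hypothetical delegation graph: a principal reaches one worker via two managers.
org_nodes = ["principal", "manager_a", "manager_b", "worker"]
org_edges = [("principal", "manager_a"), ("principal", "manager_b"),
             ("manager_a", "worker"), ("manager_b", "worker")]

print(cyclomatic_complexity(if_else_nodes, if_else_edges))
print(cyclomatic_complexity(org_nodes, org_edges))
```

Both graphs have one branch point and therefore two independent paths, which is the sense in which an authority chain with two routes to the same delegate resembles a branch in code.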

Higher complexity correlates with:

  • More defects
  • Greater testing difficulty
  • Higher maintenance costs

Analogy for organizations: Delegation structures have “paths” through authority chains. More paths = harder to predict behavior.


| Factor | Contribution | Rationale |
| --- | --- | --- |
| Delegation depth | +2 per layer | Each layer adds interpretation variance |
| Parallel delegates | +1 per delegate | Coordination uncertainty |
| Shared resources | +2 per shared resource | Contention and priority conflicts |
| Informal relationships | +3 per relationship | Undocumented authority creates surprises |
| Cross-layer dependencies | +2 per dependency | Bypasses normal authority flow |
| Ambiguous boundaries | +3 per boundary | Unclear who’s responsible |
| External dependencies | +2 per dependency | Less control, less visibility |

Static structure isn’t everything. Some complexity emerges from dynamics:

| Factor | Contribution | Rationale |
| --- | --- | --- |
| High turnover | +2 | Institutional knowledge loss |
| Rapid growth | +3 | Structure lags reality |
| Recent reorganization | +2 | Transition period uncertainty |
| Multi-jurisdictional | +2 per jurisdiction | Different rules, enforcement |

| Factor | Contribution | Rationale |
| --- | --- | --- |
| Poor documentation | +3 | Can’t verify claims about structure |
| Audit findings | +1 per finding | Evidence of hidden issues |
| Information asymmetry | +2 | Principal can’t observe agent |
| Opacity of decision-making | +3 | Can’t attribute outcomes to causes |
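The factor tables can be turned into a simple additive scorer. This is a sketch under assumptions: the dictionary keys are hypothetical names for the table rows, per-unit factors are multiplied by a count, and the remaining factors are treated as yes/no flags. How to count jurisdictions (for example, whether the home jurisdiction counts) is left to the caller.

```python
# Points per counted unit, taken from the factor tables.
PER_UNIT = {
    "delegation_layers": 2,
    "parallel_delegates": 1,
    "shared_resources": 2,
    "informal_relationships": 3,
    "cross_layer_dependencies": 2,
    "ambiguous_boundaries": 3,
    "external_dependencies": 2,
    "jurisdictions": 2,
    "audit_findings": 1,
}

# Flat points for factors that are either present or absent.
FLAGS = {
    "high_turnover": 2,
    "rapid_growth": 3,
    "recent_reorganization": 2,
    "poor_documentation": 3,
    "information_asymmetry": 2,
    "opaque_decision_making": 3,
}


def complexity_score(counts=None, flags=()):
    """Sum per-unit contributions and flat contributions."""
    counts = counts or {}
    per_unit_total = sum(PER_UNIT[name] * n for name, n in counts.items())
    flag_total = sum(FLAGS[name] for name in flags)
    return per_unit_total + flag_total


# Hypothetical organization: 3 layers, 4 parallel delegates, 1 informal
# relationship, and currently growing quickly.
score = complexity_score(
    counts={"delegation_layers": 3, "parallel_delegates": 4,
            "informal_relationships": 1},
    flags=["rapid_growth"],
)
print(score)
```

Unknown factor names raise a KeyError, which keeps a typo from silently lowering the score.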

From Complexity Score to Uncertainty Multiplier

If we had data on organizational failures vs. complexity scores, we could fit:

Uncertainty Multiplier = f(Complexity Score)

Proposed functional form (requires empirical validation):

σ_multiplier = e^(0.15 × complexity_score)

| Complexity Score | Uncertainty Multiplier |
| --- | --- |
| 2 | 1.35× (±35%) |
| 5 | 2.1× (±110%) |
| 10 | 4.5× (±350%) |
| 15 | 9.5× (±850%) |
| 20 | 20× (±1900%) |
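The table values follow directly from the proposed functional form. The short sketch below recomputes them. The 0.15 rate is the unvalidated coefficient proposed above, exposed as a parameter so it can be refit if data becomes available.

```python
import math


def uncertainty_multiplier(complexity_score, rate=0.15):
    """Proposed form: multiplier = exp(rate * complexity_score)."""
    return math.exp(rate * complexity_score)


for score in (2, 5, 10, 15, 20):
    m = uncertainty_multiplier(score)
    # The +/- column in the table is (multiplier - 1) as a percentage.
    print(f"{score:>2}  {m:5.2f}x  (+/-{(m - 1) * 100:.0f}%)")
```

The printed percentages differ slightly from the table because the table rounds the multiplier before converting it.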

Why exponential? Each complexity factor potentially:

  1. Introduces new failure modes (additive)
  2. Creates interactions with existing factors (multiplicative)

If interactions dominate, we expect multiplicative (exponential) growth in uncertainty.

Given uncertainty multiplier σ_m:

Premium = Expected Loss × σ_m × Safety Factor

The safety factor depends on insurer risk tolerance and capital requirements.


Price at the upper end of the confidence interval:

Premium = E[Loss] × (1 + k × σ_multiplier)

where k reflects how many standard deviations to cover (typically 1-3).

Advantage: Simple, conservative. Disadvantage: May be too expensive for high complexity.

Load premium proportional to variance:

Premium = E[Loss] + λ × Var[Loss]

For complexity: Var[Loss] = Base_Var × σ_multiplier²

Advantage: Standard actuarial practice. Disadvantage: Requires variance estimate.
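The confidence-interval and variance-loading formulas can be compared side by side on one set of inputs. All figures below (expected loss, base variance, k, λ) are hypothetical and chosen only to show the arithmetic.

```python
def premium_confidence_interval(expected_loss, sigma_multiplier, k):
    """Premium = E[Loss] * (1 + k * sigma_multiplier)."""
    return expected_loss * (1.0 + k * sigma_multiplier)


def premium_variance_loading(expected_loss, base_variance, sigma_multiplier, lam):
    """Premium = E[Loss] + lambda * Var[Loss], Var = base_variance * multiplier^2."""
    return expected_loss + lam * base_variance * sigma_multiplier ** 2


# Hypothetical inputs: expected loss 1,000, multiplier 2.1 (complexity score 5).
ci_premium = premium_confidence_interval(1_000.0, 2.1, k=2)
var_premium = premium_variance_loading(1_000.0, 250_000.0, 2.1, lam=0.001)

print(ci_premium, var_premium)
```

The first formula grows linearly in the multiplier and the second quadratically, so which one is more expensive at high complexity depends on k and λ.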

Use Expected Shortfall (CVaR) at a high confidence level:

Premium = ES_α(Loss) where α depends on complexity

| Complexity Score | α Level | Interpretation |
| --- | --- | --- |
| Low (< 5) | 95% | Standard tail risk |
| Medium (5-10) | 99% | More conservative |
| High (10-15) | 99.9% | Near worst-case |
| Very High (> 15) | 99.99% | Extreme caution |

Advantage: Coherent risk measure, tail-focused. Disadvantage: Requires distribution assumptions.

(See Part I for why Expected Shortfall is a coherent risk measure and how it relates to spectral measures.)

Use robust optimization over uncertainty sets:

Premium = max_{θ ∈ Θ(complexity)} E_θ[Loss]

where Θ(complexity) is the set of plausible models given complexity level.

Advantage: No distribution assumptions. Disadvantage: Defining Θ is challenging.
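To make the max over Θ concrete, the sketch below uses a deliberately simple model family: a single failure probability whose plausible range widens with the complexity score. The widening rule (`spread` per complexity point) and the grid of candidate models are hypothetical stand-ins for Θ(complexity), not a recommended calibration.

```python
def plausible_failure_probs(base_prob, complexity_score, spread=0.1, steps=5):
    """Hypothetical model set: failure probabilities around a base estimate.

    The range widens by `spread` per complexity point in both directions.
    """
    width = spread * complexity_score
    low = base_prob / (1.0 + width)
    high = min(1.0, base_prob * (1.0 + width))
    return [low + (high - low) * i / (steps - 1) for i in range(steps)]


def robust_premium(loss_if_failure, failure_probs):
    """Premium = worst-case expected loss over the candidate models."""
    return max(p * loss_if_failure for p in failure_probs)


simple = robust_premium(100_000.0, plausible_failure_probs(0.01, 0))
complex_org = robust_premium(100_000.0, plausible_failure_probs(0.01, 10))

print(simple, complex_org)
```

With a score of 0 the set collapses to the base estimate and the premium equals the expected loss; as the score rises, only the pessimistic end of the set matters.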


Structured questionnaire covering:

  • Organization chart analysis
  • Process documentation review
  • Interview key personnel
  • Audit report review
  • External dependency mapping

Use calibrated function (initially judgment-based, refined with data):

def uncertainty_band(complexity_score):
    """Map a complexity score to (uncertainty label, premium multiplier)."""
    if complexity_score < 5:
        return "low", 1.5
    elif complexity_score < 10:
        return "medium", 3.0
    elif complexity_score < 15:
        return "high", 7.0
    else:
        # 15.0 is a floor: very high complexity may warrant more ("15.0+")
        return "very high", 15.0

Premium = Base_Premium × multiplier

where Base_Premium is the rate for a simple, well-understood structure.

Step 4: Offer Complexity Reduction Incentives

Show client how premium changes with structural changes:

| Change | Complexity Δ | Premium Δ |
| --- | --- | --- |
| Document informal relationships | -3 | -15% |
| Eliminate shared resources | -2 | -10% |
| Add clear authority boundaries | -3 | -15% |
| Reduce to 2 layers | -2 | -10% |

Organizational failures linked to complexity are rare and idiosyncratic. Building a training dataset is difficult.

We observe organizations that haven’t catastrophically failed. High-complexity survivors may have unobserved mitigating factors.

Complex organizations may differ from simple ones in ways that affect risk independently of complexity.

  1. Historical case studies: Analyze past organizational failures for complexity factors
  2. Simulation: Agent-based models of delegation chains under stress
  3. Expert elicitation: Structured surveys of risk professionals
  4. Regulatory data: Insurance claim data by organizational characteristics
  5. Near-miss analysis: Study incidents that almost became failures

Complexity pricing extends the delegation accounting framework by quantifying the uncertainty term:

Net Delegation Value = Receivable - (Exposure × σ_multiplier) - Costs

The complexity multiplier captures how much to discount the exposure estimate based on structural uncertainty.

Pattern interconnection creates complexity when defense patterns share failure modes. Complexity pricing provides a method to quantify this “correlation tax.”

The compositional risk measures in Part I assume you can decompose system risk. Complexity represents the degree to which this decomposition fails—interactions that can’t be captured by analyzing components separately.


  1. Optimal complexity scoring: Which factors matter most? How should they be weighted?

  2. Functional form: Is the uncertainty multiplier truly exponential? Could it be polynomial, logistic, or stepped?

  3. Industry variation: Do complexity factors have different impacts in different domains?

  4. Dynamic complexity: How do you price complexity that changes over time?

  5. Complexity reduction ROI: What’s the most cost-effective way to reduce complexity for insurance purposes?

  6. AI-specific factors: What additional complexity factors apply to AI systems (emergent behavior, capability uncertainty, alignment uncertainty)?


Measuring the performance costs of AI safety mechanisms.

Parts I and II priced the risk of delegation and the uncertainty surrounding that risk. But the safety mechanisms that reduce risk are not free: they impose their own performance, compute, and capability costs. Part III turns to measuring that cost directly — the “alignment tax” — which is what determines whether a given mitigation is worth deploying.

The “alignment tax” represents the performance degradation, computational overhead, and capability limitations imposed by safety mechanisms in AI systems. As AI safety becomes increasingly critical for deployment, understanding and quantifying these costs is essential for making informed trade-offs between safety and performance. This part examines empirical measurements, cost categories, and frameworks for systematic alignment tax quantification.

The alignment tax is the performance penalty incurred when implementing safety constraints in AI systems. Unlike traditional software where security measures have clear overhead metrics (encryption latency, firewall throughput), AI alignment tax manifests across multiple dimensions:

  • Performance degradation: Reduced task performance from safety-oriented training
  • Latency overhead: Additional inference time from monitoring and verification
  • Compute costs: Redundant models, consensus mechanisms, and safety monitors
  • Capability reduction: Conservative behavior and valid request refusals
  • Development costs: Safety testing, red teaming, and verification infrastructure

The term originated in the AI safety community to describe the organizational and technical costs of making systems safer rather than merely more capable. At the organizational level, capability-safety trade-offs (“alignment tax”) imply that post-training for human preference and safety can lower measured performance, creating incentives to relax alignment under market pressure.

Reinforcement Learning from Human Feedback (RLHF) is the primary technique for aligning large language models, but it introduces measurable performance costs:

Definition: LLMs acquire a wide range of abilities during pre-training, but aligning LLMs under RLHF can lead to forgetting pretrained abilities, also known as the alignment tax.

Key findings (Lin et al., EMNLP 2024):

  • Experiments with OpenLLaMA-3B revealed a pronounced alignment tax in NLP tasks
  • Despite various techniques to mitigate forgetting, they are often at odds with RLHF performance
  • This creates a fundamental trade-off between alignment performance and forgetting mitigation

Capability changes:

  • RLHF can bring GPT-3.5-level models to roughly GPT-4 equivalent performance on instruction following
  • Llama 3 Instruct improves MMLU by almost 2.5 points through post-training
  • However, RLHF substantially decreases the diversity of outputs sampled for a given input
  • Even when sampling outputs for different inputs, RLHF produces less diverse text on some metrics

Recent research (Huang et al., 2025) introduces “Safety Tax” as a distinct phenomenon in Large Reasoning Models (LRMs):

Definition: Safety alignment can restore safety capability in LRMs, but it leads to degradation of reasoning capability. This trade-off between reasoning and safety capability is termed “Safety Tax.”

Mechanism: With the sequential LRM production pipeline (pre-training → reasoning enhancement → safety alignment), safety capability is acquired by “taxing” the reasoning capability.

Implication: Safety Tax presents a critical challenge for future safety research of LRMs, requiring new approaches beyond sequential training pipelines.

Runtime monitoring and verification add computational overhead to AI systems:

Inference overhead sources:

  • Instrumentation overhead: Cost of collecting observations (logging, tracing, sensor reading)
  • Monitor execution overhead: Computational cost of monitor logic
  • Intervention overhead: Cost of triggering interventions when violations detected
  • Communication overhead: Transmitting observations to monitors, monitor results to actuators

Measured overhead (from runtime verification literature):

  • ROSRV (robotics monitoring): 2-8% CPU overhead, <10MB memory, <1ms latency
  • Copilot (embedded real-time monitoring): Microsecond latency, <100KB memory
  • TeSSLa software monitors: 5-15% CPU overhead, <50MB memory
  • Neural network monitoring: 10-30% inference latency increase for internal representation monitoring
  • Ensemble uncertainty: 2-5× inference latency (running multiple models)

2.1 Factor(T,U): Factored Cognition Trade-offs

The Factor(T,U) protocol (Redwood Research, December 2024) provides the most rigorous empirical data on decomposition-based safety mechanisms:

Safety improvements:

  • 41% → 70% safety rate when adding factored cognition to trusted monitoring
  • 26% → 14% attack success rate reduction
  • Tested on APPS coding benchmark (competitive programming problems)

How it works:

  1. Tasks decomposed into self-contained child tasks
  2. Each child task solved in isolated context by control protocols
  3. Child solutions reassembled to solve original task
  4. Blue team uses o3-mini as powerful decomposer

Performance cost metric:

Decomposition tax = 1 - (accuracy of Factor(T,U) / accuracy of T)

This metric directly measures the performance cost of adding decomposition to a protocol. The Factor(T,U) research acknowledges: “Attacks on simpler child tasks are more obvious to monitors, but there’s no evidence that honest solutions are more recognizable.”
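The metric is a one-line computation once both accuracies are measured on the same task set. The accuracies below are hypothetical placeholders, since no measured values are given here.

```python
def decomposition_tax(accuracy_factored, accuracy_baseline):
    """Decomposition tax = 1 - accuracy(Factor(T,U)) / accuracy(T)."""
    if accuracy_baseline <= 0:
        raise ValueError("baseline accuracy must be positive")
    return 1.0 - accuracy_factored / accuracy_baseline


# Hypothetical accuracies, for illustration only.
tax = decomposition_tax(accuracy_factored=0.45, accuracy_baseline=0.50)
print(f"{tax:.1%}")
```

A tax of zero means decomposition costs no accuracy, and a negative value would mean the factored protocol outperforms the baseline.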

Critical limitation: Uncertainty exists about how much safety improvement derives from genuine detectability versus “security through obscurity.” The researchers note this as a key open question. (For a deeper treatment of the empirical scheming-reduction evidence, see Empirical Scheming Reduction.)

OpenAI and Apollo Research’s collaboration on deliberative alignment (2025) demonstrates dramatic scheming reduction but with performance implications:

Scheming reduction:

  • o4-mini: 8.7% → 0.3% covert action rate (~29× reduction)
  • o3: 13% → 0.4% covert action rate (~33× reduction)
  • Tested across 26 agentic evaluations (180+ environments)

What is deliberative alignment?

  • Models trained to read and reason about anti-scheming specifications before acting
  • Models explicitly consider safety spec in chain-of-thought
  • Training on narrow distribution, testing on diverse out-of-distribution environments

Performance costs:

  • Additional inference latency from chain-of-thought reasoning about safety specifications
  • Computational overhead of explicit safety specification reasoning
  • Rare egregious failures still occur (models occasionally make up new principles or misrepresent spec)
  • Situational awareness confound: Models demonstrate awareness of being evaluated, which may drive compliance rather than genuine alignment

Critical limitation: “The intervention as-studied is not sufficient for future models, and more work needs to be done.” The reduction may result from evaluation awareness rather than genuine safety improvement.

2.3 RLHF vs Base Models: Capability Differences

Performance patterns (from ICLR 2024-2025 research):

  • Best-of-N (BoN) outperforms RLHF, which outperforms SFT at largest model sizes
  • This ordering holds out-of-distribution for different training dataset splits
  • RLHF generalizes better for difficult distribution shifts in instruction following
  • For less difficult shifts, RLHF generalizes similarly or slightly worse than SFT
  • While RLHF performs best out-of-distribution in absolute terms, the advantage varies by difficulty

Diversity costs:

  • RLHF substantially decreases output diversity compared to other methods
  • Less diverse text produced even when sampling different inputs
  • This reveals inherent tension between generalization and diversity

Diminishing returns:

  • RLHF gives large gains initially from first few thousand feedback examples
  • Diminishing returns as more examples added
  • Performance may plateau at level determined by human labeling consistency
  • Extremely fine-grained distinctions may exceed human ability to label reliably

Anthropic’s Constitutional AI (CAI) demonstrates how self-supervision can reduce costs while maintaining safety:

Efficiency gains:

  • Cuts need for costly human labeling by up to 70% according to Anthropic’s internal benchmarks
  • Model generates candidate responses and critiques them against constitutional rules
  • Enables precise behavior control with far fewer human labels

Constitutional Classifiers (2025):

  • Input and output classifiers trained on synthetically generated data
  • Filter overwhelming majority of jailbreaks with minimal over-refusals
  • 0.38% increase in refusal rates (very low over-refusal cost)
  • ~23.7% inference overhead on production traffic (disclosed in arXiv 2501.18837)
  • Dramatically improved robustness against jailbreaking with minimal additional cost

Prototype version limitations:

  • Robust to thousands of hours of human red teaming for universal jailbreaks
  • However, exhibited high overrefusal rates and compute overhead
  • Updated version achieved similar robustness with dramatically lower costs

The AI Control framework (Greenblatt et al., 2023-2024) provides empirical data on trusted/untrusted monitoring costs:

Trusted Editing protocol:

  • GPT-4 writes solution, GPT-3.5 edits suspicious code
  • 92% safety rate when editing 20% of problems
  • Best safety-usefulness trade-off among tested protocols

Performance implications:

  • Requires running both untrusted model (GPT-4) and trusted model (GPT-3.5)
  • Editing overhead on 20% of problems
  • Limited by capability gap between trusted and untrusted models

Untrusted Monitoring:

  • GPT-4 writes code, different GPT-4 instance monitors
  • Anti-collusion techniques prevent coordination
  • Weaker than trusted editing due to collusion risk
  • Lower overhead than trusted editing but reduced safety

Audit requirements:

  • Minimum 1% audit budget empirically tested
  • Higher audit rates for critical applications
  • At 2% false positive rate, produces many false negatives
  • GPT-3.5 qualitatively notices most backdoors but unreliable at distinguishing unnecessary vs necessary-but-complex logic

2.6 Monitoring Overhead in Production Systems

Empirical overhead measurements:

LLM inference benchmarking:

  • In single concurrency scenarios, monitoring overheads can account for 33% of entire benchmark duration
  • GenAI-Perf includes storage and response handling overhead in metrics
  • These overheads become non-negligible for edge applications

Runtime verification overhead:

  • ROSRV (robotics): 2-8% CPU, <10MB memory, <1ms latency
  • TeSSLa (general-purpose): 5-15% CPU overhead for software monitors, <1% for FPGA monitors
  • Neural network OOD detection: 5-20% inference latency increase (density-based methods)
  • Internal representation monitoring: 10-30% inference latency increase

Monitoring cost reduction techniques:

  • DIAMOND system: ~30% latency reduction, 38% communication overhead reduction
  • Adaptive sampling: Reduced overhead but potential to miss violations between samples
  • Hardware acceleration (FPGA): Near-zero overhead on main system through parallel execution

Additional inference calls:

  • Consensus mechanisms requiring multiple model queries
  • Deliberative alignment chain-of-thought reasoning overhead
  • Monitoring and verification checks during inference
  • Factor(T,U) decomposition and reassembly steps

Illustrative example (hypothetical scenario for illustration):

  • Median latency: 2.3s (monolithic) → 4.1s (decomposed) = 1.8× overhead
  • P99 latency: 8.2s (monolithic) → 12.1s (decomposed) = 1.5× overhead

Why overhead is bounded:

  • Narrow components use smaller, faster models
  • Parallel execution where possible
  • Verification is mostly fast code, not neural inference

Context-dependent impact:

  • Real-time chat: 1.8× may be noticeable to users
  • Batch processing: Negligible impact on overall throughput
  • Async workflows: No user-facing impact

Redundant models:

  • Running multiple models for consensus or validation
  • Trusted and untrusted model pairs in AI Control
  • Ensemble methods for uncertainty quantification (2-5× inference cost)

Monitors and verifiers:

  • Runtime verification monitors: 2-15% CPU overhead
  • OOD detection systems: 5-30% additional inference time
  • Behavioral anomaly detection: Continuous overhead on all operations

Illustrative cost comparison (hypothetical scenario for illustration):

| Component | Model | Cost per 1K calls |
| --- | --- | --- |
| Monolithic | GPT-4 | $120 |
| Decomposed | | |
| - Coordinator | GPT-3.5 | $2 |
| - Summarizer | Fine-tuned 7B | $5 |
| - Analyzer | Fine-tuned 7B | $5 |
| - Verifier | Code + small LLM | $3 |
| Decomposed Total | | $15 |

Counterintuitive finding: In this illustration, decomposition reduces API costs by 87.5% ($120 → $15 per 1K calls) when using appropriately-sized models per component. Safety architecture enables using expensive frontier models only where needed, with most components using fine-tuned smaller models.

Total runtime costs (100,000 tasks/month):

  • Monolithic: $12,200/month
  • Decomposed with monitoring: $3,000/month
  • Net savings: $9,200/month despite safety overhead

Refusing valid requests (over-refusal):

  • Constitutional Classifiers: 0.38% increase in refusal rates (2025 updated version)
  • Earlier prototype: High overrefusal rates (specific figures not disclosed)
  • RLHF alignment: Reduced diversity in outputs even for valid inputs

Conservative behavior:

  • Simplex architecture engaging baseline controller reduces performance during safety interventions
  • More aggressive AC performance → closer to safety boundary → more frequent BC engagement
  • Design trade-off: How much BC engagement is acceptable?

Capability forgetting:

  • RLHF alignment tax: Forgetting pretrained abilities on NLP tasks
  • Safety alignment in LRMs: Reasoning capability degradation
  • Measured via performance on standard benchmarks (MMLU, etc.)

Risk-adjusted capability:

  • Factor(T,U): Unknown decomposition tax (accuracy reduction metric defined but specific values not disclosed)
  • Trusted editing: 92% safety rate when editing 20% of problems implies 8% failure rate
  • Deliberative alignment: 0.3-0.4% residual scheming rate may be unacceptable for critical applications

Safety testing:

  • Comprehensive evaluation across 180+ environments (deliberative alignment study)
  • Testing on APPS coding benchmark for Factor(T,U)
  • Multiple benchmark evaluations (MMLU, AIR-Bench, TrustLLM, etc.)

Red teaming:

  • Thousands of hours of human red teaming (Constitutional Classifiers)
  • Adversarial testing against monitors to find evasion strategies
  • Fault injection to verify monitor detection

Engineering overhead (illustrative estimate):

| Task | Monolithic Agent | Decomposed System | Delta |
|---|---|---|---|
| Architecture design | 1-2 days | 3-5 days | +2-3 days |
| Component implementation | 3-5 days | 5-10 days | +2-5 days |
| Verification layers | 0 days | 2-4 days | +2-4 days |
| Integration testing | 2-3 days | 4-6 days | +2-3 days |
| Total | 6-10 days | 14-25 days | +8-15 days |

Initial implementation cost: $10,000-18,000 at $150/hr engineering rate

Ongoing maintenance:

  • Monitoring review: +2 hr/week
  • Component updates: +1 hr/week
  • Incident investigation: -20% time (easier with isolated components)
  • Monthly overhead: +12 hr/month = ~$1,800/month

4.1 Risk-Adjusted Return on Safety Investment

The fundamental question is whether safety costs are justified by risk reduction:

Expected value calculation:

EV(Safety) = Risk_Reduction_Value - Safety_Costs
Risk_Reduction_Value = (P_unsafe_baseline - P_unsafe_safe) × Incident_Cost
Safety_Costs = Development_Costs + Ongoing_Costs + Performance_Penalties

Illustrative incident cost estimation (hypothetical scenario):

| Incident Type | Probability (Mono) | Probability (Decomp) | Damage | Expected Cost (Mono) | Expected Cost (Decomp) |
|---|---|---|---|---|---|
| Data leak | 0.5%/mo | 0.1%/mo | $50,000 | $250 | $50 |
| Harmful output (public) | 2%/mo | 0.3%/mo | $20,000 | $400 | $60 |
| Service disruption | 1%/mo | 0.5%/mo | $10,000 | $100 | $50 |
| Compliance violation | 0.3%/mo | 0.05%/mo | $100,000 | $300 | $50 |
| Total expected cost | | | | $1,050/mo | $210/mo |

Risk reduction value: $840/month in expected incident cost avoided
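The expected-cost columns are probability times damage, summed over incident types. The sketch below recomputes them from the illustrative table; the `Incident` class and its field names are hypothetical, and each monthly probability is treated as the expected number of incidents per month.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    name: str
    p_mono: float    # monthly probability, monolithic system
    p_decomp: float  # monthly probability, decomposed system
    damage: float    # USD per incident


# Figures copied from the illustrative table above.
INCIDENTS = [
    Incident("Data leak", 0.005, 0.001, 50_000),
    Incident("Harmful output (public)", 0.02, 0.003, 20_000),
    Incident("Service disruption", 0.01, 0.005, 10_000),
    Incident("Compliance violation", 0.003, 0.0005, 100_000),
]


def expected_monthly_cost(incidents: list[Incident], attr: str) -> float:
    """Sum probability x damage over all incident types."""
    return sum(getattr(i, attr) * i.damage for i in incidents)


mono = expected_monthly_cost(INCIDENTS, "p_mono")
decomp = expected_monthly_cost(INCIDENTS, "p_decomp")
risk_reduction_value = mono - decomp
print(f"Mono: ${mono:,.0f}/mo, Decomp: ${decomp:,.0f}/mo, "
      f"Reduction: ${risk_reduction_value:,.0f}/mo")
```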

Break-even analysis:

  • Net monthly cost: $2,000 safety overhead - $9,200 API savings - $840 risk reduction = -$8,040 (a net saving)
  • Net monthly benefit: $8,040 savings
  • Initial investment: $10,000-18,000
  • Break-even time: roughly 1.2-2.2 months ($10,000-18,000 / $8,040 per month) for high-volume scenarios
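Break-even time is the initial investment divided by the net monthly benefit. A minimal sketch using the figures in this list follows; the function names are chosen here for illustration, and the calculation ignores discounting and ramp-up.

```python
def net_monthly_benefit(api_savings: float, risk_reduction: float,
                        safety_overhead: float) -> float:
    """Monthly benefit after subtracting ongoing safety overhead (USD)."""
    return api_savings + risk_reduction - safety_overhead


def break_even_months(initial_investment: float, monthly_benefit: float) -> float:
    """Months until cumulative benefit covers the initial investment.

    Returns infinity when the monthly benefit is not positive,
    i.e. the investment never pays back.
    """
    if monthly_benefit <= 0:
        return float("inf")
    return initial_investment / monthly_benefit


benefit = net_monthly_benefit(api_savings=9_200, risk_reduction=840,
                              safety_overhead=2_000)
low = break_even_months(10_000, benefit)
high = break_even_months(18_000, benefit)
print(f"Net monthly benefit: ${benefit:,.0f}")
print(f"Break-even: {low:.1f} to {high:.1f} months")
```

The infinite result for a non-positive benefit makes the low-volume case explicit: if savings and risk reduction do not cover the overhead, no amount of time recovers the investment.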

4.2 Threshold Effects and Phase Transitions

Safety performance scaling:

  • RLHF shows diminishing returns after few thousand feedback examples
  • Deliberative alignment: 30× scheming reduction but residual 0.3-0.4% rate
  • Factor(T,U): 41% → 70% safety (approaching but not reaching perfect safety)

Critical thresholds:

  • Aviation: Backup controller engagement is acceptable for 0.1% of flight time
  • Autonomous vehicles: Higher engagement frequency tolerable during initial deployment
  • Critical infrastructure: Even 0.3% failure rate may be unacceptable

Phase transition phenomena:

  • Constitutional Classifiers: Prototype had high overrefusal → updated version reduced the refusal-rate increase to 0.38%
  • Suggests potential for step-function improvements in alignment tax reduction
  • But also indicates substantial R&D investment required to achieve transitions

Risk budget allocation guidance:

High-assurance applications (nuclear command, critical infrastructure):

  • Multiple layers required: Factored cognition + deliberative alignment + trusted monitoring + human oversight
  • Accept high alignment tax (latency overhead, compute costs)
  • Even 0.3-0.4% residual risk unacceptable
  • Initial investment: $50,000-100,000+
  • Break-even: Worth it regardless of cost if catastrophic risk is prevented

Medium-assurance applications (code review, content moderation):

  • Factored cognition + trusted monitoring likely sufficient
  • Monitor for scaling issues as models improve
  • Moderate alignment tax acceptable (1.5-2× latency, moderate compute overhead)
  • Initial investment: $10,000-25,000
  • Break-even: 2-6 months with reasonable volume

Low-assurance applications (creative writing, brainstorming):

  • Simple monitoring may suffice
  • Remain vigilant for emergent strategic behavior
  • Minimal alignment tax acceptable (prefer performance)
  • Initial investment: $5,000-10,000
  • Break-even: May not be worth it; use simpler safety measures

Volume-based thresholds (illustrative estimates):

| Use Case | Monthly Volume | API Savings | Risk Reduction | Break-Even |
|---|---|---|---|---|
| Internal tool | 1,000 tasks | -$90 | -$100 | 5-8 months |
| Customer-facing | 50,000 tasks | -$4,600 | -$500 | 2-3 months |
| High-volume SaaS | 500,000 tasks | -$46,000 | -$800 | < 1 month |
| Autonomous agent | 10,000 tasks | -$900 | -$2,000 | 3-4 months |

Key finding: For systems with either (a) >10,000 tasks/month or (b) significant incident cost potential, decomposition has positive ROI within 6 months.

Model averaging for RLHF:

  • Simple interpolation between pre-RLHF and post-RLHF model weights
  • Achieves strongest alignment-forgetting Pareto front among competing methods
  • Minimal computational overhead compared to other mitigation techniques

Heterogeneous Model Averaging (HMA) (Lin et al., EMNLP 2024):

  • Different combination ratios for different transformer layers
  • Maximizes alignment performance while minimizing alignment tax
  • Validated on OpenLLaMA-3B and Mistral-7B
  • Observation: Averaging different layers leads to significantly different alignment-forgetting trade-offs
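Both averaging schemes above are linear interpolations of model weights. The sketch below shows the uniform and layer-wise forms on toy data. It is an illustration under stated assumptions, not the procedure from Lin et al.: weights are plain lists of floats rather than tensors, the parameter names are hypothetical, `alpha` is taken as the weight on the post-RLHF model, and the per-layer ratios are hand-picked rather than optimized.

```python
Weights = dict[str, list[float]]


def _mix(pre: list[float], post: list[float], alpha: float) -> list[float]:
    """Element-wise (1 - alpha) * pre + alpha * post."""
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(pre, post)]


def interpolate(pre: Weights, post: Weights, alpha: float) -> Weights:
    """Uniform model averaging: one ratio for every parameter group."""
    return {name: _mix(pre[name], post[name], alpha) for name in pre}


def interpolate_layerwise(pre: Weights, post: Weights,
                          alphas: dict[str, float],
                          default_alpha: float = 0.5) -> Weights:
    """Heterogeneous averaging: a separate ratio per parameter group."""
    return {
        name: _mix(pre[name], post[name], alphas.get(name, default_alpha))
        for name in pre
    }


# Toy weights (hypothetical): pre-RLHF all zeros, post-RLHF all ones.
pre = {"layer0.weight": [0.0, 0.0], "layer1.weight": [0.0, 0.0]}
post = {"layer0.weight": [1.0, 1.0], "layer1.weight": [1.0, 1.0]}

uniform = interpolate(pre, post, alpha=0.5)
hetero = interpolate_layerwise(
    pre, post, {"layer0.weight": 0.2, "layer1.weight": 0.8}
)
print(uniform)
print(hetero)
```

With `alpha = 0` the result is the pre-RLHF model and with `alpha = 1` the post-RLHF model, so sweeping `alpha` traces out the alignment-forgetting trade-off curve.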

Direct Preference Optimization (DPO):

  • Li et al. (2024) find that DPO induces less alignment tax than other RLHF algorithms
  • Avoids need for separate reward model training
  • More efficient optimization directly on preference data

Constitutional AI efficiency:

  • Reduces human labeling requirements by 70%
  • Self-supervision through constitutional principles
  • Synthetic data generation for classifiers
  • Enables scaling safety without proportional scaling of human labor costs

Offline verification where possible:

  • Pre-compute safety properties during development
  • Cache verification results for common patterns
  • Use static analysis to eliminate runtime checks where provable

Incremental monitoring:

  • Update monitor state incrementally rather than recomputing from scratch
  • Stream processing: Process events one at a time, maintaining minimal state
  • Memoization: Cache intermediate results to avoid recomputation
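As one possible illustration of incremental, stream-style monitoring, the sketch below tracks a running mean and variance with Welford's online algorithm and flags values that deviate strongly from what has been seen so far. The class name, the z-score rule, and the default thresholds are assumptions made for this example; the state is three numbers no matter how many events are processed.

```python
import math


class IncrementalMonitor:
    """Streaming anomaly check with O(1) state (Welford's algorithm)."""

    def __init__(self, z_threshold: float = 3.0, warmup: int = 10):
        self.n = 0          # events seen so far
        self.mean = 0.0     # running mean
        self.m2 = 0.0       # running sum of squared deviations
        self.z_threshold = z_threshold
        self.warmup = warmup

    def observe(self, x: float) -> bool:
        """Return True if x is anomalous versus prior events, then update state."""
        anomalous = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(x - self.mean) / std > self.z_threshold:
                anomalous = True
        # Incremental update: no stored history, no recomputation.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return anomalous


monitor = IncrementalMonitor()
baseline = [10, 11, 9, 10, 10.5, 9.5, 10, 11, 9, 10]
flags = [monitor.observe(v) for v in baseline]
spike_flagged = monitor.observe(100.0)
print(flags, spike_flagged)
```

In this sketch flagged values are still folded into the running statistics; a production monitor might exclude them so that one spike does not widen the tolerance band.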

Sampling strategies:

  • Periodic sampling: Monitor every Nth event (reduced overhead, potential to miss violations)
  • Statistical sampling: Random sampling with statistical guarantees
  • Adaptive sampling: Increase frequency when violations suspected
  • Trade-off: Reduced overhead vs. detection probability

Hardware acceleration:

  • FPGA monitors: <1% CPU overhead (parallel execution)
  • GPU monitoring: Parallelize across cores for high-throughput systems
  • Custom ASICs for highest-volume systems
  • Near-zero overhead on the main system when monitoring is offloaded to dedicated hardware

Adaptive safety levels:

High-risk operations: Full monitoring + verification + human oversight
Medium-risk: Automated monitoring + periodic audits
Low-risk: Lightweight checks + post-hoc review

Application to deployment:

  • Real-time chat in low-stakes context: Minimal monitoring
  • Code generation for critical systems: Full verification pipeline
  • Content moderation: Risk-based routing (flag high-risk for human review, auto-approve low-risk)
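Risk-based routing of this kind reduces to mapping a risk score onto one of the three tiers. The sketch below assumes a risk score in [0, 1] is already available from some upstream classifier; the cutoffs 0.2 and 0.7 and all names are hypothetical and would need calibration per application.

```python
from enum import Enum


class SafetyLevel(Enum):
    LOW = "lightweight checks + post-hoc review"
    MEDIUM = "automated monitoring + periodic audits"
    HIGH = "full monitoring + verification + human oversight"


def route(risk_score: float, low_cutoff: float = 0.2,
          high_cutoff: float = 0.7) -> SafetyLevel:
    """Map a risk score in [0, 1] to a safety tier (cutoffs are assumptions)."""
    if not 0.0 <= risk_score <= 1.0:
        raise ValueError("risk_score must be in [0, 1]")
    if risk_score >= high_cutoff:
        return SafetyLevel.HIGH
    if risk_score >= low_cutoff:
        return SafetyLevel.MEDIUM
    return SafetyLevel.LOW


for score in (0.05, 0.5, 0.9):
    level = route(score)
    print(f"{score:.2f} -> {level.name}: {level.value}")
```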

Dynamic adjustment:

  • Risk budget tracking: Increase monitoring when budget depleted
  • Confidence-based: Reduce monitoring when confidence in safety is high
  • Resource-based: Adjust monitoring based on available computational resources
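One way to combine the three signals is a single monitoring-rate function. The multiplicative form below is an assumption made purely for illustration, not a rule taken from the literature: budget depletion pushes the rate up, confidence scales it down, and available compute caps it. The 1% floor echoes the minimum audit budget mentioned under empirical support.

```python
def monitoring_rate(budget_remaining: float, confidence: float,
                    compute_available: float,
                    min_rate: float = 0.01, max_rate: float = 1.0) -> float:
    """Fraction of operations to monitor; all inputs are in [0, 1].

    budget_remaining:  share of the risk budget still unspent
    confidence:        current confidence that the system is safe
    compute_available: largest monitoring fraction resources allow
    """
    for name, value in (("budget_remaining", budget_remaining),
                        ("confidence", confidence),
                        ("compute_available", compute_available)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1]")

    # Risk budget: the more depleted the budget, the higher the desired rate.
    desired = 1.0 - budget_remaining
    # Confidence: high confidence scales the desired rate down.
    desired *= 1.0 - confidence
    # Resources: never request more than the compute cap allows.
    rate = min(desired, compute_available)
    # Keep a floor so monitoring never switches off entirely.
    return max(min_rate, min(max_rate, rate))


print(monitoring_rate(budget_remaining=0.2, confidence=0.5, compute_available=1.0))
```

Note that the floor takes precedence over the compute cap in this sketch, so a system with almost no spare compute is still monitored at the minimum rate.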

Empirical support:

  • Factor(T,U): Different decomposition levels for different task complexities
  • AI Control: 1% audit budget minimum, higher for critical applications
  • Simplex: Engage baseline controller only when approaching safety boundaries

Defining the frontier:

  • X-axis: Safety performance (attack success rate, scheming rate, incident probability)
  • Y-axis: Capability performance (accuracy, latency, throughput)
  • Pareto-optimal points: Cannot improve safety without sacrificing capability or vice versa
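Given measured (safety, capability) pairs, the Pareto-optimal set is the set of points not dominated by any other. The sketch below assumes both axes are expressed so that higher is better (for example, safety as 1 minus attack success rate); the configuration names and scores are hypothetical.

```python
Point = tuple[str, float, float]  # (name, safety, capability), higher is better


def pareto_front(points: list[Point]) -> list[Point]:
    """Return the points that no other point dominates.

    A point is dominated if another point is at least as good on both
    axes and strictly better on at least one.
    """
    front = []
    for name, s, c in points:
        dominated = any(
            s2 >= s and c2 >= c and (s2 > s or c2 > c)
            for _, s2, c2 in points
        )
        if not dominated:
            front.append((name, s, c))
    return front


# Hypothetical configurations of one system.
configs = [
    ("A", 0.90, 0.60),
    ("B", 0.70, 0.80),
    ("C", 0.60, 0.70),  # dominated by B on both axes
    ("D", 0.95, 0.50),
]
front = pareto_front(configs)
print([name for name, _, _ in front])
```

The quadratic scan is fine for the handful of configurations typical of a safety evaluation; larger sweeps would sort by one axis first.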

Empirical Pareto frontiers:

RLHF alignment-forgetting:

  • Model averaging achieves best Pareto front
  • HMA further optimizes by layer-wise averaging
  • Trade-off is fundamental but position on frontier is improvable

Factor(T,U):

  • 41% safety baseline → 70% with factored cognition (unknown capability cost)
  • Decomposition tax metric quantifies capability cost
  • Open question: Is 70% near Pareto frontier or is further improvement possible?

Deliberative alignment:

  • 13% scheming → 0.4% (o3) with additional chain-of-thought reasoning overhead
  • 8.7% scheming → 0.3% (o4-mini)
  • Rare egregious failures suggest not yet at frontier (room for improvement)
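The "roughly 30×" reduction quoted for deliberative alignment follows directly from these before and after rates, as this short check shows (the function name is chosen here for illustration):

```python
def reduction_factor(before_pct: float, after_pct: float) -> float:
    """How many times smaller the rate is after the intervention."""
    if after_pct <= 0:
        raise ValueError("after_pct must be positive")
    return before_pct / after_pct


o3 = reduction_factor(13.0, 0.4)
o4_mini = reduction_factor(8.7, 0.3)
print(f"o3: {o3:.1f}x, o4-mini: {o4_mini:.1f}x")
```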

Constitutional Classifiers evolution:

  • Prototype: High safety, high overrefusal, high compute overhead (suboptimal point)
  • Updated: High safety, 0.38% overrefusal, moderate compute overhead (closer to frontier)
  • Demonstrates that research advances can shift the entire frontier outward

Strategic implications:

  • Early-stage systems: Far from Pareto frontier, improvements possible in both dimensions
  • Mature systems: Approaching frontier, must choose point based on application requirements
  • Continuous R&D: Can shift frontier outward over time

6.1 How to Measure Alignment Tax Systematically

Multi-dimensional measurement:

Alignment Tax = {
  Latency_Overhead: (T_safe - T_baseline) / T_baseline,
  Compute_Overhead: (C_safe - C_baseline) / C_baseline,
  Accuracy_Degradation: (A_baseline - A_safe) / A_baseline,
  Overrefusal_Rate: P(refuse | valid_request),
  Development_Cost: Engineering_Hours × Hourly_Rate,
  Maintenance_Cost: Monthly_Hours × Hourly_Rate
}
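The definition above translates directly into code. In this minimal sketch the `Measurement` class, the function signature, and all example numbers are hypothetical; the overrefusal rate is estimated as the fraction of valid requests that were refused.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    latency_s: float     # T: latency per task
    compute_cost: float  # C: compute cost per task
    accuracy: float      # A: accuracy on the capability benchmark


def alignment_tax(baseline: Measurement, safe: Measurement,
                  refused_valid: int, total_valid: int,
                  engineering_hours: float, monthly_hours: float,
                  hourly_rate: float) -> dict[str, float]:
    """Compute each dimension of the alignment tax defined above."""
    return {
        "latency_overhead":
            (safe.latency_s - baseline.latency_s) / baseline.latency_s,
        "compute_overhead":
            (safe.compute_cost - baseline.compute_cost) / baseline.compute_cost,
        "accuracy_degradation":
            (baseline.accuracy - safe.accuracy) / baseline.accuracy,
        "overrefusal_rate": refused_valid / total_valid,
        "development_cost": engineering_hours * hourly_rate,
        "maintenance_cost": monthly_hours * hourly_rate,
    }


# Hypothetical measurements, for illustration only.
tax = alignment_tax(
    baseline=Measurement(latency_s=1.0, compute_cost=100.0, accuracy=0.80),
    safe=Measurement(latency_s=1.5, compute_cost=120.0, accuracy=0.76),
    refused_valid=4, total_valid=1_000,
    engineering_hours=96, monthly_hours=12, hourly_rate=150,
)
for key, value in tax.items():
    print(f"{key}: {value:,.3f}")
```

Returning the dimensions separately rather than collapsing them into one score keeps the tax structure visible, which matters because different mechanisms load on different dimensions.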

Baseline establishment:

  • Measure baseline system performance without safety mechanisms
  • Establish performance distributions (not just means) to capture variance
  • Test across representative task distribution, not just average cases
  • Include edge cases and distribution shifts in baseline measurement

Controlled comparison:

  • Add safety mechanisms incrementally
  • Measure each dimension independently
  • Control for confounding factors (model size, training data, etc.)
  • Use A/B testing where possible for production systems

Longitudinal tracking:

  • Monitor alignment tax over time as systems evolve
  • Track how tax changes with model scaling
  • Measure impact of safety R&D on reducing tax
  • Identify phase transitions and threshold effects

Safety benchmarks:

AIR-Bench 2024 (Stanford):

  • 5,694 tests spanning 314 granular risk categories
  • Four major domains: System & Operational Risks, Content Safety, Societal Risks, Legal & Rights
  • Measures percentage of appropriately refused requests
  • Provides granular safety performance measurement

AILuminate (MLCommons, December 2024):

  • First third-party LLM trust and safety benchmark
  • Collaboration between industry experts and AI researchers
  • Plans to expand to French, Chinese, and Hindi in 2025

TrustLLM:

  • Six dimensions: Truthfulness, safety, fairness, robustness, privacy, machine ethics
  • Over 30 datasets across 18+ subcategories
  • Assesses hallucination, jailbreak resistance, privacy leakage

HarmBench (Center for AI Safety, ICML 2024):

  • Standardized evaluation framework for automated red teaming
  • Used by US and UK AI Safety Institutes for pre-deployment testing
  • Provides consistent adversarial evaluation methodology

Capability benchmarks:

  • MMLU: Massive Multitask Language Understanding
  • APPS: Competitive programming problems
  • Math benchmarks: GSM8K, MATH
  • Code generation: HumanEval, MBPP

Performance benchmarks:

Inference latency:

  • MLPerf Inference: Industry-standard, focuses on 90th percentile latency for single-stream
  • STAC-ML: Financial services inference benchmarks (35.2-640 microseconds for LSTM models on A100)
  • GenAI-Perf: LLM-specific, includes TPS (tokens per second) with overhead measurement

Monitoring overhead:

  • CPU overhead: Additional CPU utilization due to monitoring (1-15% typical)
  • Memory overhead: Additional memory for monitor state (KB to MB typical)
  • Latency overhead: Additional delay in system response (μs to ms)
  • Throughput impact: Reduction in system throughput (5-20% for complex monitors)

Comprehensive reporting framework:

1. Safety performance:

  • Attack success rate before/after safety mechanisms
  • Scheming/deception rate in relevant evaluations
  • Overrefusal rate on valid requests
  • Coverage: What percentage of risks are addressed?

2. Capability performance:

  • Benchmark scores (MMLU, APPS, etc.) before/after
  • Task-specific accuracy/quality metrics
  • Output diversity measurements
  • Generalization to out-of-distribution inputs

3. Performance overhead:

  • Median and P99 latency with/without safety
  • Throughput (requests/second) comparison
  • Compute costs (API costs, GPU hours, etc.)
  • Memory and storage requirements

4. Development costs:

  • Engineering hours for implementation
  • Testing and validation effort
  • Red teaming and adversarial testing investment
  • Ongoing maintenance requirements

5. Confidence and limitations:

  • Statistical confidence intervals on all measurements
  • Known failure modes and edge cases
  • Limitations of evaluation methodology
  • Assumptions and conditions under which results hold
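For rate-type metrics such as overrefusal, a confidence interval that behaves well near zero is useful because the rates are small. The sketch below uses the Wilson score interval, which is one common choice and an assumption of this example rather than something prescribed by the framework. The sample counts are hypothetical.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must be between 0 and n")
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical evaluation: 38 refusals out of 10,000 valid requests.
lower, upper = wilson_interval(38, 10_000)
print(f"Overrefusal rate: 0.38% (95% CI: {lower:.2%} to {upper:.2%})")

# Zero observed failures still yields a non-trivial upper bound.
zero_lower, zero_upper = wilson_interval(0, 100)
print(f"0/100 failures: upper bound {zero_upper:.2%}")
```

The zero-failure case is the important one for safety reporting: observing no failures in a small sample does not justify reporting a 0% rate.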

Standardized format proposal:

## Alignment Tax Report
### System Description
- Base model: [model name and size]
- Safety mechanisms: [list of mechanisms]
- Application domain: [domain]
### Safety Performance
- Attack Success Rate: [baseline]% → [after safety]% ([X]× reduction)
- Scheming Rate: [baseline]% → [after safety]% ([X]× reduction)
- Overrefusal Rate: [rate]% (95% CI: [lower]-[upper]%)
- Evaluation Dataset: [benchmark name], [N] samples
### Capability Performance
- Benchmark: [name], Score: [baseline] → [after safety] ([change]% degradation)
- Output Diversity: [metric], [baseline] → [after safety]
- Task Success Rate: [baseline]% → [after safety]%
### Latency Overhead
- Median: [baseline]s → [safe]s ([X]× overhead)
- P99: [baseline]s → [safe]s ([X]× overhead)
### Compute Costs
- Inference Cost: $[baseline]/1K → $[safe]/1K ([change]%)
- Monthly Cost (at [volume] tasks): $[baseline] → $[safe]
### Development Investment
- Initial: [hours] hours ($[cost])
- Monthly Maintenance: [hours] hours ($[cost])
### Break-Even Analysis
- Monthly Net Impact: $[savings or cost]
- Break-Even Time: [months] months
- ROI at 12 months: [percentage]%
### Limitations
- [List key limitations, assumptions, and caveats]

Best practices:

  • Report full distributions, not just means
  • Include statistical significance tests
  • Provide reproducibility information (datasets, evaluation code)
  • Disclose conflicts of interest and funding sources
  • Update reports as systems evolve

Comparative reporting:

  • Compare against industry baselines where available
  • Position on safety-capability Pareto frontier
  • Show trajectory over time (improving tax or stable?)
  • Benchmark against alternative safety approaches

Quantified alignment taxes:

  1. Factor(T,U): 41% → 70% safety, 26% → 14% attack success, unknown capability cost
  2. Deliberative alignment: ~30× scheming reduction, chain-of-thought latency overhead, 0.3-0.4% residual risk
  3. RLHF: Substantial capability forgetting in NLP tasks, reduced output diversity, diminishing returns
  4. Constitutional AI: 70% reduction in human labeling, 0.38% overrefusal increase, moderate compute overhead
  5. Decomposed architectures: 1.5-2× latency, but 50-80% API cost savings through model right-sizing
  6. Runtime monitoring: 2-30% performance overhead depending on mechanism

Non-obvious insights:

  • Safety mechanisms can reduce total costs (decomposition enables cheaper models per component)
  • Alignment tax is often reducible through research (Constitutional Classifiers evolution)
  • Different safety mechanisms have different tax structures (compute vs. capability vs. latency)
  • Taxes are application-dependent (high-volume benefits from decomposition, low-volume may not)

When to accept high alignment tax:

  • Catastrophic risk potential (nuclear, critical infrastructure, autonomous weapons)
  • Regulatory requirements mandate safety regardless of cost
  • Reputational risk from incidents exceeds performance benefits
  • Long-term deployment where upfront investment amortizes

When to minimize alignment tax:

  • Competitive markets where performance is differentiator
  • Research and development phases (iterate fast, add safety later)
  • Low-stakes applications (creative tools, entertainment)
  • Resource-constrained settings (embedded systems, edge deployment)

Optimization strategies:

  1. Risk-based tiering: High tax for critical operations, low tax for routine
  2. Amortized verification: Invest in infrastructure that reduces ongoing costs
  3. Pareto frontier research: Invest in R&D to improve both safety and capability
  4. Right-sized mechanisms: Match safety investment to actual risk level
  5. Continuous measurement: Track tax over time and optimize

Measurement challenges:

  • How to measure rare but catastrophic failures (long-tail risk)?
  • How to distinguish genuine safety from “security through obscurity”?
  • How to account for situational awareness confounds in evaluations?
  • How to measure alignment tax for emergent capabilities?

Optimization opportunities:

  • Can we achieve Factor(T,U)’s 70% safety with lower decomposition tax?
  • What is the theoretical limit of the safety-capability Pareto frontier?
  • How to reduce RLHF alignment tax while maintaining safety benefits?
  • Can hardware acceleration eliminate monitoring overhead entirely?

Scaling questions:

  • How does alignment tax scale with model capability?
  • Will trusted models remain available as untrusted models improve?
  • Do alignment taxes decrease or increase with model size?
  • What happens to alignment tax at superhuman performance levels?

Economic and organizational:

  • How to create market incentives for accepting alignment tax?
  • What regulatory frameworks support alignment tax investment?
  • How to measure and communicate alignment tax to stakeholders?
  • What are competitive dynamics when some organizations accept tax and others don’t?

For practitioners:

  1. Measure your baseline: Establish performance metrics before adding safety
  2. Start simple: Implement lightweight safety, measure tax, then add complexity if justified
  3. Monitor continuously: Track alignment tax over time as systems evolve
  4. Optimize for your context: High-volume systems benefit from different optimizations than low-volume
  5. Report transparently: Document safety-performance trade-offs for stakeholders

For researchers:

  1. Report full tax profiles: Not just safety improvements, but all cost dimensions
  2. Seek Pareto improvements: Research that improves both safety and capability
  3. Study reduction techniques: How to achieve same safety with lower tax
  4. Develop better benchmarks: More comprehensive measurement frameworks
  5. Consider adversarial settings: How does tax change under adaptive adversaries?

For policymakers:

  1. Recognize tax legitimacy: Alignment tax is real cost, not inefficiency
  2. Create incentives: Reward organizations that accept tax for safety
  3. Require transparency: Mandate reporting of safety-performance trade-offs
  4. Support research: Fund work on reducing alignment tax
  5. Set standards: Establish acceptable tax levels for different risk categories

Alignment tax is a fundamental feature of AI safety, not a bug to be eliminated. The empirical evidence demonstrates that meaningful safety improvements require sacrificing some combination of performance, latency, cost, or capability. However, the magnitude of this tax is not fixed—research advances have repeatedly demonstrated that better mechanisms can achieve comparable safety with substantially lower tax.

The key findings:

  1. Alignment tax is measurable: Ranging from 0.38% overrefusal (Constitutional Classifiers) to 2× latency (decomposed architectures) to substantial capability forgetting (RLHF)

  2. Tax varies by mechanism: Different safety approaches impose different cost structures. Choose mechanisms matching your constraints (latency-critical vs. cost-critical vs. capability-critical)

  3. Tax is often worth it: For most production systems, the combined value of risk reduction and API savings from right-sized models exceeds the tax cost. Break-even typically occurs within 2-6 months.

  4. Tax is reducible: Research advances (HMA, DPO, Constitutional Classifiers v2) demonstrate that alignment tax can decrease over time while maintaining or improving safety.

  5. Context matters: The same alignment tax may be negligible for one application and prohibitive for another. Systematic measurement and application-specific evaluation are essential.

As AI systems become more capable and deployed in higher-stakes contexts, understanding and managing alignment tax becomes increasingly critical. Organizations must measure these costs systematically, optimize them where possible, and make informed decisions about when to accept them in exchange for safety. The future of AI safety depends not on eliminating alignment tax, but on ensuring it remains acceptable relative to the risks being mitigated.


  • Artzner, Delbaen, Eber, Heath (1999) - “Coherent Measures of Risk” - Mathematical Finance 9(3)
  • Delbaen (2002) - Coherent risk measures on general probability spaces
  • Föllmer & Schied (2002, 2016) - Stochastic Finance
  • Acerbi & Tasche (2002) - “Expected Shortfall: A Natural Coherent Alternative to Value at Risk”
  • Basel III FRTB - Shift from VaR to ES
  • Tasche (2007) - “Capital Allocation to Business Units: the Euler Principle” - arXiv:0708.2542
  • McNeil, Frey, Embrechts (2005) - Quantitative Risk Management
  • Acerbi (2002) - “Spectral Measures of Risk”
  • Kusuoka (2001) - Representation theorem
  • Fritz (2020) - “A Synthetic Approach to Markov Kernels” - Advances in Mathematics 370
  • Cho & Jacobs (2019) - Disintegration and Bayesian inversion
  • Girard (1987) - “Linear Logic” - Theoretical Computer Science 50
  • Girard, Scedrov, Scott (1992) - Bounded Linear Logic
  • Miller (2006) - Robust Composition: Towards a Unified Approach to Access Control and Concurrency Control
  • Dennis & Van Horn (1966) - Original capability concept
  • Benveniste et al. (2018) - Contracts for System Design
  • Incer et al. (2022) - Pacti tool

Insurance and Actuarial (Complexity Pricing)

  • Bühlmann & Gisler (2005) - A Course in Credibility Theory
  • Klugman, Panjer, Willmot (2012) - Loss Models
  • Daykin, Pentikäinen, Pesonen (1994) - Practical Risk Theory for Actuaries
  • Ben-Tal, El Ghaoui, Nemirovski (2009) - Robust Optimization
  • Bertsimas & Brown (2009) - “Constructing Uncertainty Sets for Robust Linear Optimization”
  • McCabe (1976) - “A Complexity Measure” - IEEE Transactions on Software Engineering
  • Halstead (1977) - Elements of Software Science
  • Perrow (1984) - Normal Accidents
  • Reason (1997) - Managing the Risks of Organizational Accidents
  • Leveson (2011) - Engineering a Safer World
  • Ellsberg (1961) - “Risk, Ambiguity, and the Savage Axioms”
  • Gilboa & Schmeidler (1989) - “Maxmin Expected Utility”
  • Hansen & Sargent (2001) - “Robust Control and Model Uncertainty”

Deliberative Alignment and Scheming Detection
