
Delegation Risk: Overview

A mature delegation risk framework could provide mathematical and operational foundations for managing AI systems more safely at scale.

A complete framework would have seven components:

flowchart TB
    subgraph "Delegation Risk Components"
        Q[1. Quantification<br/>Harm surface, Exposure, Delegation Risk]
        C[2. Composition<br/>inheritance rules]
        O[3. Optimization<br/>minimize risk]
        D[4. Dynamics<br/>evolution over time]
        P[5. Protocols<br/>handshakes, revocation]
        T[6. Tools<br/>simulation, monitoring]
        S[7. Standards<br/>industry/regulatory]
    end
    Q --> C --> O --> D
    P --> T --> S

1. Quantification: Every delegation has a harm surface (the set of harm modes), an exposure (a worst-case bound), and a delegation risk (the expected cost).

2. Composition: Rules for combining risks through delegation chains — multiplicative by default, with a correlation correction when stages share failure causes (see Risk Propagation)

3. Optimization: Algorithms for minimizing delegation risk given constraints

4. Dynamics: Models for how trust and risk evolve over time

5. Protocols: Standard procedures for delegation handshakes, revocation, etc.

6. Tools: Software for risk analysis, simulation, monitoring

7. Standards: Industry/regulatory standards for risk levels and verification

1. AI systems are becoming more capable: Higher capabilities = larger harm surface and greater delegation risk.

2. AI systems are becoming more autonomous: Less human oversight = risk management must be structural.

3. AI systems are being deployed in high-stakes domains: Healthcare, finance, infrastructure = harm mode realization is catastrophic.

4. AI systems are becoming more interconnected: Agent-to-agent delegation = risk inheritance matters.

5. We’re building systems we don’t fully understand: Unknown capabilities = unknown harm modes.

Harm Surface is the complete set of possible harms (harm modes) from delegating a task. It’s not a single number—it’s a collection, like an attack surface or failure envelope.

Delegation Risk = Σ P(harm mode) × Damage(harm mode)

For each component, multiply each harm mode's probability by its damage and sum the products. The result is a single number that answers “what is the expected cost of delegating to this component?”

Worked Example: Research Assistant Delegation Risk

Consider a decomposed research assistant with three components:

flowchart LR
    H[Human Principal] -->|"trust: 0.95"| C[Coordinator]
    C -->|"trust: 0.90"| S[Summarizer]
    C -->|"trust: 0.85"| D[Code Deployer]

Summarizer Harm Surface:

| Harm Mode | Probability | Damage | Risk Contribution |
| --- | --- | --- | --- |
| Misrepresents paper findings | 0.02 | $5,000 (wrong research direction) | $100 |
| Leaks proprietary data | 0.001 | $50,000 (IP loss) | $50 |

Delegation Risk (Summarizer) = $100 + $50 = $150

Code Deployer Harm Surface:

| Harm Mode | Probability | Damage | Risk Contribution |
| --- | --- | --- | --- |
| Deploys buggy code | 0.05 | $20,000 (downtime) | $1,000 |
| Deploys malicious code | 0.0001 | $1,000,000 (breach) | $100 |

Delegation Risk (Code Deployer) = $1,000 + $100 = $1,100

Coordinator Harm Surface:

| Harm Mode | Probability | Damage | Risk Contribution |
| --- | --- | --- | --- |
| Misroutes task | 0.01 | $2,000 (wasted effort) | $20 |
| Grants excessive permissions | 0.005 | $100,000 (escalation) | $500 |

Delegation Risk (Coordinator) = $20 + $500 = $520

System Total Delegation Risk: $1,770
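The three harm-surface tables and the system total can be reproduced with a short script. This is a minimal sketch, not part of the framework itself: the `HarmMode` class and `delegation_risk` function are hypothetical names introduced here, and the total assumes component risks simply add, as the worked example does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HarmMode:
    """One row of a harm surface (hypothetical helper type)."""
    name: str
    probability: float  # chance the harm mode is realized
    damage: float       # dollar cost if it is realized


def delegation_risk(harm_surface):
    """Expected cost: sum of probability x damage over all harm modes."""
    return sum(mode.probability * mode.damage for mode in harm_surface)


# Figures copied from the tables in the worked example.
HARM_SURFACES = {
    "Summarizer": [
        HarmMode("Misrepresents paper findings", 0.02, 5_000),
        HarmMode("Leaks proprietary data", 0.001, 50_000),
    ],
    "Code Deployer": [
        HarmMode("Deploys buggy code", 0.05, 20_000),
        HarmMode("Deploys malicious code", 0.0001, 1_000_000),
    ],
    "Coordinator": [
        HarmMode("Misroutes task", 0.01, 2_000),
        HarmMode("Grants excessive permissions", 0.005, 100_000),
    ],
}

component_risk = {
    name: delegation_risk(surface) for name, surface in HARM_SURFACES.items()
}
# Assumption: component risks are additive, as in the worked example.
system_total = sum(component_risk.values())

for name, risk in component_risk.items():
    print(f"{name}: ${risk:,.0f}")
print(f"System total: ${system_total:,.0f}")
```

Keeping each harm mode as its own record makes it easy to see which row dominates a component's risk before any mitigation is considered.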

What risk does the Human inherit from the Code Deployer through the delegation chain?

Using the multiplicative rule (the framework’s canonical default; see Risk Propagation for the derivation and the correlation correction), the chain’s joint reliability is the product of the per-link trust values. This is the probability that both links behave faithfully end-to-end:

JointReliability(Human → Deployer) = Trust(H→C) × Trust(C→D)
= 0.95 × 0.85
= 0.8075 (~81%)

So only ~81% of the time does the whole chain hold. The residual ~19% is the probability that at least one link fails somewhere between the Human and the Deployer — and that failure channel is how the Human inherits the Deployer’s harm surface. (This stays at the probability level: it bounds how often the chain breaks, not what fraction of dollar damage flows through.)
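The chain calculation above generalizes to any number of links. The sketch below assumes the links fail independently, so it covers only the multiplicative default and omits the correlation correction described in Risk Propagation; `joint_reliability` is a hypothetical helper name.

```python
import math


def joint_reliability(link_trusts):
    """Probability that every link in the chain behaves faithfully.

    Assumes independent links (the multiplicative default only).
    """
    return math.prod(link_trusts)


# Trust(H→C) and Trust(C→D) from the diagram.
human_to_deployer = [0.95, 0.85]

reliability = joint_reliability(human_to_deployer)
failure_probability = 1.0 - reliability  # at least one link fails

print(f"Joint reliability: {reliability:.4f}")
print(f"Chain failure probability: {failure_probability:.4f}")
```

Because each factor is at most 1, every additional link can only lower the joint reliability, which is why long delegation chains erode trust quickly.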

Suppose the organization’s total acceptable Delegation Risk Budget is $2,000/month.

Current allocation:

  • Summarizer: $150 (7.5% of budget)
  • Code Deployer: $1,100 (55% of budget) ⚠️
  • Coordinator: $520 (26% of budget)
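The allocation percentages can be checked with a few lines of Python. This is an illustrative sketch using the figures above; `budget_shares` is a hypothetical helper, and the "headroom" figure is simply the budget minus the current total.

```python
MONTHLY_BUDGET = 2_000  # dollars per month, from the text

allocation = {
    "Summarizer": 150,
    "Code Deployer": 1_100,
    "Coordinator": 520,
}


def budget_shares(allocation, budget):
    """Fraction of the risk budget consumed by each component."""
    return {name: risk / budget for name, risk in allocation.items()}


shares = budget_shares(allocation, MONTHLY_BUDGET)
largest = max(shares, key=shares.get)
total_used = sum(allocation.values())
headroom = MONTHLY_BUDGET - total_used

for name, share in shares.items():
    print(f"{name}: {share:.1%} of budget")
print(f"Largest consumer: {largest}")
print(f"Total used: ${total_used:,} (headroom ${headroom:,})")
```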

Finding: The Code Deployer alone consumes 55% of the risk budget, more than the Summarizer and Coordinator combined. Options:

  1. Add verification layer before deployment (reduce harm mode probability)
  2. Reduce deployer’s permissions (reduce harm mode damage)
  3. Replace with human-in-the-loop (shift delegation risk to human)

Add a deployment verifier that reviews code before it ships (deterministic checks plus a formally-verified policy engine):

flowchart LR
    H[Human] -->|"0.95"| C[Coordinator]
    C -->|"0.85"| D[Deployer]
    D -->|"0.99"| V[Verifier]
    V --> Prod[Production]

We assume the verifier catches 99% of malicious deployments (its policy engine blocks unauthorized changes) and 70% of buggy deployments (static analysis and tests catch most, but not all, defects). Only the Code Deployer’s rows change; the Summarizer ($150) and Coordinator ($520) are untouched.

| Harm Mode | Old Risk | Catch Rate | New Risk Contribution |
| --- | --- | --- | --- |
| Deploys buggy code | $1,000 | 70% | 0.30 × $1,000 = $300 |
| Deploys malicious code | $100 | 99% | 0.01 × $100 = $1 |

New Delegation Risk (Code Deployer) = $300 + $1 = $301 (down from $1,100).

New System Total = $150 (Summarizer) + $520 (Coordinator) + $301 (Code Deployer) = $971.
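The mitigation arithmetic can be sketched as follows. The code assumes, as the worked example does, that the verifier scales each harm mode's risk by the fraction it fails to catch and introduces no new harm modes of its own; `residual_risk` is a hypothetical helper name.

```python
def residual_risk(old_risk, catch_rate):
    """Risk remaining after a verifier that catches `catch_rate` of cases."""
    return old_risk * (1.0 - catch_rate)


# (old risk contribution in dollars, assumed verifier catch rate)
deployer_rows = {
    "Deploys buggy code": (1_000, 0.70),
    "Deploys malicious code": (100, 0.99),
}

# Components the verifier does not touch.
UNCHANGED = {"Summarizer": 150, "Coordinator": 520}

new_deployer_risk = sum(
    residual_risk(old, rate) for old, rate in deployer_rows.values()
)
new_system_total = sum(UNCHANGED.values()) + new_deployer_risk

print(f"New Code Deployer risk: ${new_deployer_risk:,.0f}")
print(f"New system total: ${new_system_total:,.0f}")
```

Changing a catch rate in `deployer_rows` shows how sensitive the total is to the verifier's assumed effectiveness, which is worth testing since the 70% and 99% figures are assumptions rather than measurements.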

Nuclear plants achieve ~10⁻⁵ core damage frequency per reactor-year (NRC subsidiary goal: ~10⁻⁴/reactor-year). Using the achieved figure for a $10B damage potential:

  • Example Delegation Risk = 10⁻⁵ × $10B = $100,000/year

Our research assistant’s $971/month (about $11,650/year) is within an order of magnitude of that figure, roughly nine times lower, which suggests either:

  1. We’re being appropriately cautious, or
  2. Nuclear plants manage much higher absolute stakes with similar relative risk

This kind of cross-domain comparison helps calibrate whether AI safety investments are proportionate.
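The comparison can be made explicit by putting both figures on the same annual basis. This sketch uses only the numbers quoted above; the $10B damage potential is the text's illustrative assumption, not a measured value.

```python
# Nuclear example: achieved core damage frequency x assumed damage potential.
core_damage_frequency = 1e-5     # per reactor-year, from the text
damage_potential = 10e9          # $10B, illustrative assumption from the text
nuclear_annual_risk = core_damage_frequency * damage_potential

# Research assistant: monthly delegation risk after adding the verifier.
assistant_monthly_risk = 971
assistant_annual_risk = assistant_monthly_risk * 12

ratio = nuclear_annual_risk / assistant_annual_risk

print(f"Nuclear example: ${nuclear_annual_risk:,.0f}/year")
print(f"Research assistant: ${assistant_annual_risk:,}/year")
print(f"Nuclear / assistant ratio: {ratio:.1f}x")
```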


To dive deeper into specific topics:

  • Risk Inheritance — Algorithms for computing how risk flows through delegation networks
  • Risk Dynamics — How trust evolves, decays, and rebuilds over time


  • Artzner, P., et al. (1999). Coherent Measures of Risk. Mathematical Finance. — Axiomatic foundations
  • Tasche, D. (2008). Capital Allocation to Business Units: the Euler Principle. arXiv — Risk decomposition
  • Fritz, T. (2020). A Synthetic Approach to Markov Kernels. arXiv — Compositional probability

See the full bibliography for comprehensive references.