Delegation Risk: Overview

A mature delegation risk framework could provide mathematical and operational foundations for managing AI systems more safely at scale.

Vision

A complete delegation risk framework would provide:

flowchart TB
    subgraph "Delegation Risk Components"
        Q[1. Quantification<br/>Harm surface, Exposure, Delegation Risk]
        C[2. Composition<br/>inheritance rules]
        O[3. Optimization<br/>minimize risk]
        D[4. Dynamics<br/>evolution over time]
        P[5. Protocols<br/>handshakes, revocation]
        T[6. Tools<br/>simulation, monitoring]
        S[7. Standards<br/>industry/regulatory]
    end
    Q --> C --> O --> D
    P --> T --> S

1. Quantification: Every delegation has a harm surface (set of harm modes), an exposure (worst-case bound), and a delegation risk (expected cost)

2. Composition: Rules for combining risks through delegation chains — multiplicative by default, with a correlation correction when stages share failure causes (see Risk Propagation)

3. Optimization: Algorithms for minimizing delegation risk given constraints

4. Dynamics: Models for how trust and risk evolve over time

5. Protocols: Standard procedures for delegation handshakes, revocation, etc.

6. Tools: Software for risk analysis, simulation, monitoring

7. Standards: Industry/regulatory standards for risk levels and verification

Why Delegation Risk Matters

1. AI systems are becoming more capable: Higher capabilities = larger harm surface and greater delegation risk.

2. AI systems are becoming more autonomous: Less human oversight = risk management must be structural.

3. AI systems are being deployed in high-stakes domains: Healthcare, finance, infrastructure = a realized harm mode can be catastrophic.

4. AI systems are becoming more interconnected: Agent-to-agent delegation = risk inheritance matters.

5. We’re building systems we don’t fully understand: Unknown capabilities = unknown harm modes.

Core Concepts

Harm Surface

Harm Surface is the complete set of possible harms (harm modes) from delegating a task. It’s not a single number—it’s a collection, like an attack surface or failure envelope.

Delegation Risk

Delegation Risk = Σ P(harm mode) × Damage(harm mode)

For each component, sum over all harm modes: probability times damage. This gives a single number representing “what is the expected cost of delegating to this component?”

Worked Example: Research Assistant Delegation Risk

Consider a decomposed research assistant with three components:

flowchart LR
    H[Human Principal] -->|"trust: 0.95"| C[Coordinator]
    C -->|"trust: 0.90"| S[Summarizer]
    C -->|"trust: 0.85"| D[Code Deployer]

Step 1: Identify Harm Modes and Damages

Summarizer Harm Surface:

Harm Mode	Probability	Damage	Risk Contribution
Misrepresents paper findings	0.02	$5,000 (wrong research direction)	$100
Leaks proprietary data	0.001	$50,000 (IP loss)	$50

Delegation Risk (Summarizer) = $100 + $50 = $150

Code Deployer Harm Surface:

Harm Mode	Probability	Damage	Risk Contribution
Deploys buggy code	0.05	$20,000 (downtime)	$1,000
Deploys malicious code	0.0001	$1,000,000 (breach)	$100

Delegation Risk (Code Deployer) = $1,000 + $100 = $1,100

Coordinator Harm Surface:

Harm Mode	Probability	Damage	Risk Contribution
Misroutes task	0.01	$2,000 (wasted effort)	$20
Grants excessive permissions	0.005	$100,000 (escalation)	$500

Delegation Risk (Coordinator) = $20 + $500 = $520

System Total Delegation Risk: $1,770
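The Step 1 arithmetic can be reproduced with a short script. This is a minimal sketch rather than part of the framework: the names `HarmMode`, `delegation_risk`, and `HARM_SURFACES` are hypothetical, and the figures are copied from the three tables above. Like the text, it assumes component risks simply add.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HarmMode:
    name: str
    probability: float  # chance the harm mode occurs in the period
    damage: float       # dollar cost if it occurs


def delegation_risk(harm_surface):
    """Expected cost of a component: sum of probability * damage."""
    return sum(mode.probability * mode.damage for mode in harm_surface)


# Figures copied from the harm-surface tables in Step 1.
HARM_SURFACES = {
    "Summarizer": [
        HarmMode("Misrepresents paper findings", 0.02, 5_000),
        HarmMode("Leaks proprietary data", 0.001, 50_000),
    ],
    "Code Deployer": [
        HarmMode("Deploys buggy code", 0.05, 20_000),
        HarmMode("Deploys malicious code", 0.0001, 1_000_000),
    ],
    "Coordinator": [
        HarmMode("Misroutes task", 0.01, 2_000),
        HarmMode("Grants excessive permissions", 0.005, 100_000),
    ],
}

per_component = {
    name: delegation_risk(modes) for name, modes in HARM_SURFACES.items()
}
# Assumption (as in the text): the system total is the plain sum of components.
system_total = sum(per_component.values())

for name, risk in per_component.items():
    print(f"{name}: ${risk:,.0f}")
print(f"System total: ${system_total:,.0f}")
```

Keeping each harm mode as its own record means a new row in a table becomes one new list entry, and the totals update without further changes.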

Step 2: Risk Inheritance

What risk does the Human inherit from the Code Deployer through the delegation chain?

Using the multiplicative rule (the framework’s canonical default; see Risk Propagation for the derivation and the correlation correction), the chain’s joint reliability is the product of the per-link trust values, that is, the probability that both links behave faithfully end to end:

JointReliability(Human → Deployer) = Trust(H→C) × Trust(C→D)
                                   = 0.95 × 0.85
                                   = 0.8075  (~81%)

So the whole chain holds only ~81% of the time. The residual ~19% is the probability that at least one link fails somewhere between the Human and the Deployer, and that failure channel is how the Human inherits the Deployer’s harm surface. (This stays at the probability level: it bounds how often the chain breaks, not what fraction of dollar damage flows through.)
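The multiplicative rule generalizes to chains of any length. The sketch below assumes the links fail independently; it does not model the correlation correction described in Risk Propagation. The function name `joint_reliability` is hypothetical.

```python
import math


def joint_reliability(link_trusts):
    """Probability that every link in the chain behaves faithfully.

    Assumes independent links (no correlation correction).
    """
    return math.prod(link_trusts)


# Human -> Coordinator -> Code Deployer, trust values from the diagram above.
chain = [0.95, 0.85]
reliability = joint_reliability(chain)
chain_failure_probability = 1.0 - reliability

print(f"Joint reliability: {reliability:.4f}")
print(f"P(at least one link fails): {chain_failure_probability:.4f}")
```

Because the result is a product, every extra hop can only lower joint reliability, which is one reason longer delegation chains need higher per-link trust to reach the same end-to-end figure.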

Step 3: Risk Budget Allocation

Suppose the organization’s total acceptable Delegation Risk Budget is $2,000/month.

Current allocation:

Summarizer: $150 (7.5% of budget)
Code Deployer: $1,100 (55% of budget) ⚠️
Coordinator: $520 (26% of budget)

Finding: Code Deployer consumes most of the risk budget. Options:

Add verification layer before deployment (reduce harm mode probability)
Reduce deployer’s permissions (reduce harm mode damage)
Replace with human-in-the-loop (shift delegation risk to human)
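The budget shares above can be checked mechanically. This is an illustrative sketch: `budget_report` is a hypothetical helper, and the inputs are the Step 1 totals and the $2,000/month budget stated above.

```python
MONTHLY_BUDGET = 2_000.0

# Per-component delegation risk from Step 1 (dollars per month).
allocation = {
    "Summarizer": 150.0,
    "Code Deployer": 1_100.0,
    "Coordinator": 520.0,
}


def budget_report(allocation, budget):
    """Return each component's share of the budget, total used, and headroom."""
    shares = {name: risk / budget for name, risk in allocation.items()}
    used = sum(allocation.values())
    return shares, used, budget - used


shares, used, headroom = budget_report(allocation, MONTHLY_BUDGET)
largest_consumer = max(allocation, key=allocation.get)

for name, share in shares.items():
    print(f"{name}: {share:.1%} of budget")
print(f"Used: ${used:,.0f}, headroom: ${headroom:,.0f}")
print(f"Largest consumer: {largest_consumer}")
```

With these inputs the three components together use $1,770 of the $2,000 budget, so the system is within budget but with little headroom, and the Code Deployer is the natural first target for mitigation.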

Step 4: After Mitigation

Add a deployment verifier that reviews code before it ships (deterministic checks plus a formally-verified policy engine):

flowchart LR
    H[Human] -->|"0.95"| C[Coordinator]
    C -->|"0.85"| D[Deployer]
    D -->|"0.99"| V[Verifier]
    V --> Prod[Production]

We assume the verifier catches 99% of malicious deployments (its policy engine blocks unauthorized changes) and 70% of buggy deployments (static analysis and tests catch most, but not all, defects). Only the Code Deployer’s rows change; the Summarizer ($150) and Coordinator ($520) are untouched.

Harm Mode	Old Risk	Catch Rate	New Risk Contribution
Deploys buggy code	$1,000	70%	0.30 × $1,000 = $300
Deploys malicious code	$100	99%	0.01 × $100 = $1

New Delegation Risk (Code Deployer) = $300 + $1 = $301 (down from $1,100).

New System Total = $150 (Summarizer) + $520 (Coordinator) + $301 (Code Deployer) = $971.
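The mitigation step scales each affected risk contribution by the fraction the verifier misses. The sketch below assumes, as the text does, that the catch rate applies uniformly to each harm mode and that the verifier adds no harm modes of its own; `residual_risk` is a hypothetical name.

```python
def residual_risk(old_risk, catch_rate):
    """Risk remaining after a verifier that catches `catch_rate` of occurrences."""
    if not 0.0 <= catch_rate <= 1.0:
        raise ValueError("catch_rate must be between 0 and 1")
    return (1.0 - catch_rate) * old_risk


# (old risk contribution in dollars, verifier catch rate) per Code Deployer harm mode.
deployer_rows = {
    "Deploys buggy code": (1_000.0, 0.70),
    "Deploys malicious code": (100.0, 0.99),
}

new_deployer_risk = sum(
    residual_risk(old, catch) for old, catch in deployer_rows.values()
)

# The Summarizer and Coordinator are untouched by the verifier.
new_system_total = 150.0 + 520.0 + new_deployer_risk

print(f"New Code Deployer risk: ${new_deployer_risk:,.0f}")
print(f"New system total: ${new_system_total:,.0f}")
```

Trying other catch rates in `deployer_rows` shows how sensitive the total is to the buggy-code row, which dominates the remaining Code Deployer risk.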

Comparison to Nuclear Safety

Nuclear plants achieve ~10⁻⁵ core damage frequency per reactor-year (NRC subsidiary goal: ~10⁻⁴/reactor-year). Using the achieved figure for a $10B damage potential:

Example Delegation Risk = 10⁻⁵ × $10B = $100,000/year

Our research assistant’s $971/month ≈ $12,000/year is within an order of magnitude of that figure (about 8× lower), which suggests either:

We’re being appropriately cautious, or
Nuclear plants manage much higher absolute stakes with similar relative risk

This kind of cross-domain comparison helps calibrate whether AI safety investments are proportionate.
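The comparison rests on putting both figures on the same annual basis. A minimal sketch of that arithmetic, using only the numbers quoted above (the variable names are illustrative):

```python
# Research assistant: post-mitigation delegation risk, annualized.
assistant_monthly_risk = 971.0
assistant_annual_risk = assistant_monthly_risk * 12

# Nuclear example: achieved core damage frequency times assumed damage potential.
core_damage_frequency = 1e-5   # per reactor-year
damage_potential = 10e9        # dollars ($10B, as assumed in the text)
nuclear_annual_risk = core_damage_frequency * damage_potential

ratio = nuclear_annual_risk / assistant_annual_risk

print(f"Assistant: ${assistant_annual_risk:,.0f}/year")
print(f"Nuclear example: ${nuclear_annual_risk:,.0f}/year")
print(f"Nuclear / assistant ratio: {ratio:.1f}x")
```

The ratio compares expected annual cost only; it says nothing about worst-case exposure, which differs by many orders of magnitude between the two systems.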

Key Topics

Risk Decomposition - Accidents vs. defection: two types of harm
Risk Inheritance - How risk flows through delegation networks
Delegation Interfaces & Contracts - Formalizing delegation relationships
Risk Optimization - Minimizing delegation risk subject to capability
Risk Dynamics - How trust and risk evolve over time
Risk Accounting - Ledgers, auditing, and KPIs
Delegation Protocols - Handshakes, tokens, and revocation
Risk Economics - Insurance, arbitrage, and incentives
Delegation at Scale - Distributed systems and bottlenecks
Human-AI Delegation - Team dynamics and calibration

Applications

Delegation Accounting - Balance sheet view of delegation

The Goal

What’s Next?

To dive deeper into specific topics:

Risk Inheritance — Algorithms for computing how risk flows through delegation networks
Risk Dynamics — How trust evolves, decays, and rebuilds over time

To see delegation risk applied:

Principles to Practice — Delegation Risk calculations for real examples
Risk Budgeting Overview — Cross-domain methods for allocating risk budgets

To understand the foundations:

Background Research — Deep dives into nuclear, aerospace, and financial risk methods

To start implementing:

Quick Start — Step-by-step application checklist
Decision Guide — Choosing implementations based on risk budget
