Delegation Risk: Overview
A mature delegation risk framework could provide mathematical and operational foundations for managing AI systems more safely at scale.
Vision
A complete delegation risk framework would provide:
flowchart TB
subgraph "Delegation Risk Components"
Q[1. Quantification<br/>Harm surface, Exposure, Delegation Risk]
C[2. Composition<br/>inheritance rules]
O[3. Optimization<br/>minimize risk]
D[4. Dynamics<br/>evolution over time]
P[5. Protocols<br/>handshakes, revocation]
T[6. Tools<br/>simulation, monitoring]
S[7. Standards<br/>industry/regulatory]
end
Q --> C --> O --> D
P --> T --> S
1. Quantification: Every delegation has a harm surface (set of harm modes), an exposure (worst-case bound), and a delegation risk (expected cost)
2. Composition: Rules for combining risks through delegation chains — multiplicative by default, with a correlation correction when stages share failure causes (see Risk Propagation)
3. Optimization: Algorithms for minimizing delegation risk given constraints
4. Dynamics: Models for how trust and risk evolve over time
5. Protocols: Standard procedures for delegation handshakes, revocation, etc.
6. Tools: Software for risk analysis, simulation, monitoring
7. Standards: Industry/regulatory standards for risk levels and verification
Why Delegation Risk Matters
1. AI systems are becoming more capable: Higher capabilities = larger harm surface and greater delegation risk.
2. AI systems are becoming more autonomous: Less human oversight = risk management must be structural.
3. AI systems are being deployed in high-stakes domains: Healthcare, finance, infrastructure = a realized harm mode can be catastrophic.
4. AI systems are becoming more interconnected: Agent-to-agent delegation = risk inheritance matters.
5. We’re building systems we don’t fully understand: Unknown capabilities = unknown harm modes.
Core Concepts
Harm Surface
Harm Surface is the complete set of possible harms (harm modes) from delegating a task. It’s not a single number—it’s a collection, like an attack surface or failure envelope.
Delegation Risk
Delegation Risk = Σ P(harm mode) × Damage(harm mode)
For each component, sum over all harm modes: probability times damage. This gives a single number representing “what is the expected cost of delegating to this component?”
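A minimal Python sketch of this sum, assuming a harm surface is stored as a list of records with a probability and a dollar damage. The `HarmMode` class, the `delegation_risk` function, and the two example modes are illustrative names and values, not part of the framework itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HarmMode:
    """One entry of a harm surface (hypothetical representation)."""
    name: str
    probability: float  # chance the harm occurs in the period considered
    damage: float       # cost in dollars if it does occur


def delegation_risk(harm_surface):
    """Expected cost: sum of probability x damage over all harm modes."""
    return sum(mode.probability * mode.damage for mode in harm_surface)


# Illustrative values only, chosen to keep the arithmetic easy to follow.
example_surface = [
    HarmMode("hypothetical mode A", 0.10, 1_000.0),
    HarmMode("hypothetical mode B", 0.01, 20_000.0),
]
example_risk = delegation_risk(example_surface)
print(f"Delegation risk: ${example_risk:,.0f}")
```

Keeping the harm surface as a collection and reducing it to a number only at the end preserves the distinction the text draws between the surface (a set) and the risk (a scalar).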
Worked Example: Research Assistant Delegation Risk
Consider a decomposed research assistant with three components:
flowchart LR
H[Human Principal] -->|"trust: 0.95"| C[Coordinator]
C -->|"trust: 0.90"| S[Summarizer]
C -->|"trust: 0.85"| D[Code Deployer]
Step 1: Identify Harm Modes and Damages
Summarizer Harm Surface:
| Harm Mode | Probability | Damage | Risk Contribution |
|---|---|---|---|
| Misrepresents paper findings | 0.02 | $5,000 (wrong research direction) | $100 |
| Leaks proprietary data | 0.001 | $50,000 (IP loss) | $50 |
Delegation Risk (Summarizer) = $100 + $50 = $150
Code Deployer Harm Surface:
| Harm Mode | Probability | Damage | Risk Contribution |
|---|---|---|---|
| Deploys buggy code | 0.05 | $20,000 (downtime) | $1,000 |
| Deploys malicious code | 0.0001 | $1,000,000 (breach) | $100 |
Delegation Risk (Code Deployer) = $1,000 + $100 = $1,100
Coordinator Harm Surface:
| Harm Mode | Probability | Damage | Risk Contribution |
|---|---|---|---|
| Misroutes task | 0.01 | $2,000 (wasted effort) | $20 |
| Grants excessive permissions | 0.005 | $100,000 (escalation) | $500 |
Delegation Risk (Coordinator) = $20 + $500 = $520
System Total Delegation Risk = $150 (Summarizer) + $1,100 (Code Deployer) + $520 (Coordinator) = $1,770
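The Step 1 tables can be reproduced with a short script. This is a sketch that stores each harm mode as a `(probability, damage)` pair and, like the text, treats the system total as a plain sum of the component figures. The variable names are illustrative.

```python
# Harm surfaces from the Step 1 tables: (probability, damage in dollars).
harm_surfaces = {
    "Summarizer": [(0.02, 5_000), (0.001, 50_000)],
    "Code Deployer": [(0.05, 20_000), (0.0001, 1_000_000)],
    "Coordinator": [(0.01, 2_000), (0.005, 100_000)],
}

# Delegation risk per component: sum of probability x damage.
component_risk = {
    name: sum(p * d for p, d in modes)
    for name, modes in harm_surfaces.items()
}

# System total as a simple sum, matching the worked example.
system_total = sum(component_risk.values())

for name, risk in component_risk.items():
    print(f"{name}: ${risk:,.0f}")
print(f"System total: ${system_total:,.0f}")
```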
Step 2: Risk Inheritance
What risk does the Human inherit from the Code Deployer through the delegation chain?
Using the multiplicative rule (the framework’s canonical default — see Risk Propagation for the derivation and the correlation correction), the chain’s joint reliability is the product of the per-link trust values — the probability that both links behave faithfully end-to-end:
JointReliability(Human → Deployer) = Trust(H→C) × Trust(C→D) = 0.95 × 0.85 = 0.8075 (~81%)
So the whole chain holds only ~81% of the time. The residual ~19% is the probability that at least one link fails somewhere between the Human and the Deployer, and that failure channel is how the Human inherits the Deployer’s harm surface. (This stays at the probability level: it bounds how often the chain breaks, not what fraction of dollar damage flows through.)
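As a sketch, the multiplicative rule is a product over per-link trust values. The helper name `joint_reliability` is illustrative, and the code assumes the links fail independently; the correlation correction mentioned above is not modeled here.

```python
import math


def joint_reliability(link_trusts):
    """Product of per-link trust values (assumes independent links)."""
    return math.prod(link_trusts)


# Human -> Coordinator -> Code Deployer, from the worked example.
chain = [0.95, 0.85]
reliability = joint_reliability(chain)
failure_probability = 1.0 - reliability  # at least one link fails

print(f"Joint reliability:   {reliability:.4f}")
print(f"Failure probability: {failure_probability:.4f}")
```

Because every trust value is at most 1, adding a link can only lower the product, which is why long delegation chains degrade quickly under this rule.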
Step 3: Risk Budget Allocation
Suppose the organization’s total acceptable Delegation Risk Budget is $2,000/month.
Current allocation:
- Summarizer: $150 (7.5% of budget)
- Code Deployer: $1,100 (55% of budget) ⚠️
- Coordinator: $520 (26% of budget)
Finding: Code Deployer consumes most of the risk budget. Options:
- Add verification layer before deployment (reduce harm mode probability)
- Reduce deployer’s permissions (reduce harm mode damage)
- Replace with human-in-the-loop (shift delegation risk to human)
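The allocation figures above can be checked with a few lines of Python. This is a sketch using the Step 1 risk figures and the $2,000/month budget; the variable names are illustrative, and "largest consumer" is simply the component with the highest delegation risk.

```python
BUDGET = 2_000  # dollars per month, from the text

# Delegation risk per component, from Step 1.
allocation = {"Summarizer": 150, "Code Deployer": 1_100, "Coordinator": 520}

# Share of the budget each component consumes.
shares = {name: risk / BUDGET for name, risk in allocation.items()}

used = sum(allocation.values())
headroom = BUDGET - used
largest = max(allocation, key=allocation.get)

for name, share in shares.items():
    print(f"{name}: {share:.1%} of budget")
print(f"Used ${used:,} of ${BUDGET:,}; headroom ${headroom:,}")
print(f"Largest consumer: {largest}")
```

The remaining headroom is small relative to the Code Deployer's share, which is why the mitigation options all target that component.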
Step 4: After Mitigation
Add a deployment verifier that reviews code before it ships (deterministic checks plus a formally-verified policy engine):
flowchart LR
H[Human] -->|"0.95"| C[Coordinator]
C -->|"0.85"| D[Deployer]
D -->|"0.99"| V[Verifier]
V --> Prod[Production]
We assume the verifier catches 99% of malicious deployments (its policy engine blocks unauthorized changes) and 70% of buggy deployments (static analysis and tests catch most, but not all, defects). Only the Code Deployer’s rows change; the Summarizer ($150) and Coordinator ($520) are untouched.
| Harm Mode | Old Risk | Catch Rate | New Risk Contribution |
|---|---|---|---|
| Deploys buggy code | $1,000 | 70% | 0.30 × $1,000 = $300 |
| Deploys malicious code | $100 | 99% | 0.01 × $100 = $1 |
New Delegation Risk (Code Deployer) = $300 + $1 = $301 (down from $1,100).
New System Total = $150 (Summarizer) + $520 (Coordinator) + $301 (Code Deployer) = $971.
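The mitigation arithmetic can be sketched as scaling each harm mode's risk by the fraction the verifier misses. The helper `residual_risk` is an illustrative name. As in the text, the sketch assumes the verifier only reduces the Code Deployer's rows and adds no harm modes of its own.

```python
def residual_risk(old_risk, catch_rate):
    """Risk remaining after a verifier that catches `catch_rate` of cases."""
    return old_risk * (1.0 - catch_rate)


# Code Deployer rows: (old risk contribution in dollars, verifier catch rate).
deployer_rows = {
    "Deploys buggy code": (1_000, 0.70),
    "Deploys malicious code": (100, 0.99),
}

new_deployer_risk = sum(
    residual_risk(risk, catch) for risk, catch in deployer_rows.values()
)

# Summarizer and Coordinator are unchanged by the verifier.
new_system_total = 150 + 520 + new_deployer_risk

print(f"New Code Deployer risk: ${new_deployer_risk:,.0f}")
print(f"New system total:       ${new_system_total:,.0f}")
```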
Comparison to Nuclear Safety
Nuclear plants achieve ~10⁻⁵ core damage frequency per reactor-year (NRC subsidiary goal: ~10⁻⁴/reactor-year). Using the achieved figure for a $10B damage potential:
- Example Delegation Risk = 10⁻⁵ × $10B = $100,000/year
Our research assistant’s $971/month ≈ $12,000/year is roughly an eighth of that figure, so within an order of magnitude, which suggests either:
- We’re being appropriately cautious, or
- Nuclear plants manage much higher absolute stakes with similar relative risk
This kind of cross-domain comparison helps calibrate whether AI safety investments are proportionate.
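The comparison rests on two annualized figures, which the sketch below recomputes from the numbers given in this section. The variable names are illustrative, and the ratio is only as meaningful as the two damage estimates behind it.

```python
# Nuclear example: core damage frequency x damage potential.
core_damage_frequency = 1e-5        # per reactor-year (achieved figure)
damage_potential = 10e9             # dollars ($10B)
nuclear_annual_risk = core_damage_frequency * damage_potential

# Research assistant: post-mitigation monthly risk, annualized.
assistant_monthly_risk = 971        # dollars per month
assistant_annual_risk = assistant_monthly_risk * 12

ratio = nuclear_annual_risk / assistant_annual_risk

print(f"Nuclear example:    ${nuclear_annual_risk:,.0f}/year")
print(f"Research assistant: ${assistant_annual_risk:,.0f}/year")
print(f"Ratio (nuclear / assistant): {ratio:.1f}x")
```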
Key Topics
- Risk Decomposition - Accidents vs. defection: two types of harm
- Risk Inheritance - How risk flows through delegation networks
- Delegation Interfaces & Contracts - Formalizing delegation relationships
- Risk Optimization - Minimizing delegation risk subject to capability
- Risk Dynamics - How trust and risk evolve over time
- Risk Accounting - Ledgers, auditing, and KPIs
- Delegation Protocols - Handshakes, tokens, and revocation
- Risk Economics - Insurance, arbitrage, and incentives
- Delegation at Scale - Distributed systems and bottlenecks
- Human-AI Delegation - Team dynamics and calibration
Applications
- Delegation Accounting - Balance sheet view of delegation
The Goal
What’s Next?
To dive deeper into specific topics:
- Risk Inheritance — Algorithms for computing how risk flows through delegation networks
- Risk Dynamics — How trust evolves, decays, and rebuilds over time
To see delegation risk applied:
- Principles to Practice — Delegation Risk calculations for real examples
- Risk Budgeting Overview — Cross-domain methods for allocating risk budgets
To understand the foundations:
- Background Research — Deep dives into nuclear, aerospace, and financial risk methods
To start implementing:
- Quick Start — Step-by-step application checklist
- Decision Guide — Choosing implementations based on risk budget
Further Reading
Foundational Concepts
- Coherent Risk Measures — Mathematical foundations for risk quantification (Wikipedia)
- Expected Shortfall — Why CVaR is preferred to VaR for tail risk (Wikipedia)
- Kelly Criterion — Optimal bet sizing under uncertainty (Wikipedia)
Academic Papers
- Artzner, P., et al. (1999). Coherent Measures of Risk. Mathematical Finance. — Axiomatic foundations
- Tasche, D. (2008). Capital Allocation to Business Units: the Euler Principle. arXiv — Risk decomposition
- Fritz, T. (2020). A Synthetic Approach to Markov Kernels. arXiv — Compositional probability
Related Domains
- Social Choice Theory — How to aggregate individual preferences into collective decisions
- Principal-Agent Problem — Information asymmetry in delegation relationships
- Probabilistic Risk Assessment — NRC’s approach to nuclear safety
See the full bibliography for comprehensive references.