Delegation Risk

Delegation engineering — treating the risk of delegation to people, software, and AI agents as a quantity you can estimate before acting, budget across a system, and architect to minimize.

Every delegation involves risk. When you delegate a task—to an employee, a contractor, a software system, or an AI agent—you’re accepting potential downside in exchange for capability you don’t have or can’t apply yourself.

This site treats that downside as a quantity you can account for. Imagine a coding agent that begins every consequential action with “let’s estimate the expected damages of doing this with strategy X”—and adjusts the strategy to reduce them. Estimate before acting (risk preflight), keep hard limits that don’t depend on the agent’s intent (exposure envelopes), track the running position (the exposure ledger), and architect the whole system—task decomposition, Least-X constraints, verification stacks—to get the most capability per unit of exposure.
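The preflight, envelope, and ledger loop described above can be sketched in a few lines. This is a minimal illustration, not the site's implementation: the names `HarmMode`, `expected_damage`, and `ExposureLedger` are hypothetical, and it assumes expected damage is the sum of probability times cost over an action's harm modes, measured in arbitrary budget units.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class HarmMode:
    """One way an action could go wrong (hypothetical structure)."""
    name: str
    probability: float  # estimated chance the harm occurs
    damage: float       # estimated cost if it does, in budget units


def expected_damage(harm_modes):
    """Assumed estimate: sum of probability * damage over harm modes."""
    return sum(m.probability * m.damage for m in harm_modes)


@dataclass
class ExposureLedger:
    """Tracks the running position against a hard exposure envelope."""
    envelope: float                 # hard cap, independent of agent intent
    committed: float = 0.0          # expected damage already accepted
    entries: list = field(default_factory=list)

    def preflight(self, action, harm_modes):
        """Estimate before acting; refuse anything that breaches the envelope."""
        cost = expected_damage(harm_modes)
        if self.committed + cost > self.envelope:
            return False            # rejected: the ledger is left unchanged
        self.committed += cost
        self.entries.append((action, cost))
        return True


ledger = ExposureLedger(envelope=100.0)
refactor = [HarmMode("subtle regression", 0.05, 200.0)]
migration = [HarmMode("dropped production table", 0.01, 50_000.0)]

print(ledger.preflight("refactor module", refactor))   # fits the envelope
print(ledger.preflight("run migration", migration))    # exceeds it
print(ledger.committed, ledger.entries)
```

The design point is that the envelope check never consults the agent's stated intent: an action is refused purely because its estimated exposure does not fit the remaining budget.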

The hardest part is that the safeguards you stack are not independent. The standard mistake is to believe the multiplication:

Put three supposedly independent 90%-effective safety layers in front of a risky action and naive arithmetic promises 99.9% protection (residual risk 0.1 × 0.1 × 0.1 = 0.1%). But layers built on the same model, the same training data, or the same context don’t fail independently. At the correlations realistic for same-provider stacks (ρ ≈ 0.5), three such layers deliver closer to 95% protection: a residual risk of about 5%, roughly 50× what you computed. That gap, typically 10–100× at realistic correlations, is the entanglement tax.

That correlation problem—entanglement—is where naive risk accounting fails, and pricing it is this framework’s most distinctive contribution. The site gives you the quantity (Delegation Risk), the correction (the entanglement tax), and the design space (patterns)—for AI agents that plan, execute, and delegate to other AI systems, and for the older delegation problems they inherit.
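The arithmetic behind the three-layer example can be reproduced with a simple common-cause model. This is a sketch under a stated assumption, not necessarily the model the site uses: with probability ρ all layers share one failure draw, and otherwise they fail independently. Under that assumption the pairwise failure correlation between any two layers is exactly ρ. The function names are hypothetical.

```python
def naive_residual(p_fail: float, n_layers: int) -> float:
    """Residual risk if every layer failed independently."""
    return p_fail ** n_layers


def entangled_residual(p_fail: float, n_layers: int, rho: float) -> float:
    """Residual risk under an assumed common-cause mixture.

    With probability rho the layers are fully coupled (one shared draw,
    so all fail together with probability p_fail); with probability
    1 - rho they fail independently.
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must be in [0, 1]")
    return rho * p_fail + (1.0 - rho) * naive_residual(p_fail, n_layers)


def entanglement_tax(p_fail: float, n_layers: int, rho: float) -> float:
    """How many times larger the real residual is than the naive one."""
    return entangled_residual(p_fail, n_layers, rho) / naive_residual(p_fail, n_layers)


if __name__ == "__main__":
    p, n, rho = 0.1, 3, 0.5  # three 90%-effective layers, same-provider stack
    print(f"naive protection:     {1 - naive_residual(p, n):.2%}")
    print(f"entangled protection: {1 - entangled_residual(p, n, rho):.2%}")
    print(f"entanglement tax:     {entanglement_tax(p, n, rho):.1f}x")
```

With three 90%-effective layers and ρ = 0.5, this model gives a residual risk of about 5% instead of 0.1%, a tax of roughly 50×, which matches the figures quoted above. Setting ρ = 0 recovers the naive multiplication, and ρ = 1 collapses the stack to a single layer.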


This framework proposes structural constraints as the foundation for managing delegation risk. Rather than relying solely on selecting trustworthy delegates, overseeing every action, or detecting problems after the fact, we focus on architectural properties that bound potential harm regardless of delegate behavior.

Risk as a Resource

Risk can be measured, budgeted, and optimized—just like compute or money. Delegation Risk quantifies what you’re betting on each delegate; Runtime Accounting sketches scaffolding that does this natively.

Correlation Is the Enemy

Redundancy only works if failures are independent. Entanglements shows how shared providers, training data, and context quietly destroy your safety margins, and how to detect and price that coupling.

Containment via Decomposition

Instead of one powerful delegate, decompose tasks across many limited components. No single component has enough capability, context, or connectivity to cause catastrophic harm.

Principles that Bound Behavior

The “Least X” principles—least privilege, least capability, least context—systematically limit what each component can do.


While the framework applies generally, AI systems are our primary focus. Capabilities are expanding faster than verification; agents delegate to other agents in networks nobody fully maps; and oversight tends to shrink as autonomy grows. None of these claims are certain, but if even some hold, infrastructure for managing Delegation Risk seems valuable. The aim is not to solve alignment—it’s to bound the damage from any single component, and to stop safety architectures from quietly promising more than they deliver.

This work is complementary to AI Control (Redwood Research), which designs protocols assuming the model is actively scheming. Control supplies the adversarial methodology for that defection channel; delegation engineering is the accounting and architecture layer those protocols would be budgeted and deployed in—and it also prices the accident channel that Control deliberately sets aside. See Related Approaches for the full positioning.


The 5-Minute Introduction

The whole framework on one page: decompose, budget, verify. Start here →

The Core Path

The site’s strongest material in ~13 stops, ordered—entanglements, risk decomposition, channel integrity, worked examples. About 4–6 hours end to end. Follow the path →


Entanglements

The site’s most distinctive contribution: why correlated safeguards fail together, how to detect hidden coupling, and what it costs. Explore →

Delegation Risk

The quantitative foundation: harm modes, expected cost, the canonical propagation rule, risk budgets. Explore →

Design Patterns

45 patterns for building safer delegation systems, organized by threat model, with four worked examples. Explore →

Case Studies

Real incidents and illustrative scenarios across AI and human systems—nuclear launch authority, juries, Sydney. Explore →

Power Dynamics

Formalizing agency and power: why strong tools with weak agency may be the safest quadrant. Explore →

Cross-Domain Methods

Proven machinery from finance (Euler allocation), nuclear safety (fault trees), and mechanism design. Explore →

Research

Literature reviews and theory: scheming reduction evidence, trust propagation, risk measurement and pricing. Explore →

Experimental

Probabilistic estimation tools and the calibration agenda—the framework’s path from vocabulary to method. Explore →


Worked examples—one real incident, three illustrative scenarios:

  • Sydney (real incident) — What happens when constraints are missing
  • Code Review Bot (illustrative) — A layered-verification architecture that works
  • Support Bot (illustrative) — Verification catching an expensive error before it ships
  • Content Moderator (illustrative) — How small changes compound into big problems

Who this is for:

  • Risk managers thinking about delegation in any domain
  • AI safety researchers working on scalable containment approaches
  • ML engineers building agentic systems with principled constraints
  • Organizations deploying AI that need risk management frameworks
  • Policy makers looking for concrete technical approaches