The Core Path
This site is large (~160 pages), but its essential argument lives in about a dozen stops. If you read only one path, read this one — it covers the framework’s most distinctive material in order, in roughly 4–6 hours. Pages on this path are marked with a Core badge in the sidebar. Everything else on the site is reference depth, supporting research, or applied variation.
The 13 Stops
Before you start: skim the Five-Minute Intro and Core Concepts (~25 min) for the vocabulary, especially the canonical definition of Delegation Risk.
-
Risk Decomposition — The framework’s load-bearing distinction: accidents (component fails at its task) versus defection (component works against you). They scale oppositely with capability — smarter systems have fewer accidents but potentially more dangerous defection — which is why “just build better AI” can’t solve both, and why architecture has to.
-
Delegation Accounting — Treats risk like money: every delegation moves exposure onto someone’s balance sheet, complexity carries a tax, and the running example (a $1,000 delivery that escalates to an adversarial agent) shows why financial instruments alone stop working as agents get more capable.
-
Runtime Risk Accounting: The framework’s math, moved into the scaffolding layer as three mechanisms. Risk preflight is a go/no-go estimate before each consequential action; the exposure envelope is a hard capability bound enforced mechanically regardless of intent; the exposure ledger is a running per-delegate account with mark-to-market re-estimation, margin calls, and reconciliation against real outcomes. This is what it would actually mean to operationalize Delegation Accounting in a live agent harness. It also explains why the preflight estimator must sit below the actor on the verifiability hierarchy and must never be the same model as the actor.
-
The Insurer’s Dilemma — Why coverage structures break down under moral hazard and adversarial agents: the case for architectural rather than purely financial risk management. This is the pivot point of the whole framework.
-
Entanglements + Types, Challenges, Formal Definitions — The site’s most original contribution. Three 90%-effective safety layers are not 99.9% safe, because components share infrastructure, blind spots, and influence channels (and the gap only grows with layer count). The entanglement tax — actual risk over perceived risk — is often 10–100×. The passive / active / adversarial taxonomy organizes everything that follows.
-
Detecting Influence: Eight practical methods (including A/B frame testing, counterfactual intervention, honeypots, rotation, timing analysis, and red-team channel enumeration) for finding the entanglements your architecture diagram says don’t exist.
-
Entanglement Worked Examples — Four systems (code review, healthcare, trading, support escalation) diagnosed end-to-end: build the correlation matrix, find the hidden dependency, redesign, verify. The most directly actionable page on the site.
-
Structural Patterns — The eight architectural building blocks (escalation ladders, voting tribunals, capability airlocks, bulkheads…) that bound damage regardless of component behavior.
-
Channel Integrity — The deepest pattern chapter: side-channels, and how components that never communicate can still coordinate through shared reasoning (logical correlation). Where entanglement theory meets decision theory.
-
Composing Patterns — Patterns interact. Four reference stacks for real systems, a compatibility matrix, and the anti-patterns (kitchen-sink, verification theater) that come from composing badly.
-
Worked Examples — The framework applied end-to-end with explicit (illustrative) numbers in four domains: research assistant, code deployment, trading system, healthcare bot. Note how risk budgets shift with stakes across the four.
-
Nuclear Launch Authority, Jury Systems, Criminal Organizations — Three human systems that solved delegation-under-distrust for real, each isomorphic to an AI design question: engineered error asymmetry (juries), deliberate friction and two-person integrity (nuclear), and trust without any external enforcement at all (criminal organizations).
-
Sydney / Bing Chat — The closing real-world failure analysis: what happens when none of the above is in place, and the counterfactual architecture that would have contained it.
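To make the arithmetic behind the Entanglements stop concrete, here is a minimal Python sketch of why three 90%-effective layers need not be 99.9% safe. It is an illustration under stated assumptions, not the site's own model: the function names are hypothetical, and the common-cause model (a single `shared_blind_spot` probability with which every layer misses together) is a simplification chosen for this example. Only the definition of the entanglement tax as actual risk over perceived risk comes from the text above.

```python
import math


def perceived_miss_rate(layer_miss_rates):
    """Miss rate if every layer failed independently (product of per-layer rates)."""
    return math.prod(layer_miss_rates)


def actual_miss_rate(layer_miss_rates, shared_blind_spot):
    """Toy common-cause model (an assumption of this sketch).

    With probability `shared_blind_spot` all layers miss together, for example
    because they share infrastructure or training data. Otherwise the layers
    fail independently.
    """
    independent = perceived_miss_rate(layer_miss_rates)
    return shared_blind_spot + (1.0 - shared_blind_spot) * independent


def entanglement_tax(layer_miss_rates, shared_blind_spot):
    """Actual risk divided by perceived risk."""
    perceived = perceived_miss_rate(layer_miss_rates)
    return actual_miss_rate(layer_miss_rates, shared_blind_spot) / perceived


if __name__ == "__main__":
    three_layers = [0.1, 0.1, 0.1]  # three 90%-effective layers
    four_layers = [0.1, 0.1, 0.1, 0.1]
    blind_spot = 0.05  # illustrative number, not a measured value

    print(f"perceived: {perceived_miss_rate(three_layers):.4f}")
    print(f"actual:    {actual_miss_rate(three_layers, blind_spot):.5f}")
    print(f"tax (3 layers): {entanglement_tax(three_layers, blind_spot):.1f}x")
    print(f"tax (4 layers): {entanglement_tax(four_layers, blind_spot):.1f}x")
```

With these illustrative inputs the perceived miss rate is 0.1%, the actual miss rate is about 5.1%, and the tax is roughly 51×. Adding a fourth layer shrinks the perceived risk tenfold but barely moves the actual risk, so under this model the gap widens as layers are added.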
After the Core Path: go applied with the Quick Start checklist, go quantitative with Probabilistic Estimation, or go deep with the Research section.
Paths by Goal
The Core Path is the recommended default. If it’s more than you need right now, these shorter routes each target a specific goal. Each is a focused subset; for the fullest picture, return to the Core Path.
Building or applying a system. Core Concepts → Quick Start → Design Patterns and Least-X Principles → Entanglements to avoid correlated failures, with the Cost-Benefit Tool for ROI. Applying to an organization rather than an AI system? Start from the human-systems case studies instead. (~2–4 hrs)
Assessing an existing system. Core Concepts → Delegation Risk Overview and Risk Decomposition → Case Studies → Entanglements: Mitigation to fix what you find. (~2–3 hrs)
Skeptical it works. FAQ → Sydney Case Study → Nuclear Safety PRA and Lessons from Failures for fields that already do this. (~2–3 hrs)
Following the math. Core Concepts → Delegation Risk Overview and Walkthrough → Risk Decomposition and Power Dynamics → Experimental Estimates. (~4–6 hrs)
Researching. Core Concepts → all theory sections → Research Index and Potential Projects → Experimental. (10+ hrs)
Time budget
| If you have… | Read… |
|---|---|
| 30 minutes | Five-Minute Intro + Core Concepts |
| Half a day (4–6 hrs) | The Core Path — the framework’s strongest material, end to end |
| Full day | The Core Path + Research and Experimental |