Capability Formalization
This section formalizes the positive side of the optimization problem: what we’re trying to maximize.
| Page | Question |
|---|---|
| Agents, Power, and Authority | What makes something an agent? How do we measure power? |
| Worked Examples | What do these metrics look like for real systems? |
| The Strong Tools Hypothesis | Can we get high capability with low agency? |
Key Concepts
- Agency Score: How well a system’s behavior fits a simple utility function (0 = tool, 1 = optimizer)
- Power Score: Ability to achieve diverse goals
- RACAP: Risk-Adjusted Capability = Capability / Risk
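As a minimal sketch of the RACAP ratio defined above: the function below divides a capability figure by a risk figure. The units, the example numbers, and the choice to reject non-positive risk are assumptions made for illustration, not part of the definition.

```python
def racap(capability: float, risk: float) -> float:
    """Risk-Adjusted Capability: capability delivered per unit of risk."""
    if risk <= 0:
        # A zero or negative risk makes the ratio undefined; this sketch rejects it.
        raise ValueError("risk must be positive")
    return capability / risk


# Two hypothetical systems with equal capability but different risk.
careful = racap(capability=10.0, risk=2.0)
reckless = racap(capability=10.0, risk=8.0)
print(careful > reckless)  # the lower-risk system has the higher RACAP
```

With capability held fixed, the ratio rises only as risk falls, which is why the rest of this section focuses on which terms of the risk formula each score moves.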
The Core Insight
We want AI systems that are maximally capable while minimally risky. This may be achievable through “strong tools”: systems with high power but low agency.
See The Strong Tools Hypothesis for analysis.
The Bridge to Delegation Risk
Earlier versions of this site presented Power and Agency alongside the risk accounting without saying how they connect. Here is the bridge: each score moves a specific term of the core formula.
Recall Delegation Risk = Σ P(harm) × Damage, and that Risk Decomposition splits P(harm) into two channels: accidents (the component fails at its task) and defection (the component works against you).
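The formula and its two channels can be sketched as a small accounting routine. The `HarmMode` record, the channel labels, and every number below are hypothetical, and the sketch assumes each harm mode belongs to exactly one channel.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HarmMode:
    name: str
    channel: str    # "accident" or "defection" (labels assumed for this sketch)
    p_harm: float   # probability that this harm occurs
    damage: float   # damage if it occurs, in arbitrary cost units


def delegation_risk(modes):
    """Delegation Risk = sum of P(harm) x Damage over all harm modes."""
    return sum(m.p_harm * m.damage for m in modes)


def risk_by_channel(modes):
    """Split the same sum into accident and defection contributions."""
    totals = {"accident": 0.0, "defection": 0.0}
    for m in modes:
        totals[m.channel] += m.p_harm * m.damage
    return totals


# Hypothetical harm modes for a single component.
modes = [
    HarmMode("wrong output", "accident", p_harm=0.02, damage=1000.0),
    HarmMode("data exfiltration", "defection", p_harm=0.001, damage=50000.0),
]
print(delegation_risk(modes))
print(risk_by_channel(modes))
```

The per-channel totals add up to the overall Delegation Risk, so the split changes how the risk is attributed, not how much of it there is.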
Agency gates the defection channel. Defection is goal-directed behavior against the principal’s interests — it requires something that behaves like an optimizer. A system with Agency ≈ 0 (a calculator, a compiler) has no defection term: its entire risk budget is accident risk, which is exactly why we trust powerful tools that we’d never trust as agents. As the Agency Score rises, P(defection) becomes nonzero and grows with it.
Power bounds the damage terms — in both channels. A component’s worst-case Damage is capped by what it can actually reach and affect (the resources, capabilities, influence, and optionality dimensions of the Power Score). High power raises the ceiling on accident damage and defection damage; low power caps both regardless of intent.
Putting the two together: defection risk requires both high agency and high power — agency to want it, power to do damage with it. That is the Strong Tools Hypothesis restated in risk-accounting terms: pushing Power up while holding Agency near zero grows capability while keeping the defection term pinned at zero, leaving only accident risk — which verification patterns handle far more reliably than they handle adversaries. RACAP (Capability / Risk) is the ratio this trade optimizes.
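A toy model makes the argument concrete. Everything numeric here is an assumption chosen for illustration: Power is taken to be normalized to [0, 1], the damage ceiling is taken to scale linearly with Power, P(defection) is taken to scale linearly with Agency, and capability is crudely proxied by Power. The text above only claims that P(defection) grows with Agency and that Power caps damage; the linear forms and the parameters `p_accident`, `defection_rate`, and `max_damage` are hypothetical.

```python
def channel_risks(agency, power, p_accident=0.02, defection_rate=0.05,
                  max_damage=1000.0):
    """Return (accident_risk, defection_risk) under the toy model."""
    for label, value in (("agency", agency), ("power", power)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{label} must be in [0, 1]")
    damage_ceiling = power * max_damage      # Power caps damage in both channels
    p_defection = agency * defection_rate    # zero when Agency is zero
    return p_accident * damage_ceiling, p_defection * damage_ceiling


def racap(capability, risk):
    """Risk-Adjusted Capability = Capability / Risk."""
    return capability / risk


# A "strong tool" and a "strong agent" with the same (hypothetical) Power.
tool_accident, tool_defection = channel_risks(agency=0.0, power=0.9)
agent_accident, agent_defection = channel_risks(agency=0.8, power=0.9)

tool_racap = racap(0.9, tool_accident + tool_defection)
agent_racap = racap(0.9, agent_accident + agent_defection)

print(tool_defection)            # defection term for the tool
print(tool_racap > agent_racap)  # compare risk-adjusted capability
```

Under these assumptions the tool carries only accident risk, so at equal Power its RACAP is higher than the agent's. Lowering Power shrinks both channels for the agent, which matches the claim that low power caps damage regardless of intent.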