Common Mistakes

These are anticipated failure modes when implementing delegation risk frameworks: warning signs worth watching for, drawn from analogous engineering and safety disciplines.


Mistake 1: Jumping to Patterns Without Theory

A team reads the Design Patterns section, picks patterns that sound good, and implements them without understanding the underlying theory.

Why it happens:

  • Patterns feel actionable; theory feels academic
  • Time pressure to ship
  • “We just want the solution, not the lecture”

Patterns are solutions to specific problems. Without understanding the problem space, you’ll:

  • Apply patterns to the wrong situations
  • Miss edge cases the pattern was designed for
  • Fail to recognize when patterns conflict

Minimum reading before implementing patterns:

  1. Core Concepts — 20 min
  2. Delegation Risk Overview — 15 min
  3. Then Design Patterns Index — pick patterns

Mistake 2: Assuming Independence (The Entanglement Trap)

“We have three verification layers that each catch 90% of issues, so we only miss 0.1% (0.1 × 0.1 × 0.1).”

Why it happens:

  • Independence is mathematically convenient
  • Hard to see correlations between components
  • Optimism about architectural isolation

If verifiers share:

  • Same training data → same blind spots
  • Same base model → correlated failures
  • Shared context → can be attacked together

Three 90% verifiers sharing a provider (failure correlation ρ ≈ 0.5) might catch only ~95% of issues overall, not 99.9%. That is a ~50× entanglement tax: the miss rate rises from 0.1% to about 5%.
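To see where a figure like ~95% can come from, here is a minimal sketch built on an assumed common-mode (beta-factor style) model borrowed from reliability engineering, not a definition taken from this framework: with probability ρ the layers share one blind spot and miss together; otherwise each layer misses independently. The function name `all_layers_miss` is hypothetical.

```python
def all_layers_miss(per_layer_miss: float, n_layers: int, rho: float) -> float:
    """Probability that every verification layer misses the same issue.

    Assumed common-mode mixture model (an illustration, not the framework's
    definition of rho): with probability rho the layers share one blind spot
    and miss together; otherwise each layer misses independently.
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must be between 0 and 1")
    shared_blind_spot = per_layer_miss        # layers behave like one verifier
    independent = per_layer_miss ** n_layers  # the naive 0.1 x 0.1 x 0.1
    return rho * shared_blind_spot + (1 - rho) * independent


naive = all_layers_miss(0.1, n_layers=3, rho=0.0)
shared = all_layers_miss(0.1, n_layers=3, rho=0.5)
print(f"assuming independence: {1 - naive:.2%} caught")
print(f"shared provider:       {1 - shared:.2%} caught")
print(f"entanglement tax:      {shared / naive:.1f}x more misses")
```

With ρ = 0.5, the all-miss probability is 0.5 × 0.1 + 0.5 × 0.001 = 0.0505, so about 95% of issues are caught and misses are roughly 50× more frequent than the independence estimate. Real correlation structures are messier; the point is that even a modest shared component dominates the tiny product term.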

How to avoid it:

  • Check for shared ancestors — Do components use the same base model?
  • Diversity by design — Use different architectures, training data, paradigms
  • Test for correlation — Do failures happen together?
  • Read Entanglements if your system has multiple AI components
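One way to act on “Test for correlation” is to run the same set of test issues (seeded bugs, red-team prompts) through every layer and compare how often all layers miss with what independence would predict. A minimal sketch, assuming you can record a per-layer miss/catch result for each issue; `entanglement_ratio` and the sample data are hypothetical.

```python
from math import prod


def entanglement_ratio(misses: list[tuple[bool, ...]]) -> float:
    """Observed rate of all-layer misses divided by the rate independence predicts.

    Each tuple is one test issue; element i is True if layer i missed it.
    A ratio near 1 is consistent with independence; a ratio well above 1
    means the layers' failures cluster together.
    """
    if not misses:
        raise ValueError("need at least one test issue")
    n = len(misses)
    n_layers = len(misses[0])
    # Per-layer miss rates, then the joint rate independence would predict.
    per_layer = [sum(row[i] for row in misses) / n for i in range(n_layers)]
    predicted = prod(per_layer)
    if predicted == 0:
        raise ValueError("some layer never missed; collect more test issues")
    observed = sum(all(row) for row in misses) / n
    return observed / predicted


# Hypothetical results: 10 seeded issues run through two verifiers.
results = [(True, True), (True, True)] + [(False, False)] * 8
print(f"entanglement ratio: {entanglement_ratio(results):.1f}")
```

Here both layers miss the same two issues, so the observed joint miss rate (20%) is about 5× the 4% predicted by multiplying the per-layer rates. With only a handful of test issues the ratio is noisy; treat a high value as a prompt to look for shared ancestors rather than as a precise measurement.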

Mistake 3: Cargo-Culting Without Threat Models

“Big Company X uses this pattern, so we should too.”

Why it happens:

  • Easier to copy than to think
  • Prestigious examples feel safe
  • Threat modeling is hard

Patterns address specific threats. Big Company X may have:

  • A different threat model from yours
  • More resources to absorb the overhead
  • Regulatory requirements you don’t have
  • Problems you don’t have

Before implementing any pattern:

  1. What harm am I preventing?
  2. How likely is this harm for my system?
  3. What’s the cost of this pattern?
  4. Is the cost justified by the risk reduction?
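The four questions can be turned into a rough expected-value check. This is a sketch under loud assumptions: all numbers are hypothetical, and comparing expected loss with cost assumes you are risk-neutral. For catastrophic or irreversible harms, expected value understates the stakes, so treat the result as a floor rather than a verdict. `PatternAssessment` is an illustrative name, not part of the framework.

```python
from dataclasses import dataclass


@dataclass
class PatternAssessment:
    """Hypothetical back-of-the-envelope answers to the four questions."""
    harm_cost: float           # 1. cost of the harm if it happens (e.g. dollars)
    annual_probability: float  # 2. chance the harm hits *your* system in a year
    risk_reduction: float      # fraction of that risk the pattern removes (0 to 1)
    annual_cost: float         # 3. yearly cost of building and running the pattern

    def expected_benefit(self) -> float:
        """Expected annual loss the pattern prevents."""
        return self.harm_cost * self.annual_probability * self.risk_reduction

    def is_justified(self) -> bool:
        """4. Does the expected risk reduction pay for the pattern?"""
        return self.expected_benefit() > self.annual_cost


# Same pattern, same cost, different threat exposure.
big_company_x = PatternAssessment(1_000_000, 0.5, 0.8, 50_000)
your_system = PatternAssessment(1_000_000, 0.02, 0.8, 50_000)
print(big_company_x.is_justified(), your_system.is_justified())  # True False
```

The same pattern at the same cost is clearly justified for a system with a 50% annual chance of the harm and clearly not for one with a 2% chance, which is the argument against copying Big Company X.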

Mistake 4: Over-Engineering Low-Risk Systems

An internal productivity tool with no customer exposure gets the same security architecture as the payment system.

Why it happens:

  • Fear of getting blamed for incidents
  • “Better safe than sorry” mentality
  • Unclear risk assessment

Why it’s a problem:

  • Slows development for no benefit
  • Creates resentment (“safety theater”)
  • Wastes resources that could protect high-risk systems
  • Makes the framework seem unreasonable

Use tiered risk levels:

| Risk Level | Examples | Suggested Controls |
|---|---|---|
| Low | Internal tools, experiments | 5 essential patterns |
| Medium | Customer-facing, B2B | Low-tier controls plus human review, tripwires |
| High | Financial, health, safety | Medium-tier controls plus formal verification, defense in depth |
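A tier table like this is easy to encode so that controls accumulate as the risk level rises. A minimal sketch: the two-question `classify` rule is a hypothetical simplification of the table's example systems (a real assessment weighs more factors), and “5 essential patterns” stays a placeholder for whatever that tier actually contains.

```python
from enum import IntEnum


class RiskLevel(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Controls each tier adds; higher tiers inherit everything below them.
TIER_CONTROLS = {
    RiskLevel.LOW: ["5 essential patterns"],
    RiskLevel.MEDIUM: ["human review", "tripwires"],
    RiskLevel.HIGH: ["formal verification", "defense in depth"],
}


def classify(customer_facing: bool, high_stakes_domain: bool) -> RiskLevel:
    """Hypothetical rule of thumb mirroring the table's example systems."""
    if high_stakes_domain:  # financial, health, safety
        return RiskLevel.HIGH
    if customer_facing:     # customer-facing, B2B
        return RiskLevel.MEDIUM
    return RiskLevel.LOW    # internal tools, experiments


def required_controls(level: RiskLevel) -> list[str]:
    """All controls for a tier, including those inherited from lower tiers."""
    return [c for tier in RiskLevel if tier <= level for c in TIER_CONTROLS[tier]]


internal_tool = classify(customer_facing=False, high_stakes_domain=False)
payments = classify(customer_facing=True, high_stakes_domain=True)
print(internal_tool.name, required_controls(internal_tool))
print(payments.name, required_controls(payments))
```

The internal tool gets only the baseline, while the payment system inherits every control, which is exactly the distinction this mistake is about.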

Mistake 5: Under-Engineering High-Risk Systems

“It’s just an AI chatbot, what could go wrong?”

Then: reputation damage, data leaks, regulatory fines, or worse.

Why it happens:

  • Underestimating AI capability
  • Not imagining adversarial users
  • “We’ll add safety later” (you won’t)
  • Optimism bias

AI systems can:

  • Leak training data (PII, proprietary info)
  • Generate harmful content at scale
  • Be manipulated by adversarial prompts
  • Take unexpected actions with real consequences

Ask the “Sydney Question”: if this system behaved as adversarially as it possibly could for 5 minutes, what is the worst that would happen?

If the answer is scary, treat the system as high risk and apply the high-tier controls from Mistake 4.
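For systems that act through tools, one way to make the Sydney Question concrete is to bound how much damage each capability could do at full speed for five minutes. A minimal sketch, assuming harm per call can be estimated in a single unit (dollars here) and that rate limits and caps are enforced outside the model; `Capability`, `worst_case_harm`, and `issue_refund` are hypothetical names. It does not cover harms that don't scale with call count, such as reputational damage from generated content.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Capability:
    """One thing the system can do, with hypothetical limits."""
    name: str
    max_calls_per_minute: int             # rate limit enforced outside the model
    worst_harm_per_call: float            # e.g. dollars at risk per call
    hard_cap_calls: Optional[int] = None  # absolute cap per window, if any


def worst_case_harm(capabilities: list[Capability], minutes: float = 5) -> float:
    """Upper bound on harm if every capability is abused at full speed."""
    total = 0.0
    for cap in capabilities:
        calls = cap.max_calls_per_minute * minutes
        if cap.hard_cap_calls is not None:
            calls = min(calls, cap.hard_cap_calls)
        total += calls * cap.worst_harm_per_call
    return total


refunds = Capability("issue_refund", max_calls_per_minute=10, worst_harm_per_call=500.0)
capped = Capability("issue_refund", 10, 500.0, hard_cap_calls=3)
print(worst_case_harm([refunds]), worst_case_harm([capped]))  # 25000.0 1500.0
```

Uncapped, the refund tool alone puts $25,000 at risk in five minutes; a hard cap of three calls per window brings that to $1,500. If the uncapped number is the scary one, the cap is a control worth having.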


Mistake 6: Ignoring the Insurer’s Dilemma

Setting risk budgets without understanding who bears the risk.

Why it happens:

  • Risk budgets feel abstract
  • Unclear accountability
  • “Someone else’s problem”

If the development team sets the risk budget but users bear the harm, incentives are misaligned: the team will systematically set budgets that tolerate more risk than the people who bear it would accept.

How to avoid it:

  • Make risk-bearers set budgets — Users, not developers
  • Skin in the game — Team compensation tied to safety outcomes
  • Independent audit — Third party reviews risk assessments
  • Read The Insurer’s Dilemma

Mistake 7: Single Points of Failure

All safety checks depend on one component. If it fails, everything fails.

Why it happens:

  • Simpler architecture
  • One component to maintain
  • “It works, don’t touch it”

Single points of failure are:

  • More likely to be compromised by adversaries
  • More likely to fail under unusual conditions
  • Harder to recover from

How to avoid it:

  • Defense in depth — Multiple independent layers
  • Diversity — Different approaches to the same goal
  • Graceful degradation — System remains partially safe if one layer fails
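A minimal sketch of defense in depth with graceful degradation, assuming each layer is a function that returns `True` for safe output and may raise when its backing service is down. The layers deliberately use different approaches (a keyword rule, a length limit, a model-based classifier), and a crashed layer counts as a rejection while the other layers keep running. All check functions here are hypothetical stand-ins.

```python
from typing import Callable

# A check returns True if it considers the output safe; it may also raise.
Check = Callable[[str], bool]


def layered_verdict(output: str, checks: dict[str, Check]) -> tuple[bool, list[str]]:
    """Approve only if every layer approves.

    A layer that crashes counts as a rejection (fail closed), and the
    remaining layers still run, so one broken component cannot silently
    switch off the others.
    """
    problems: list[str] = []
    for name, check in checks.items():
        try:
            if not check(output):
                problems.append(f"{name}: rejected")
        except Exception as exc:
            problems.append(f"{name}: unavailable ({type(exc).__name__})")
    return (not problems, problems)


def keyword_filter(text: str) -> bool:    # hypothetical rule-based layer
    return "DROP TABLE" not in text.upper()


def length_limit(text: str) -> bool:      # hypothetical structural layer
    return len(text) <= 2_000


def model_classifier(text: str) -> bool:  # hypothetical model-based layer, currently down
    raise TimeoutError("classifier endpoint unreachable")


ok, problems = layered_verdict(
    "DELETE all rows; drop table users",
    {"keywords": keyword_filter, "length": length_limit, "classifier": model_classifier},
)
print(ok, problems)
```

Failing closed on an unavailable layer is a design choice suited to higher-risk systems; a low-risk tool might instead log the outage and rely on the remaining layers.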

Mistake 8: Set-and-Forget Safety

Safety measures are implemented at launch and never revisited.

Why it happens:

  • Safety feels “done”
  • No ongoing budget for safety
  • Other priorities emerge

Why it’s a problem:

  • Capabilities drift — Models get better, so old controls may be insufficient
  • Attacks evolve — Adversaries find new bypasses
  • Context changes — Use cases expand beyond the original scope
  • Normalization of deviance — Small violations compound

How to avoid it:

  • Scheduled reviews — Quarterly risk budget reviews
  • Continuous monitoring — Track failure rates over time
  • Trigger-based reviews — After incidents, capability upgrades, use case changes
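The scheduled, trigger-based, and monitoring practices above can be combined into a simple check that runs on a schedule. A minimal sketch; the 91-day interval, the trigger names, the 1.5× tolerance, and the function names are all assumptions for illustration.

```python
from datetime import date, timedelta

# Assumptions for illustration: roughly quarterly cadence and these trigger names.
REVIEW_INTERVAL = timedelta(days=91)
TRIGGERS = {"incident", "capability_upgrade", "use_case_change"}


def review_reasons(last_review: date, today: date, events_since: set[str]) -> list[str]:
    """Why a risk-budget review is due now (an empty list means not yet due)."""
    reasons = []
    if today - last_review >= REVIEW_INTERVAL:
        reasons.append("scheduled quarterly review")
    reasons.extend(f"trigger: {event}" for event in sorted(events_since & TRIGGERS))
    return reasons


def failure_rate_drifted(baseline_rate: float, recent_failures: int,
                         recent_total: int, tolerance: float = 1.5) -> bool:
    """Continuous monitoring: flag when the recent failure rate exceeds
    the launch-time baseline by more than `tolerance` times."""
    if recent_total == 0:
        return False
    return recent_failures / recent_total > baseline_rate * tolerance


print(review_reasons(date(2024, 1, 1), date(2024, 2, 1), {"capability_upgrade"}))
print(review_reasons(date(2024, 1, 1), date(2024, 5, 1), set()))
print(failure_rate_drifted(0.01, recent_failures=30, recent_total=1000))
```

A fixed multiple of the baseline is crude when counts are small; a statistical test (for example, a binomial test against the baseline rate) would raise fewer false alarms.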

Meta-Mistake: Thinking Safety Competes with Capability

Good safety architecture often improves capability:

  • Better monitoring → faster debugging
  • Modular systems → easier upgrades
  • Human escalation → better decisions
  • Audit trails → easier compliance

The real tradeoff is usually between short-term velocity and long-term robustness.


Mistakes in How You Read the Framework

Some failures happen before you write any code, in how you read the framework itself. (The biggest one, jumping straight to patterns, is Mistake 1 above.)

  • Skipping Entanglements. You assume your verification layers are independent, and correlated failures sail past all of them. Read Entanglements before finalizing an architecture.
  • Starting with Research. You get lost in theory with nothing to anchor it to. Read the theory and application sections first, then the Research section.
  • Reading only Case Studies. You absorb specific examples but not the general principles, so you can’t transfer them to your own system. Use case studies to reinforce theory, not replace it.