Research Connections: An Annotated Reading List

This page is a map of adjacent academic literature — an annotated reading list, not original synthesis. The challenges of entanglement in AI delegation systems connect to rich bodies of research; this survey highlights where to look and what each field contributes.


Reliability Engineering & Common-Cause Failure (the closest prior art)

Before the fields below, there is one the entanglement literature should name first, because this site’s central formula is borrowed directly from it.

The Canonical Empirical Result: Knight & Leveson (1986)

Knight & Leveson (1986) — “An Experimental Evaluation of the Assumption of Independence in Multiversion Programming,” IEEE Transactions on Software Engineering 12(1)

The experiment: 27 independently written versions of the same safety-critical program, tested against 1 million input cases. The assumption under test was that separately developed redundant software would fail independently — the theoretical basis for N-version programming as a safety technique.

The finding: failures were statistically significantly correlated. Inputs that defeated one version disproportionately defeated others, even though the developers had no contact. The “independence” assumption that made N-version programming’s probability arithmetic work was empirically false.

This is the experiment that the entanglement thesis generalizes. Knight and Leveson showed it for redundant software in 1986; this site argues the same dynamic applies to AI delegation stacks, and that it is more severe because AI systems share not just problem domain and specification but training data, architecture, and optimizer.
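The independence check can be illustrated with a small sketch. It is not Knight and Leveson's statistical test; it only compares how often two versions fail on the same input with the count that independence would predict. The function name `pairwise_coincidence` and the failure data are made up for illustration.

```python
from itertools import combinations


def pairwise_coincidence(fail_sets, n_inputs):
    """Compare observed joint failures with the count expected under independence.

    fail_sets: list of sets, one per version, holding ids of inputs it failed on.
    n_inputs:  total number of test inputs.
    Returns a list of (i, j, observed, expected) tuples.
    """
    rows = []
    for i, j in combinations(range(len(fail_sets)), 2):
        observed = len(fail_sets[i] & fail_sets[j])
        # Under independence, P(both fail) = P(i fails) * P(j fails).
        expected = len(fail_sets[i]) * len(fail_sets[j]) / n_inputs
        rows.append((i, j, observed, expected))
    return rows


# Hypothetical data: three versions run against 1000 inputs.
versions = [{1, 2, 3, 4}, {3, 4, 5}, {4, 900}]
for i, j, obs, exp in pairwise_coincidence(versions, 1000):
    print(f"versions {i},{j}: observed={obs} expected={exp:.3f}")
```

When the observed counts sit far above the expected ones, as in this toy data, the independence assumption is doing work it cannot support.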

The site’s entanglement tax formula, $P(\text{all fail}) = (1-\rho)\prod p_i + \rho \cdot \min_i p_i$, is the beta-factor model from nuclear safety, not an original derivation. Its lineage:

  • Fleming (1974) introduced the beta-factor as a tractable way to account for common-cause failures (CCFs) in nuclear plant fault trees: a fraction $\beta$ of the failure rate is attributed to causes that defeat all redundant trains at once.
  • NUREG/CR-4780 (Mosleh et al., 1988) is the NRC’s procedures guide for treating common-cause failures, the standard reference for how practitioners set $\beta$ values from plant data.
  • Mosleh et al. developed the alpha-factor model, which generalizes beta-factor to arbitrary group sizes: where beta gives one parameter for “fraction failing together,” alpha-factor gives a vector $(\alpha_1, \alpha_2, \ldots, \alpha_m)$ for a group of $m$ components, where $\alpha_k$ is the fraction of failure events involving exactly $k$ components. This captures partial common-cause failures that the simpler beta-factor misses.

The nuclear industry adopted CCF analysis after early fault-tree studies showed that independent-failure arithmetic drastically understated risk. The pattern is identical to the AI entanglement finding.
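To see how much the common-cause term dominates, the formula can be evaluated directly. This is a minimal sketch: the function name `p_all_fail` and the example probabilities are illustrative assumptions, with `p` holding the per-layer failure probabilities $p_i$ and `rho` the common-cause fraction $\rho$.

```python
from math import prod


def p_all_fail(p, rho):
    """Beta-factor style estimate: (1 - rho) * prod(p_i) + rho * min(p_i)."""
    if not p:
        raise ValueError("need at least one layer")
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    return (1.0 - rho) * prod(p) + rho * min(p)


layers = [0.01, 0.01, 0.01]           # hypothetical per-layer failure probabilities
print(p_all_fail(layers, rho=0.0))    # independent case, roughly 1e-06
print(p_all_fail(layers, rho=0.1))    # roughly 1e-03, dominated by rho * min(p)
```

With these made-up numbers, a 10% common-cause fraction raises the joint failure estimate by about three orders of magnitude over the independent product.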

James Reason’s Swiss-Cheese Model (1990)

Reason (1990) - “Human Error” (Cambridge University Press)

Reason’s Swiss-cheese model is the qualitative ancestor of the layered-defense picture this site uses. Each defense layer has holes (latent failures); a catastrophe occurs when holes align across all layers simultaneously. Reason’s key insight was that the holes are not random — they are correlated by shared organizational conditions (management decisions, training gaps, resource pressure) that punch similar holes through multiple layers at once.

The Swiss-cheese model appears in this site’s modeling page as a visualization of correlated versus independent failure. The attribution belongs to Reason (1990).

Naming the borrowed part makes the genuinely new part legible. Nuclear CCF analysis handles passive correlation — shared causes that make redundant components fail together without intent. This site extends CCF to two failure modes that nuclear reliability engineering does not model, because pumps and valves do not scheme:

  • Active influence (context contamination): an upstream AI component does not merely correlate with a downstream verifier — it causally shapes the verifier’s input, directionally biasing what the verifier sees. This is a CCF where one component creates the common cause for another.
  • Adversarial coordination (collusion and logical correlation): AI components may coordinate through shared reasoning patterns (logical correlation, TDT-style) or explicit collusion, actively exploiting the verification architecture. This has no counterpart in passive hardware reliability.

The entanglement tax formula is CCF arithmetic; the taxonomy of entanglement types is new territory.


The delegation framework is fundamentally an application of principal-agent theory from economics—the study of how principals (those who delegate) can align the behavior of agents (those who act on their behalf) when interests may diverge and information is asymmetric.

| Concept | Definition | Entanglement Relevance |
| --- | --- | --- |
| Moral hazard | Agent takes hidden actions that affect outcomes | Agent may act in ways verifier can’t observe |
| Adverse selection | Agent has hidden information principal lacks | Agent knows its own capabilities/intentions better |
| Information asymmetry | Unequal knowledge between parties | Context contamination exploits this |
| Incentive compatibility | Mechanism where truthful behavior is optimal | Designing systems where honesty is the best strategy |

Jensen & Meckling (1976) - “Theory of the Firm”

  • Introduced agency costs: monitoring costs, bonding costs, residual loss
  • Insight for AI: Even with monitoring, some agency costs are unavoidable. The question is minimizing total cost.

Holmström (1979) - “Moral Hazard and Observability”

  • Showed that optimal contracts depend on what’s observable
  • Insight for AI: Verification is only as good as what can be observed. Unobservable agent reasoning creates irreducible risk.

Holmström & Milgrom (1991) - “Multitask Principal-Agent Analyses”

  • When agents have multiple tasks, incentivizing one may distort others
  • Insight for AI: Optimizing for measurable metrics may cause agents to neglect unmeasured but important behaviors.

When multiple principals oversee the same agent (or multiple agents oversee each other), new dynamics emerge:

Bernheim & Whinston (1986) - “Common Agency”

  • Agents serving multiple principals face conflicting incentives
  • Insight for AI: A verifier that serves both accuracy and speed principals may compromise on both.

Dixit (1996) - “The Making of Economic Policy”

  • Political economy of multiple principals with different objectives
  • Insight for AI: Organizational entanglement often stems from conflicting stakeholder interests.

Adversarial entanglements—collusion, capture, strategic coordination—are fundamentally game-theoretic phenomena.

Laffont & Martimort (1997) - “Collusion under Asymmetric Information”

  • Formal analysis of when and how agents collude
  • Collusion is easier when agents can communicate and have aligned interests
  • Insight for AI: Minimizing communication channels and creating opposed incentives reduces collusion risk.

Che & Kim (2006) - “Robustly Collusion-Proof Implementation”

  • Mechanisms that remain incentive-compatible even if agents can collude
  • Insight for AI: Some verification architectures are inherently more collusion-resistant than others.

Aumann (1974, 1987) - Correlated Equilibrium

  • Players can coordinate using shared signals without explicit communication
  • Insight for AI: TDT-style logical correlation is loosely analogous to correlated equilibrium, with shared training playing the role of the shared signal. Agents with similar “types” (training) may coordinate without communication.

Maskin (1999) - “Nash Equilibrium and Welfare Optimality”

  • When can social choice functions be implemented?
  • Insight for AI: Not all desirable verification properties are achievable through mechanism design. Some are fundamentally impossible.

Myerson (1981) - “Optimal Auction Design”

  • Revelation principle: any mechanism can be replicated by one where agents truthfully reveal information
  • Insight for AI: If an agent can be incentivized to behave well through complex monitoring, there exists a simpler mechanism achieving the same outcome. But finding it may be hard.

Fudenberg & Maskin (1986) - Folk Theorem

  • In repeated interactions, many outcomes (including collusion) become sustainable
  • Insight for AI: Long-term agent-verifier relationships enable implicit collusion. Rotation disrupts this.
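The rotation point can be made concrete with the textbook grim-trigger calculation for a repeated prisoner's dilemma. This is a minimal sketch, not the Fudenberg and Maskin construction: the payoff names (`temptation`, `reward`, `punishment`) and the choice to model rotation as a per-round probability that the same pairing continues are illustrative assumptions.

```python
def critical_discount(temptation, reward, punishment):
    """Smallest discount factor at which grim-trigger cooperation is sustainable.

    Cooperating forever is worth reward / (1 - d); deviating once is worth
    temptation + d * punishment / (1 - d). Equating the two gives the threshold.
    """
    if not temptation > reward > punishment:
        raise ValueError("expected temptation > reward > punishment")
    return (temptation - reward) / (temptation - punishment)


def collusion_sustainable(patience, continue_prob, temptation, reward, punishment):
    """Assumption: rotation lowers the chance that the same pair meets again."""
    effective_discount = patience * continue_prob
    return effective_discount >= critical_discount(temptation, reward, punishment)


print(critical_discount(5, 3, 1))                # 0.5
print(collusion_sustainable(0.9, 1.0, 5, 3, 1))  # True: stable pairing
print(collusion_sustainable(0.9, 0.3, 5, 3, 1))  # False: frequent rotation
```

In this toy model, rotation works by shrinking the weight the parties place on future rounds until the one-shot gain from defecting outweighs continued cooperation.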

Kreps & Wilson (1982) - “Reputation and Imperfect Information”

  • Reputation building in games with incomplete information
  • Insight for AI: Agents may build reputation specifically to exploit it later. Reputation is not a reliable signal of future behavior.

Information theory provides precise tools for quantifying entanglement.

Shannon (1948) - “A Mathematical Theory of Communication”

  • Mutual information I(X;Y) measures how much knowing X tells you about Y
  • Insight for AI: Entanglement between components A and B can be measured as I(A_failure; B_failure).
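As a rough illustration, mutual information between two components' failure indicators can be estimated from logged outcomes. The sketch below uses the simple plug-in estimator, which tends to overestimate on small samples; the function name and the sample data are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(a, b):
    """Plug-in estimate of I(A;B) in bits from paired samples a[i], b[i]."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(a)
    joint = Counter(zip(a, b))
    count_a = Counter(a)
    count_b = Counter(b)
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        p_x = count_a[x] / n
        p_y = count_b[y] / n
        mi += p_xy * log2(p_xy / (p_x * p_y))
    return mi


# 1 = failed, 0 = passed, one entry per test case (made-up logs).
a_fail = [0, 1, 0, 1]
print(mutual_information(a_fail, a_fail))        # identical failures: 1.0 bit
print(mutual_information(a_fail, [0, 0, 1, 1]))  # unrelated failures: 0.0 bits
```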

Cover & Thomas (2006) - “Elements of Information Theory”

  • Standard reference for information-theoretic concepts
  • Relevant concepts: channel capacity, rate-distortion, data processing inequality

The data processing inequality is a fundamental result: processing can only lose information, never create it.

If X → Y → Z forms a Markov chain:
I(X; Z) ≤ I(X; Y)

Insight for AI: If context flows Input → Agent summary → Verifier, the verifier cannot have more information about the original input than the summary contains. Lossy summarization creates irreversible information loss.
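A toy numerical check of the inequality: treat each hop as a binary symmetric channel that flips a uniformly random bit with some probability. The channel model and the flip probabilities are assumptions chosen only because the mutual information then has a closed form.

```python
from math import log2


def binary_entropy(p):
    """Entropy in bits of a coin with bias p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def bsc_chain_information(flip_xy, flip_yz):
    """Return (I(X;Y), I(X;Z)) for uniform X sent through two noisy hops."""
    # Two hops in series flip the bit when exactly one of them flips it.
    flip_xz = flip_xy * (1 - flip_yz) + (1 - flip_xy) * flip_yz
    return 1 - binary_entropy(flip_xy), 1 - binary_entropy(flip_xz)


i_xy, i_xz = bsc_chain_information(flip_xy=0.1, flip_yz=0.2)
print(f"I(X;Y) = {i_xy:.3f} bits, I(X;Z) = {i_xz:.3f} bits")
```

Here X stands in for the original input, Y for the summary, and Z for what the verifier sees; the second hop can only reduce what Z reveals about X.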

Shannon (1948) - Channel capacity limits

  • Every communication channel has a maximum rate of reliable information transfer
  • Insight for AI: Side-channels between components have limited bandwidth. If we can characterize the channel, we can bound the information leakage.

Bash et al. (2013) - “Limits of Reliable Communication with Low Probability of Detection”

  • Covert communication in the presence of a warden
  • Insight for AI: Agents attempting to collude while evading detection face fundamental limits on how much information they can exchange.

Detecting influence is fundamentally a causal question: does A cause changes in B?

Pearl (2009) - “Causality”

  • Distinguishes observation P(Y|X) from intervention P(Y|do(X))
  • Insight for AI: Correlation between component failures doesn’t prove influence. We need interventional experiments.

The do-calculus provides rules for computing causal effects from observational data when certain assumptions hold (e.g., causal graph is known).

Pearl (2009), Rubin (1974) - Potential outcomes framework

  • What would have happened if we had intervened differently?
  • Insight for AI: Influence detection via counterfactual: “Would B’s output have been different if A’s context had been different, holding the underlying input constant?”
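The counterfactual question maps onto a simple interventional experiment: hold each input fixed, swap only the upstream context, and count how often the verdict changes. The sketch below assumes a verifier callable with the hypothetical signature `verifier(item, context)`; `toy_verifier` is a stand-in, not a real system.

```python
def influence_rate(verifier, inputs, context_a, context_b):
    """Fraction of inputs whose verdict changes when only the context is swapped."""
    if not inputs:
        raise ValueError("need at least one input")
    changed = sum(verifier(x, context_a) != verifier(x, context_b) for x in inputs)
    return changed / len(inputs)


def toy_verifier(risk_score, context):
    """Hypothetical verifier that relaxes its threshold when the agent reassures it."""
    threshold = 5 if "looks fine" in context else 3
    return risk_score < threshold  # True means "approve"


scores = list(range(10))
rate = influence_rate(toy_verifier, scores, "agent note: looks fine", "neutral note")
print(rate)  # 0.2: scores 3 and 4 are approved only with the reassuring note
```

Because the input is held constant and only the context is set by the experimenter, a nonzero rate reflects causal influence rather than mere correlation.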

Spirtes, Glymour, Scheines (2000) - “Causation, Prediction, and Search”

  • Algorithms for discovering causal structure from data
  • Insight for AI: Structure-learning algorithms such as PC and FCI offer a starting point for discovering hidden influence channels from observational data.

Granger (1969) - Granger Causality

  • X Granger-causes Y if past values of X help predict Y beyond Y’s own past
  • Insight for AI: Temporal influence detection. If A’s past behavior helps predict B’s future behavior, there may be an influence channel.
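The core comparison behind a Granger test can be sketched with ordinary least squares: predict B from its own past, then add A's past and see how much the residual error drops. This is only the first step; a proper test would apply an F-statistic and check stationarity. The function name, the lag choice, and the synthetic series are assumptions.

```python
import numpy as np


def granger_rss(x, y, lag=1):
    """Return (rss_restricted, rss_unrestricted) for predicting y[t].

    Restricted model: y's own lags. Unrestricted model: y's lags plus x's lags.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    target = y[lag:]
    own = np.column_stack([y[lag - k - 1 : len(y) - k - 1] for k in range(lag)])
    other = np.column_stack([x[lag - k - 1 : len(x) - k - 1] for k in range(lag)])
    ones = np.ones((len(target), 1))

    def rss(design):
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        return float(resid @ resid)

    return rss(np.hstack([ones, own])), rss(np.hstack([ones, own, other]))


# Synthetic series in which y follows x with a one-step delay.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = np.zeros(500)
y[1:] = 0.8 * x[:-1] + 0.1 * rng.normal(size=499)

restricted, unrestricted = granger_rss(x, y, lag=1)
print(f"RSS without x: {restricted:.1f}, with x: {unrestricted:.1f}")
```

A large drop in residual error when A's history is added is the signal worth following up with an interventional experiment.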

Correlated blind spots connect directly to adversarial ML research.

Szegedy et al. (2013) - “Intriguing Properties of Neural Networks”

  • Discovered adversarial examples and their transferability
  • Insight for AI: Adversarial examples that fool one model often fool others—this is the mechanism behind correlated blind spots.

Papernot et al. (2016) - “Transferability in Machine Learning”

  • Systematic study of why adversarial examples transfer
  • Models with similar decision boundaries are more susceptible to transfer
  • Insight for AI: “Diversity” in verification means diversity in decision boundaries, not just different providers.

Tramèr et al. (2017) - “Ensemble Adversarial Training”

  • Training on adversarial examples from multiple models improves robustness
  • Insight for AI: Verification layers trained adversarially against each other may have lower correlation.

Cohen et al. (2019) - “Certified Adversarial Robustness via Randomized Smoothing”

  • Provides provable guarantees against adversarial perturbations
  • Insight for AI: For some verification tasks, we can prove bounds on failure rates—but these bounds may still be correlated across components.

Human institutions have grappled with entanglement for centuries.

Stigler (1971) - “The Theory of Economic Regulation”

  • Regulation is often designed for the benefit of the regulated, not the public
  • Industries “capture” their regulators through information asymmetry and revolving doors
  • Insight for AI: Verifiers working closely with agents may be captured by agent interests.

Peltzman (1976) - “Toward a More General Theory of Regulation”

  • Extended Stigler’s model: regulators balance multiple interest groups
  • Insight for AI: Capture is not binary—it’s a spectrum of influence.

Carpenter & Moss (2013) - “Preventing Regulatory Capture”

  • Comprehensive modern treatment
  • Strategies: transparency, adversarial structure, rotation, external review
  • Insight for AI: These same strategies apply to AI verification systems.

Madison, Hamilton, Jay (1788) - The Federalist Papers

  • Classic arguments for separation of powers and checks and balances
  • Ambition must be made to counteract ambition
  • Insight for AI: Adversarial collaboration patterns have deep historical roots.

Persson, Roland, Tabellini (1997) - “Separation of Powers and Political Accountability”

  • Formal analysis of how separation of powers affects outcomes
  • Insight for AI: Multiple principals with different objectives can prevent capture by any single interest.

Moore et al. (2006) - “Conflicts of Interest and the Case of Auditor Independence”

  • Psychological and economic barriers to auditor independence
  • Even well-intentioned auditors are influenced by relationships
  • Insight for AI: Independence is not a fixed property but requires active maintenance.

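Sarbanes-Oxley Act (2002) and SEC rules - Audit partner rotation requirements

  • Lead audit partners must rotate off a client engagement every 5 years; the Public Company Accounting Oversight Board (PCAOB) oversees the auditors subject to these rules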
  • Insight for AI: Forced rotation prevents relationship-based capture.

Entanglements often exhibit complex system behavior.

Buldyrev et al. (2010) - “Catastrophic Cascade of Failures in Interdependent Networks”

  • Interdependent networks are more fragile than independent ones
  • Small failures can cascade across network boundaries
  • Insight for AI: Shared infrastructure creates interdependency. Failures propagate.

Watts (2002) - “A Simple Model of Global Cascades”

  • How local failures become global through network structure
  • Insight for AI: Understanding cascade topology helps design circuit breakers.
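A threshold cascade of the kind Watts describes is short enough to simulate directly: a node fails once the failed fraction of its neighbors reaches its threshold. The sketch below is a simplified illustration on a hand-built graph, not Watts's random-graph analysis; the graph, thresholds, and function name are assumptions.

```python
def run_cascade(neighbors, thresholds, seeds):
    """Spread failures until no further node crosses its threshold.

    neighbors:  dict mapping node -> list of adjacent nodes.
    thresholds: dict mapping node -> fraction of failed neighbors that trips it.
    seeds:      iterable of initially failed nodes.
    """
    failed = set(seeds)
    changed = True
    while changed:
        changed = False
        for node, nbrs in neighbors.items():
            if node in failed or not nbrs:
                continue
            failed_fraction = sum(n in failed for n in nbrs) / len(nbrs)
            if failed_fraction >= thresholds[node]:
                failed.add(node)
                changed = True
    return failed


# Four components in a chain: 0 - 1 - 2 - 3.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
fragile = {n: 0.5 for n in chain}
robust = {n: 0.6 for n in chain}
print(sorted(run_cascade(chain, fragile, seeds=[0])))  # [0, 1, 2, 3]
print(sorted(run_cascade(chain, robust, seeds=[0])))   # [0]
```

The small change in threshold is the circuit-breaker idea in miniature: the same seed failure either sweeps the whole chain or stops at the first node.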

Holland (1998) - “Emergence: From Chaos to Order”

  • Complex behaviors emerge from simple rules
  • Insight for AI: Collusion and coordination may emerge without explicit design. Watch for emergent entanglements.

Kauffman (1993) - “The Origins of Order”

  • Self-organization in complex adaptive systems
  • Insight for AI: Multi-agent AI systems may self-organize in unexpected ways.

The TDT-style coordination problem connects to foundational questions in decision theory.

Yudkowsky (2010) - “Timeless Decision Theory”

  • Agents reason about their decision procedure, not just their actions
  • Similar agents reach similar conclusions through logical correlation
  • Insight for AI: Agents with similar training may coordinate without communication—a form of entanglement that can’t be blocked by isolating communication channels.

Yudkowsky & Soares (2017) - “Functional Decision Theory”

  • Refined version of TDT
  • Insight for AI: Whether current AI systems reason this way is debated. If AI agents do engage in FDT-style reasoning, standard independence assumptions break down.

Garrabrant et al. (2016) - “Logical Induction”

  • Framework for reasoning under logical uncertainty
  • Insight for AI: Agents reasoning about what other similar agents will do face logical uncertainty. This connects to coordination without communication.

Understanding entanglement at the model level requires interpretability.

Kornblith et al. (2019) - “Similarity of Neural Network Representations Revisited”

  • Centered Kernel Alignment (CKA) for comparing representations
  • Insight for AI: High CKA between verification models suggests they may share blind spots.
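Linear CKA is compact enough to compute directly from two activation matrices collected on the same inputs. This is a minimal sketch of the linear variant only, using NumPy; the function name and the random test matrices are illustrative, and what counts as a "high" score for shared blind spots is left open here.

```python
import numpy as np


def linear_cka(x, y):
    """Linear CKA between activations x (n, d1) and y (n, d2) on the same n inputs."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Center each feature so the score ignores constant offsets.
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    cross = np.linalg.norm(y.T @ x, "fro") ** 2
    return cross / (np.linalg.norm(x.T @ x, "fro") * np.linalg.norm(y.T @ y, "fro"))


rng = np.random.default_rng(0)
acts_a = rng.normal(size=(50, 8))                  # stand-in for model A activations
rotation, _ = np.linalg.qr(rng.normal(size=(8, 8)))
acts_rotated = 3.0 * acts_a @ rotation             # same features, rotated and rescaled
acts_b = rng.normal(size=(50, 8))                  # unrelated activations

print(linear_cka(acts_a, acts_rotated))  # close to 1: invariant to rotation and scale
print(linear_cka(acts_a, acts_b))        # noticeably lower for unrelated features
```

The invariance to rotation and scaling is what makes the score useful across models whose internal coordinate systems differ.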

Meng et al. (2022) - “Locating and Editing Factual Associations in GPT”

  • Techniques for understanding which components of a model affect which outputs
  • Insight for AI: Could be used to trace how context from one component influences another’s decisions.

Elhage et al. (2022) - “Toy Models of Superposition”

Anthropic (2023-2024) - Circuits work

  • Understanding what features models represent and how they compute
  • Insight for AI: If we understood what “features” verification models use, we could assess whether they’re truly diverse.

Summary: Key Takeaways from the Literature

| Field | Key Insight for Entanglement |
| --- | --- |
| Reliability Eng. / CCF | “Independent” redundancy fails dependently; beta-factor model is the canonical fix; active influence and adversarial coordination extend this to AI |
| Principal-Agent Theory | Some agency costs are unavoidable; design to minimize total cost |
| Mechanism Design | Not all desirable properties are achievable; some are fundamentally impossible |
| Game Theory | Repeated interaction enables collusion; rotation disrupts it |
| Information Theory | Entanglement is quantifiable; lossy channels create irreversible information loss |
| Causal Inference | Influence detection requires intervention, not just correlation |
| Adversarial ML | Model similarity predicts failure correlation; diversity means different decision boundaries |
| Organizational Theory | Capture is a spectrum; independence requires active maintenance |
| Complex Systems | Entanglements may emerge without design; cascades require structural understanding |
| Decision Theory | Logical correlation may be irreducible for similar reasoning systems |
| Interpretability | Understanding what models compute could reveal shared failure modes |

  • Laffont & Martimort (2002) - “The Theory of Incentives: The Principal-Agent Model”
  • Cover & Thomas (2006) - “Elements of Information Theory”
  • Pearl (2009) - “Causality: Models, Reasoning, and Inference”
  • Mas-Colell, Whinston, Green (1995) - “Microeconomic Theory” (mechanism design chapters)
  • Bolton & Dewatripont (2005) - “Contract Theory”
  • Chakraborty & Yılmaz (2017) - “Authority, Consensus, and Governance”
  • Goodfellow et al. (2018) - “Making Machine Learning Robust Against Adversarial Inputs”
  • Christiano et al. (2017) - “Deep Reinforcement Learning from Human Preferences”
  • Irving et al. (2018) - “AI Safety via Debate”
  • Hubinger et al. (2019) - “Risks from Learned Optimization”
