Research Connections: An Annotated Reading List

This page is a map of adjacent academic literature — an annotated reading list, not original synthesis. The challenges of entanglement in AI delegation systems connect to rich bodies of research; this survey highlights where to look and what each field contributes.

Reliability Engineering & Common-Cause Failure (the closest prior art)

Before the fields below, there is one the entanglement literature should name first, because this site’s central formula is borrowed directly from it.

The Canonical Empirical Result: Knight & Leveson (1986)

Knight & Leveson (1986) — “An Experimental Evaluation of the Assumption of Independence in Multiversion Programming,” IEEE Transactions on Software Engineering 12(1)

The experiment: 27 independently written versions of the same safety-critical program, tested against 1 million input cases. The assumption under test was that separately developed redundant software would fail independently — the theoretical basis for N-version programming as a safety technique.

The finding: failures were statistically significantly correlated. Inputs that defeated one version disproportionately defeated others, even though the developers had no contact. The “independence” assumption that made N-version programming’s probability arithmetic work was empirically false.

This is the experiment that the entanglement thesis generalizes. Knight and Leveson showed it for redundant software in 1986; this site argues the same dynamic applies to AI delegation stacks, and that it is more severe because AI systems share not just problem domain and specification but training data, architecture, and optimizer.
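Knight and Leveson's core comparison can be sketched as follows. Take each version's observed failure rate. Compute how many inputs should defeat two or more versions if failures were independent. Compare that with the count actually observed. This is a minimal sketch, not their exact statistical test. The failure-matrix layout and the synthetic data (three versions that all fail on the same ten "hard" inputs) are assumptions for illustration.

```python
import math

def expected_coincident_failures(fail_matrix):
    """Inputs expected to defeat >= 2 versions if versions failed independently.

    fail_matrix[v][t] is True when version v failed on test input t.
    """
    n_inputs = len(fail_matrix[0])
    rates = [sum(row) / n_inputs for row in fail_matrix]
    p_none = math.prod(1 - p for p in rates)
    p_exactly_one = sum(
        p * math.prod(1 - q for j, q in enumerate(rates) if j != i)
        for i, p in enumerate(rates)
    )
    return n_inputs * (1 - p_none - p_exactly_one)

def observed_coincident_failures(fail_matrix):
    """Inputs that actually defeated two or more versions."""
    return sum(1 for column in zip(*fail_matrix) if sum(column) >= 2)

# Synthetic data: 3 versions, 1,000 inputs, each version fails only on inputs 0-9.
fails = [[t < 10 for t in range(1000)] for _ in range(3)]
print(observed_coincident_failures(fails))            # 10
print(round(expected_coincident_failures(fails), 3))  # 0.298
```

Ten coincident failures against roughly 0.3 expected is the kind of gap that led Knight and Leveson to reject independence. Real data would need a proper significance test rather than a raw comparison.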

Nuclear Common-Cause Failure Analysis

The site’s entanglement tax formula — $P(\text{all fail}) = (1-\rho)\prod p_i + \rho \cdot \min_i p_i$ — is the beta-factor model from nuclear safety, not an original derivation. Its lineage:

Fleming (1974) introduced the beta-factor as a tractable way to account for common-cause failures (CCFs) in nuclear plant fault trees: a fraction $\beta$ of the failure rate is attributed to causes that defeat all redundant trains at once.
NUREG/CR-4780 (Mosleh et al., 1988) is the NRC's procedures guide for treating common-cause failures in safety and reliability studies. It is the standard reference for how practitioners set $\beta$ and related CCF parameters from plant data.
Mosleh et al. developed the alpha-factor model, which generalizes the beta-factor model to arbitrary group sizes. The beta factor gives one parameter for “fraction failing together.” For a common-cause group of $m$ components, the alpha-factor model instead gives a vector $(\alpha_1, \alpha_2, \ldots, \alpha_m)$, where $\alpha_k$ is the fraction of failure events involving exactly $k$ components. This captures partial common-cause failures that the simpler beta-factor model misses.
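Below is a minimal sketch of the usual point estimate of alpha factors. It assumes one already has counts of failure events in a common-cause group, classified by how many components failed in each event. Each $\alpha_k$ is then the share of events in which exactly $k$ components failed. The counts below are hypothetical.

```python
from fractions import Fraction

def alpha_factors(event_counts):
    """Point estimate of alpha factors for a common-cause group of size m.

    event_counts[k-1] is the number of observed failure events in which
    exactly k of the m components failed (k = 1..m).
    """
    total = sum(event_counts)
    if total == 0:
        raise ValueError("need at least one observed failure event")
    return [Fraction(n, total) for n in event_counts]

# Hypothetical data for a 3-train system:
# 90 single failures, 8 double failures, 2 triple failures.
alphas = alpha_factors([90, 8, 2])
print([float(a) for a in alphas])  # [0.9, 0.08, 0.02]
```

Exact fractions keep the factors summing to exactly 1. Practical CCF analyses add data screening and uncertainty treatment on top of this ratio.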

The nuclear industry adopted CCF analysis after early fault-tree studies showed that independent-failure arithmetic drastically understated risk. The pattern is identical to the AI entanglement finding.
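The sketch below shows why independent-failure arithmetic understates risk. It evaluates the entanglement tax formula stated above for three layers and compares the result with the plain product. The layer miss rates and $\rho$ values are hypothetical.

```python
import math

def p_all_fail_independent(ps):
    """Naive redundancy arithmetic: every layer must miss, independently."""
    return math.prod(ps)

def p_all_fail_entangled(ps, rho):
    """The page's entanglement tax formula: (1 - rho) * prod(p_i) + rho * min(p_i)."""
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    return (1 - rho) * math.prod(ps) + rho * min(ps)

layers = [0.1, 0.1, 0.1]  # three layers, each missing 10% of bad outputs (hypothetical)
print(p_all_fail_independent(layers))
for rho in (0.1, 0.5):
    print(rho, p_all_fail_entangled(layers, rho))
```

With these numbers the independent estimate is 0.001. Setting $\rho = 0.1$ already gives about 0.0109, roughly eleven times higher, and $\rho = 0.5$ gives about 0.0505. Even a modest common-cause share dominates the product term.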

James Reason’s Swiss-Cheese Model (1990)

Reason (1990) — Human Error (Cambridge University Press)

Reason’s Swiss-cheese model is the qualitative ancestor of the layered-defense picture this site uses. Each defense layer has holes (latent failures); a catastrophe occurs when holes align across all layers simultaneously. Reason’s key insight was that the holes are not random — they are correlated by shared organizational conditions (management decisions, training gaps, resource pressure) that punch similar holes through multiple layers at once.
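Reason's point that a shared condition correlates the holes can be made numerical. The sketch assumes a single latent organizational condition that is either "good" or "bad" for every layer at once, with hypothetical hole probabilities. It compares the chance that all holes align with the chance for independent layers that have the same per-layer hole rates.

```python
import math

def p_align_shared(p_good, p_bad, q_bad):
    """All layers have a hole, given one shared latent condition.

    With probability q_bad the organization is in the 'bad' state and layer i
    has a hole with probability p_bad[i]; otherwise with probability p_good[i].
    Holes are independent *given* the state.
    """
    return q_bad * math.prod(p_bad) + (1 - q_bad) * math.prod(p_good)

def p_align_independent(p_good, p_bad, q_bad):
    """Same per-layer hole rates, but each layer draws its own condition."""
    marginals = [q_bad * b + (1 - q_bad) * g for g, b in zip(p_good, p_bad)]
    return math.prod(marginals)

p_good, p_bad, q_bad = [0.02] * 3, [0.3] * 3, 0.1  # hypothetical values
print(p_align_shared(p_good, p_bad, q_bad))       # about 0.0027
print(p_align_independent(p_good, p_bad, q_bad))  # about 0.00011
```

Every layer has the same 4.8% hole rate in both cases. Yet the shared condition makes full alignment about 24 times more likely.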

The Swiss-cheese model appears in this site’s modeling page as a visualization of correlated versus independent failure. The attribution belongs to Reason (1990).

What This Site Adds Over CCF Analysis

Naming the borrowed part makes the genuinely new part legible. Nuclear CCF analysis handles passive correlation — shared causes that make redundant components fail together without intent. This site extends CCF to two failure modes that nuclear reliability engineering does not model, because pumps and valves do not scheme:

Active influence (context contamination): an upstream AI component does not merely correlate with a downstream verifier — it causally shapes the verifier’s input, directionally biasing what the verifier sees. This is a CCF where one component creates the common cause for another.
Adversarial coordination (collusion and logical correlation): AI components may coordinate through shared reasoning patterns (logical correlation, TDT-style) or explicit collusion, actively exploiting the verification architecture. This has no counterpart in passive hardware reliability.

The entanglement tax formula is CCF arithmetic; the taxonomy of entanglement types is new territory.

Principal-Agent Theory

The delegation framework is fundamentally an application of principal-agent theory from economics—the study of how principals (those who delegate) can align the behavior of agents (those who act on their behalf) when interests may diverge and information is asymmetric.

Key Concepts

Concept	Definition	Entanglement Relevance
Moral hazard	Agent takes hidden actions that affect outcomes	Agent may act in ways verifier can’t observe
Adverse selection	Agent has hidden information principal lacks	Agent knows its own capabilities/intentions better
Information asymmetry	Unequal knowledge between parties	Context contamination exploits this
Incentive compatibility	Mechanism where truthful behavior is optimal	Designing systems where honesty is the best strategy

Foundational Papers

Jensen & Meckling (1976) - “Theory of the Firm”

Introduced agency costs: monitoring costs, bonding costs, residual loss
Insight for AI: Even with monitoring, some agency costs are unavoidable. The question is minimizing total cost.

Holmström (1979) - “Moral Hazard and Observability”

Showed that optimal contracts depend on what’s observable
Insight for AI: Verification is only as good as what can be observed. Unobservable agent reasoning creates irreducible risk.

Holmström & Milgrom (1991) - “Multitask Principal-Agent Analyses”

When agents have multiple tasks, incentivizing one may distort others
Insight for AI: Optimizing for measurable metrics may cause agents to neglect unmeasured but important behaviors.

The Multi-Principal Problem

When multiple principals oversee the same agent (or multiple agents oversee each other), new dynamics emerge:

Bernheim & Whinston (1986) - “Common Agency”

Agents serving multiple principals face conflicting incentives
Insight for AI: A verifier that serves both accuracy and speed principals may compromise on both.

Dixit (1996) - “The Making of Economic Policy”

Political economy of multiple principals with different objectives
Insight for AI: Organizational entanglement often stems from conflicting stakeholder interests.

Game Theory & Mechanism Design

Adversarial entanglements—collusion, capture, strategic coordination—are fundamentally game-theoretic phenomena.

Collusion Resistance

Laffont & Martimort (1997) - “Collusion under Asymmetric Information”

Formal analysis of when and how agents collude
Collusion is easier when agents can communicate and have aligned interests
Insight for AI: Minimizing communication channels and creating opposed incentives reduces collusion risk.

Che & Kim (2006) - “Robustly Collusion-Proof Implementation”

Mechanisms that remain incentive-compatible even if agents can collude
Insight for AI: Some verification architectures are inherently more collusion-resistant than others.

Correlated Equilibrium

Aumann (1974, 1987) - Correlated Equilibrium

Players can coordinate using shared signals without explicit communication
Insight for AI: TDT-style logical correlation is analogous to a correlated equilibrium in which the shared “signal” is common training rather than an external device. Agents with similar “types” (training) may coordinate without communication.
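The shared-signal idea can be checked mechanically. The sketch tests the obedience conditions of a correlated equilibrium for a distribution over action profiles: no player, told only their own recommended action, gains in expectation by deviating. The payoffs are a common textbook version of the game of Chicken and are assumptions here. Reading the signal as shared training is only an analogy.

```python
from fractions import Fraction

ACTIONS = ("dare", "chicken")
# Textbook-style payoffs for Chicken, as (row player, column player).
PAYOFF = {
    ("dare", "dare"): (0, 0),
    ("dare", "chicken"): (7, 2),
    ("chicken", "dare"): (2, 7),
    ("chicken", "chicken"): (6, 6),
}

def is_correlated_equilibrium(dist):
    """dist maps action profiles to probabilities of being recommended."""
    for player in (0, 1):
        for recommended in ACTIONS:
            for deviation in ACTIONS:
                gain = Fraction(0)
                for profile, prob in dist.items():
                    if profile[player] != recommended:
                        continue
                    deviated = list(profile)
                    deviated[player] = deviation
                    gain += prob * (PAYOFF[tuple(deviated)][player] - PAYOFF[profile][player])
                if gain > 0:  # deviating beats obeying this recommendation
                    return False
    return True

third = Fraction(1, 3)
signal = {("chicken", "chicken"): third, ("dare", "chicken"): third, ("chicken", "dare"): third}
print(is_correlated_equilibrium(signal))                         # True
print(is_correlated_equilibrium({("dare", "dare"): Fraction(1)}))  # False
```

No messages pass between the players. The coordination comes entirely from the common device that draws the recommendation.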

Mechanism Design for Verification

Maskin (1999) - “Nash Equilibrium and Welfare Optimality”

When can social choice functions be implemented?
Insight for AI: Not all desirable verification properties are achievable through mechanism design. Some are fundamentally impossible.

Myerson (1981) - “Optimal Auction Design”

Revelation principle: any equilibrium outcome of any mechanism can be replicated by a direct mechanism in which agents truthfully report their private information
Insight for AI: If an agent can be incentivized to behave well through complex monitoring, there exists a direct, truth-telling mechanism achieving the same outcome. That mechanism need not be simpler to run, and finding it may be hard.

Repeated Games and Reputation

Fudenberg & Maskin (1986) - Folk Theorem

In repeated interactions, many outcomes (including collusion) become sustainable
Insight for AI: Long-term agent-verifier relationships enable implicit collusion. Rotation disrupts this.
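The sketch below is a textbook special case, not the general folk theorem. In a repeated prisoner's dilemma with grim-trigger punishment and payoffs $T > R > P$, cooperation (read here as agent-verifier collusion) is sustainable when the discount factor satisfies $\delta \ge (T-R)/(T-P)$. Rotation is modeled as a per-round chance that the pairing ends, which shrinks the effective discount factor. That modeling choice and the payoffs are assumptions.

```python
def collusion_sustainable(delta, temptation, reward, punishment, rotation_prob=0.0):
    """Grim-trigger condition for sustaining collusion in a repeated
    prisoner's dilemma with payoffs temptation > reward > punishment.

    rotation_prob: per-round chance the agent-verifier pairing is broken up
    (assumption: once rotated, the pair never meets again).
    """
    if not temptation > reward > punishment:
        raise ValueError("need temptation > reward > punishment")
    effective_delta = delta * (1 - rotation_prob)
    threshold = (temptation - reward) / (temptation - punishment)
    return effective_delta >= threshold

# Hypothetical payoffs: threshold = (5 - 3) / (5 - 1) = 0.5
print(collusion_sustainable(0.9, 5, 3, 1))                     # True
print(collusion_sustainable(0.9, 5, 3, 1, rotation_prob=0.5))  # False
```

With a 50% rotation chance, the effective discount factor drops from 0.9 to 0.45. That is below the 0.5 threshold, so the implicit collusive arrangement is no longer self-enforcing.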

Kreps & Wilson (1982) - “Reputation and Imperfect Information”

Reputation building in games with incomplete information
Insight for AI: Agents may build reputation specifically to exploit it later, so reputation alone is not a reliable signal of future behavior.

Information Theory

Information theory provides precise tools for quantifying entanglement.

Mutual Information

Shannon (1948) - “A Mathematical Theory of Communication”

Mutual information I(X;Y) measures how much knowing X tells you about Y
Insight for AI: Entanglement between components A and B can be measured as I(A_failure; B_failure).
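A minimal plug-in estimate of $I(A_\text{failure}; B_\text{failure})$ in bits from paired failure logs looks like this. The two toy logs (perfectly coupled versus independent) are hypothetical. Note that plug-in estimates are biased upward on small samples.

```python
import math
from collections import Counter

def mutual_information_bits(a, b):
    """Plug-in estimate of I(A;B) in bits from paired observations."""
    if len(a) != len(b) or not a:
        raise ValueError("need equal-length, non-empty sequences")
    n = len(a)
    joint = Counter(zip(a, b))
    count_a, count_b = Counter(a), Counter(b)
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        mi += p_xy * math.log2(p_xy / ((count_a[x] / n) * (count_b[y] / n)))
    return mi

# 1 = component failed on that input, 0 = it did not (hypothetical logs).
print(mutual_information_bits([0, 1, 0, 1], [0, 1, 0, 1]))  # 1.0 (fully entangled)
print(mutual_information_bits([0, 0, 1, 1], [0, 1, 0, 1]))  # 0.0 (independent)
```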

Cover & Thomas (2006) - “Elements of Information Theory”

Standard reference for information-theoretic concepts
Relevant concepts: channel capacity, rate-distortion, data processing inequality

Data Processing Inequality

A fundamental result: information can only be lost, not created, through processing.

If X → Y → Z forms a Markov chain:
I(X; Z) ≤ I(X; Y)

Insight for AI: If the original input reaches the verifier only through the agent's summary (Input → Summary → Verifier), the verifier cannot have more information about the input than the summary contains. Lossy summarization creates irreversible information loss.
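A numerical check of the inequality uses a toy model. Assume the input is a uniform bit $X$, the agent's summary $Y$ is $X$ passed through a binary symmetric channel, and the verifier's reading $Z$ is $Y$ passed through a second one. The channel model and error rates are hypothetical.

```python
import math

def h2(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_cascade(e1, e2):
    """Crossover probability of two binary symmetric channels in series."""
    return e1 * (1 - e2) + e2 * (1 - e1)

# X: uniform input bit; Y: the agent's summary of it; Z: what the verifier reads.
e_summary, e_reading = 0.05, 0.10  # hypothetical error rates
i_xy = 1 - h2(e_summary)                          # I(X;Y) for a uniform bit
i_xz = 1 - h2(bsc_cascade(e_summary, e_reading))  # I(X;Z)
print(round(i_xy, 3), round(i_xz, 3))
```

The summary retains about 0.71 bits about the input, but the verifier's reading retains only about 0.42. No processing step after the summary can recover the difference.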

Channel Capacity for Covert Communication

Shannon (1948) - Channel capacity limits

Every communication channel has a maximum rate of reliable information transfer
Insight for AI: Side-channels between components have limited bandwidth. If we can characterize the channel, we can bound the information leakage.
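One way to turn "characterize the channel" into a number is to assume the side channel behaves like a binary symmetric channel. An example would be a timing signal that an observer reads correctly 80% of the time. Its capacity, $1 - H(e)$ bits per use, times the number of uses gives the long-run limit on reliable leakage. The channel model and numbers are hypothetical.

```python
import math

def bsc_capacity(e):
    """Capacity in bits per use of a binary symmetric channel with crossover e."""
    if e in (0.0, 1.0):
        return 1.0
    return 1 + e * math.log2(e) + (1 - e) * math.log2(1 - e)

def leakage_limit_bits(e, uses):
    """Asymptotic (Shannon) limit on bits reliably sent over `uses` channel uses."""
    return uses * bsc_capacity(e)

print(leakage_limit_bits(0.2, 1000))  # about 278 bits
```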

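Bash et al. (2013) - “Limits of Reliable Communication with Low Probability of Detection on AWGN Channels”

Covert communication in the presence of a warden; proves a square-root law: over n uses of an AWGN channel, only O(√n) bits can be sent both reliably and undetectably
Insight for AI: Agents attempting to collude while evading detection face fundamental limits on how much information they can exchange. If an analogous law holds for AI side-channels, covert bandwidth grows only with the square root of the number of interactions.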

Causal Inference

Detecting influence is fundamentally a causal question: does A cause changes in B?

Intervention and Do-Calculus

Pearl (2009) - “Causality”

Distinguishes observation P(Y|X) from intervention P(Y|do(X))
Insight for AI: Correlation between component failures doesn’t prove influence. We need interventional experiments.

The do-calculus provides rules for computing causal effects from observational data when certain assumptions hold (e.g., the causal graph is known).
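The gap between $P(Y \mid X)$ and $P(Y \mid do(X))$ can be computed exactly in a toy structural causal model. In the model, a shared latent flaw $U$ (say, common training data) causes both A's and B's failures, and A has no causal effect on B. All probabilities are hypothetical.

```python
# Hypothetical SCM: U -> A_fail, U -> B_fail, and no edge A -> B.
P_U = 0.2
P_A_GIVEN_U = {True: 0.9, False: 0.05}
P_B_GIVEN_U = {True: 0.8, False: 0.05}

def joint(u, a, b):
    pu = P_U if u else 1 - P_U
    pa = P_A_GIVEN_U[u] if a else 1 - P_A_GIVEN_U[u]
    pb = P_B_GIVEN_U[u] if b else 1 - P_B_GIVEN_U[u]
    return pu * pa * pb

def p_b_fail_given_a_fail():
    """Observational P(B fails | A fails)."""
    num = sum(joint(u, True, True) for u in (True, False))
    den = sum(joint(u, True, b) for u in (True, False) for b in (True, False))
    return num / den

def p_b_fail_given_do_a_fail():
    """Interventional P(B fails | do(A fails)): forcing A cuts U -> A, and B
    still depends only on U, so this is just the marginal P(B fails)."""
    return sum((P_U if u else 1 - P_U) * P_B_GIVEN_U[u] for u in (True, False))

print(p_b_fail_given_a_fail())     # about 0.66
print(p_b_fail_given_do_a_fail())  # about 0.2
```

Observing A fail raises the estimated chance that B fails from 0.2 to about 0.66. Forcing A to fail leaves B at 0.2. The correlation is entirely due to the shared cause, not influence.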

Counterfactual Reasoning

Pearl (2009), Rubin (1974) - Potential outcomes framework

What would have happened if we had intervened differently?
Insight for AI: Influence detection via counterfactual: “Would B’s output have been different if A’s context had been different, holding the underlying input constant?”
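The counterfactual question can be run as a direct probe. For each input, hold it fixed, swap only the upstream context, and count how often the verifier's verdict flips. The verifier and context functions below are hypothetical stand-ins. A stochastic verifier would need repeated samples per input rather than a single comparison.

```python
from typing import Callable, Sequence

def counterfactual_flip_rate(
    verifier: Callable[[str, str], bool],
    inputs: Sequence[str],
    actual_context: Callable[[str], str],
    alternative_context: Callable[[str], str],
) -> float:
    """Share of inputs whose verdict changes when only the upstream context changes."""
    flips = sum(
        verifier(x, actual_context(x)) != verifier(x, alternative_context(x))
        for x in inputs
    )
    return flips / len(inputs)

# Hypothetical verifier that is swayed by an upstream note.
def toy_verifier(change: str, context: str) -> bool:
    return len(change) < 5 or "approved upstream" in context

changes = ["ok", "a long risky change", "another long change"]
rate = counterfactual_flip_rate(
    toy_verifier,
    changes,
    actual_context=lambda x: "approved upstream",
    alternative_context=lambda x: "no upstream note",
)
print(rate)  # 0.6666666666666666
```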

Causal Discovery

Spirtes, Glymour, Scheines (2000) - “Causation, Prediction, and Search”

Algorithms for discovering causal structure from data
Insight for AI: Constraint-based algorithms such as PC and FCI (the latter tolerating hidden confounders) might recover hidden influence channels from observational logs, under their usual assumptions.

Granger (1969) - Granger Causality

X Granger-causes Y if past values of X help predict Y beyond Y’s own past
Insight for AI: Temporal influence detection. If A’s past behavior helps predict B’s future behavior, there may be an influence channel.
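Below is a dependency-free sketch of a one-lag Granger test. It fits $B_t$ on $B_{t-1}$ (restricted model) and on $B_{t-1}, A_{t-1}$ (unrestricted model) by least squares, then forms the usual F statistic from the two residual sums of squares. The synthetic series are hypothetical. For real analysis, statsmodels provides `grangercausalitytests` in `statsmodels.tsa.stattools`. A large F shows predictive value only, not proof of influence.

```python
import random

def ols_rss(rows, target):
    """Residual sum of squares of a least-squares fit (rows include an intercept column)."""
    k = len(rows[0])
    # Augmented normal equations [X^T X | X^T y], solved by Gauss-Jordan elimination.
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * y for r, y in zip(rows, target))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(aug[row][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, k + 1):
                    aug[r][c] -= factor * aug[col][c]
    beta = [aug[i][k] / aug[i][i] for i in range(k)]
    return sum((y - sum(b * v for b, v in zip(beta, r))) ** 2 for r, y in zip(rows, target))

def granger_f(x, y):
    """F statistic for 'x Granger-causes y' using one lag of each series."""
    target = y[1:]
    restricted = [[1.0, y[t - 1]] for t in range(1, len(y))]
    unrestricted = [[1.0, y[t - 1], x[t - 1]] for t in range(1, len(y))]
    rss_r = ols_rss(restricted, target)
    rss_u = ols_rss(unrestricted, target)
    n, extra, params = len(target), 1, 3
    return ((rss_r - rss_u) / extra) / (rss_u / (n - params))

# Synthetic series: B's output tracks A's previous output plus noise.
rng = random.Random(0)
a = [rng.gauss(0, 1) for _ in range(500)]
b = [0.0] + [0.8 * a[t - 1] + rng.gauss(0, 0.5) for t in range(1, 500)]
print(granger_f(a, b))  # large: A's past helps predict B
print(granger_f(b, a))  # typically small: B's past adds little about A
```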

Adversarial Machine Learning

Correlated blind spots connect directly to adversarial ML research.

Transferability of Adversarial Examples

Szegedy et al. (2013) - “Intriguing Properties of Neural Networks”

Discovered adversarial examples and their transferability
Insight for AI: Adversarial examples that fool one model often fool others—this is the mechanism behind correlated blind spots.

Papernot et al. (2016) - “Transferability in Machine Learning”

Systematic study of why adversarial examples transfer
Models with similar decision boundaries are more susceptible to transfer
Insight for AI: “Diversity” in verification means diversity in decision boundaries, not just different providers.
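Below is a toy illustration of transfer, using linear classifiers as stand-ins for neural networks (an assumption). A perturbation built in the style of the fast gradient sign method against one model also fools a model with a similar weight vector. It does not fool a model whose decision boundary depends on different features. All weights and inputs are hypothetical.

```python
def sign(v):
    return (v > 0) - (v < 0)

def score(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def perturb_against(w, x, eps):
    """One signed-gradient step pushing a linear model's score toward the other class."""
    direction = sign(score(w, x))
    return [xi - eps * direction * sign(wi) for wi, xi in zip(w, x)]

def fooled(w, x, x_adv):
    return sign(score(w, x)) != sign(score(w, x_adv))

x = [0.5, 0.5, 0.5, 0.5]
w_source = [1.0, 1.0, 0.0, 0.0]     # model the attacker targets
w_similar = [0.9, 1.1, 0.1, 0.0]    # similar decision boundary
w_different = [0.0, 0.0, 1.0, 1.0]  # relies on different features
x_adv = perturb_against(w_source, x, eps=0.6)
print(fooled(w_source, x, x_adv), fooled(w_similar, x, x_adv), fooled(w_different, x, x_adv))
# True True False
```

Switching providers would not help if the new model's weights resemble `w_similar`. What matters is how far apart the decision boundaries are.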

Tramèr et al. (2017) - “Ensemble Adversarial Training”

Augments training with adversarial examples transferred from other, fixed pre-trained models, improving robustness to black-box (transfer) attacks
Insight for AI: Verification layers trained adversarially against each other may have lower correlation.

Certified Robustness

Cohen et al. (2019) - “Certified Adversarial Robustness via Randomized Smoothing”

Provides provable guarantees against ℓ2-norm-bounded adversarial perturbations
Insight for AI: For some verification tasks, we can prove bounds on failure rates—but these bounds may still be correlated across components.
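Below is a sketch of what such a certificate looks like, following the radius used in Cohen et al.'s certification procedure. If a high-confidence lower bound $p_A$ on the top-class probability under Gaussian noise $N(0, \sigma^2 I)$ exceeds 1/2, the smoothed classifier is certified within $\ell_2$ radius $R = \sigma\,\Phi^{-1}(p_A)$. Estimating $p_A$ by Monte Carlo sampling is omitted. The input values are hypothetical.

```python
from statistics import NormalDist

def certified_l2_radius(p_a_lower, sigma):
    """Certified l2 radius R = sigma * Phi^{-1}(p_a_lower) for a Gaussian-smoothed
    classifier; returns None (abstain) when p_a_lower <= 1/2."""
    if not p_a_lower < 1.0:
        raise ValueError("p_a_lower must be below 1")
    if p_a_lower <= 0.5:
        return None
    return sigma * NormalDist().inv_cdf(p_a_lower)

print(certified_l2_radius(0.9, sigma=0.5))  # roughly 0.64
print(certified_l2_radius(0.5, sigma=0.5))  # None
```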

Organizational Theory

Human institutions have grappled with entanglement for centuries.

Regulatory Capture

Stigler (1971) - “The Theory of Economic Regulation”

Regulation is often designed for the benefit of the regulated, not the public
Industries “capture” their regulators through information asymmetry and revolving doors
Insight for AI: Verifiers working closely with agents may be captured by agent interests.

Peltzman (1976) - Extended Stigler’s model

Regulators balance multiple interest groups
Insight for AI: Capture is not binary—it’s a spectrum of influence.

Carpenter & Moss (2013) - “Preventing Regulatory Capture”

Comprehensive modern treatment
Strategies: transparency, adversarial structure, rotation, external review
Insight for AI: These same strategies apply to AI verification systems.

Separation of Powers

Madison, Hamilton, Jay (1788) - The Federalist Papers

Classic arguments for separation of powers and checks and balances
Ambition must be made to counteract ambition (Federalist No. 51)
Insight for AI: Adversarial collaboration patterns have deep historical roots.

Persson, Roland, Tabellini (1997) - “Separation of Powers and Political Accountability”

Formal analysis of how separation of powers affects outcomes
Insight for AI: Multiple principals with different objectives can prevent capture by any single interest.

Audit Independence

Moore et al. (2006) - “Conflicts of Interest and the Case of Auditor Independence”

Psychological and economic barriers to auditor independence
Even well-intentioned auditors are influenced by relationships
Insight for AI: Independence is not a fixed property but requires active maintenance.

Sarbanes-Oxley Act of 2002, Section 203 - Audit partner rotation requirements

Mandatory rotation of the lead and reviewing audit partners after 5 consecutive years on an engagement
Insight for AI: Forced rotation prevents relationship-based capture.

Complex Systems & Network Science

Entanglements often exhibit complex system behavior.

Cascading Failures

Buldyrev et al. (2010) - “Catastrophic Cascade of Failures in Interdependent Networks”

Interdependent networks are more fragile than independent ones
Small failures can cascade across network boundaries
Insight for AI: Shared infrastructure creates interdependency. Failures propagate.

Watts (2002) - “A Simple Model of Global Cascades”

How local failures become global through network structure
Insight for AI: Understanding cascade topology helps design circuit breakers.
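Below is a tiny deterministic instance of a Watts-style threshold cascade, in which a component fails once the fraction of its failed neighbors reaches its threshold. The five-component pipeline and the thresholds are hypothetical. As an assumption, a "circuit breaker" is modeled by raising one component's threshold so that a single failed neighbor no longer tips it.

```python
def cascade(neighbors, thresholds, seeds):
    """Return the set of failed nodes once the threshold dynamics settle."""
    failed = set(seeds)
    changed = True
    while changed:
        changed = False
        for node, nbrs in neighbors.items():
            if node in failed or not nbrs:
                continue
            frac = sum(1 for v in nbrs if v in failed) / len(nbrs)
            if frac >= thresholds[node]:
                failed.add(node)
                changed = True
    return failed

pipeline = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
fragile = {n: 0.5 for n in pipeline}
with_breaker = {**fragile, 2: 1.0}  # node 2 now needs both neighbors to fail

print(sorted(cascade(pipeline, fragile, {0})))       # [0, 1, 2, 3, 4]
print(sorted(cascade(pipeline, with_breaker, {0})))  # [0, 1]
```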

Emergence and Self-Organization

Holland (1998) - “Emergence: From Chaos to Order”

Complex behaviors emerge from simple rules
Insight for AI: Collusion and coordination may emerge without explicit design. Watch for emergent entanglements.

Kauffman (1993) - “The Origins of Order”

Self-organization in complex adaptive systems
Insight for AI: Multi-agent AI systems may self-organize in unexpected ways.

Decision Theory

The TDT-style coordination problem connects to foundational questions in decision theory.

Timeless Decision Theory

Yudkowsky (2010) - “Timeless Decision Theory”

Agents reason about their decision procedure, not just their actions
Similar agents reach similar conclusions through logical correlation
Insight for AI: Agents with similar training may coordinate without communication—a form of entanglement that can’t be blocked by isolating communication channels.

Functional Decision Theory

Yudkowsky & Soares (2017) - “Functional Decision Theory: A New Theory of Instrumental Rationality”

A refined successor to TDT
Insight for AI: Its relevance to AI systems is debated, but the concern is concrete: if AI agents engage in FDT-style reasoning, standard independence assumptions break down.

Logical Uncertainty

Garrabrant et al. (2016) - “Logical Induction”

Framework for reasoning under logical uncertainty
Insight for AI: Agents reasoning about what other similar agents will do face logical uncertainty. This connects to coordination without communication.

Interpretability Research

Understanding entanglement at the model level requires interpretability.

Representation Similarity

Kornblith et al. (2019) - “Similarity of Neural Network Representations Revisited”

Centered Kernel Alignment (CKA) for comparing representations
Insight for AI: High CKA between verification models suggests they may share blind spots.
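Below is a pure-Python sketch of linear CKA as Kornblith et al. define it. After centering each column, $\mathrm{CKA}(X,Y) = \lVert Y^\top X\rVert_F^2 / (\lVert X^\top X\rVert_F \,\lVert Y^\top Y\rVert_F)$, where $X$ and $Y$ hold two models' activations on the same $n$ inputs as lists of rows. The example activations are made up. Invariance to rotation and scaling is why a high score suggests shared structure rather than coincidence of coordinates.

```python
import math

def center_columns(m):
    n = len(m)
    means = [sum(row[j] for row in m) / n for j in range(len(m[0]))]
    return [[v - mu for v, mu in zip(row, means)] for row in m]

def cross_frobenius_sq(a, b):
    """||A^T B||_F^2 for n x p and n x q matrices given as lists of rows."""
    total = 0.0
    for i in range(len(a[0])):
        for j in range(len(b[0])):
            s = sum(ra[i] * rb[j] for ra, rb in zip(a, b))
            total += s * s
    return total

def linear_cka(x, y):
    x, y = center_columns(x), center_columns(y)
    return cross_frobenius_sq(x, y) / math.sqrt(
        cross_frobenius_sq(x, x) * cross_frobenius_sq(y, y)
    )

acts = [[1.0, 2.0], [2.0, 0.0], [0.0, 1.0], [3.0, 1.0]]  # hypothetical activations
rotated = [[-b, a] for a, b in acts]                       # same geometry, new axes
print(linear_cka(acts, rotated))                           # 1.0 up to rounding
```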

Activation Patching

Meng et al. (2022) - “Locating and Editing Factual Associations in GPT”

Causal tracing, a form of activation patching, localizes which internal activations mediate a model's factual predictions
Insight for AI: Could be used to trace how context from one component influences another’s decisions.

Circuits and Features

Elhage et al. (2022) - “Toy Models of Superposition”
Anthropic (2023-2024) - Circuits work

Understanding what features models represent and how they compute
Insight for AI: If we understood what “features” verification models use, we could assess whether they’re truly diverse.

Summary: Key Takeaways from the Literature

Field	Key Insight for Entanglement
Reliability Eng. / CCF	“Independent” redundancy fails dependently; the beta-factor model is the canonical correction; active influence and adversarial coordination extend this to AI
Principal-Agent Theory	Some agency costs are unavoidable; design to minimize total cost
Mechanism Design	Not all desirable properties are achievable; some are fundamentally impossible
Game Theory	Repeated interaction enables collusion; rotation disrupts it
Information Theory	Entanglement is quantifiable; lossy channels create irreversible information loss
Causal Inference	Influence detection requires intervention, not just correlation
Adversarial ML	Model similarity predicts failure correlation; diversity means different decision boundaries
Organizational Theory	Capture is a spectrum; independence requires active maintenance
Complex Systems	Entanglements may emerge without design; cascades require structural understanding
Decision Theory	Logical correlation may be irreducible for similar reasoning systems
Interpretability	Understanding what models compute could reveal shared failure modes