
Correlation Calculator

This page provides lookup tables for calculating the entanglement tax: the gap between your perceived protection (assuming independent layers) and your actual protection (accounting for correlations). All ρ values are on the phi (failure-indicator Pearson) scale; see "What ρ means and how to estimate it" for the estimation recipe.


Independent assumption:

P(all fail) = P(L₁ fails) × P(L₂ fails) × P(L₃ fails)

Reality with correlation:

P(all fail) is much higher because when one layer fails, correlated layers are more likely to fail too.

The formula (canonical derivation in Formal Definitions; see also Risk Propagation):

$$P(\text{all fail}) = (1-\rho)\prod_i p_i + \rho \cdot \min_i p_i$$

This is the beta-factor common-cause model: with weight ρ a shared cause defeats every correlated layer at once. It is deliberately conservative at intermediate ρ relative to smoother correlation models — the right direction for budgeting. All tables below are generated from this formula.
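As a minimal sketch, the formula above can be written as a small Python helper. The function names `p_all_fail` and `entanglement_tax` are illustrative choices made here, not names taken from the source; the tax is computed as the ratio of the correlated result to the independent product, which matches the tables on this page.

```python
from math import prod

def p_all_fail(failure_probs, rho):
    """Beta-factor estimate of the probability that every layer fails.

    failure_probs: per-layer failure probabilities p_i (0.1 for a 90% layer)
    rho: shared correlation weight in [0, 1]
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must be in [0, 1]")
    independent = prod(failure_probs)  # every layer fails on its own
    common_cause = min(failure_probs)  # a shared cause defeats all layers at once
    return (1 - rho) * independent + rho * common_cause

def entanglement_tax(failure_probs, rho):
    """Ratio of the correlated P(all fail) to the independent product."""
    return p_all_fail(failure_probs, rho) / prod(failure_probs)

# Two 90%-effective layers at rho = 0.3
print(f"{p_all_fail([0.1, 0.1], 0.3):.3f}")        # 0.037
print(f"{entanglement_tax([0.1, 0.1], 0.3):.1f}")  # 3.7
```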


Two layers. Individual layer effectiveness: 90% (failure rate = 10%)

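| Correlation | P(Both Fail) | Effective Protection | Entanglement Tax |
|---|---|---|---|
| 0.0 (independent) | 1.0% | 99.0% | 1× |
| 0.1 | 1.9% | 98.1% | 1.9× |
| 0.2 | 2.8% | 97.2% | 2.8× |
| 0.3 | 3.7% | 96.3% | 3.7× |
| 0.5 | 5.5% | 94.5% | 5.5× |
| 0.7 | 7.3% | 92.7% | 7.3× |
| 1.0 (identical) | 10.0% | 90.0% | 10× |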

Key insight: Even modest correlation (ρ = 0.3) makes your two-layer system nearly 4× worse than independent.

Three layers. Individual layer effectiveness: 90% (failure rate = 10%)

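| Correlation | P(All Fail) | Effective Protection | Entanglement Tax |
|---|---|---|---|
| 0.0 (independent) | 0.10% | 99.90% | 1× |
| 0.1 | 1.09% | 98.91% | 10.9× |
| 0.2 | 2.08% | 97.92% | 20.8× |
| 0.3 | 3.07% | 96.93% | 31× |
| 0.5 | 5.05% | 94.95% | 50× |
| 0.7 | 7.03% | 92.97% | 70× |
| 1.0 (identical) | 10.00% | 90.00% | 100× |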

Key insight: With three layers, even ρ = 0.1 costs you an order of magnitude — and at ρ = 0.3 you’re paying ~31×. The third layer barely moves actual protection once a common cause exists.

Five layers. Individual layer effectiveness: 90% (failure rate = 10%)

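| Correlation | P(All Fail) | Effective Protection | Entanglement Tax |
|---|---|---|---|
| 0.0 (independent) | 0.001% | 99.999% | 1× |
| 0.1 | 1.00% | 99.00% | ~1,000× |
| 0.2 | 2.00% | 98.00% | ~2,000× |
| 0.3 | 3.00% | 97.00% | ~3,000× |
| 0.5 | 5.00% | 95.00% | ~5,000× |
| 0.7 | 7.00% | 93.00% | ~7,000× |
| 1.0 (identical) | 10.00% | 90.00% | 10,000× |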

Key insight: Entanglement tax compounds dramatically with layer count. You think you have 99.999% protection, but with any appreciable common cause your actual failure probability is set by ρ × 10%, not by the five layers. The extra layers improve the perceived number, not the real one.
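The rows for identical 90% layers can be regenerated from the formula. The sketch below assumes the tax is the ratio of P(all fail) to the independent product; `table_rows` is a hypothetical helper name, and the printed precision differs slightly from the rounded values shown in the tables.

```python
from math import prod

def table_rows(n_layers, p_fail=0.1, rhos=(0.0, 0.1, 0.2, 0.3, 0.5, 0.7, 1.0)):
    """Return (rho, P(all fail), effective protection, tax) for identical layers."""
    probs = [p_fail] * n_layers
    independent = prod(probs)
    rows = []
    for rho in rhos:
        p = (1 - rho) * independent + rho * min(probs)
        rows.append((rho, p, 1 - p, p / independent))
    return rows

for n in (2, 3, 5):
    print(f"{n} layers")
    for rho, p, protection, tax in table_rows(n):
        print(f"  rho={rho:.1f}  P(all fail)={p:.3%}  "
              f"protection={protection:.3%}  tax={tax:,.1f}x")
```

Changing `p_fail` gives the equivalent tables for stronger or weaker layers.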


How many truly independent layers would give you the same protection?

| Nominal Layers | Correlation | Effective Redundancy |
|---|---|---|
| 3 | 0.0 | 3.0 layers |
| 3 | 0.3 | 1.5 layers |
| 3 | 0.5 | 1.3 layers |
| 3 | 0.7 | 1.2 layers |
| 5 | 0.0 | 5.0 layers |
| 5 | 0.3 | 1.5 layers |
| 5 | 0.5 | 1.3 layers |

Interpretation: 3 layers with ρ = 0.5 provide only ~1.3 layers worth of protection — and going from 3 layers to 5 at the same correlation buys you nothing. Once a common cause exists, effective redundancy is set by the correlation, not the layer count. Reduce ρ (diversify providers, isolate context) before adding layers.
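One reading of "effective redundancy" that reproduces the table values is to solve p^k = P(all fail) for k, where p is the per-layer failure rate. The sketch below assumes identical layers and that definition; `effective_layers` is a hypothetical name.

```python
from math import log, prod

def effective_layers(n_layers, rho, p_fail=0.1):
    """Number of independent layers giving the same P(all fail).

    Solves p_fail ** k = P(all fail) for k, assuming identical layers.
    """
    probs = [p_fail] * n_layers
    p = (1 - rho) * prod(probs) + rho * min(probs)
    return log(p) / log(p_fail)

for n, rho in [(3, 0.0), (3, 0.3), (3, 0.5), (3, 0.7), (5, 0.3), (5, 0.5)]:
    print(f"{n} layers, rho={rho}: {effective_layers(n, rho):.1f} effective layers")
```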


What correlation values should you expect?

| Configuration | Estimated Correlation |
|---|---|
| Same model, same provider | 0.8 - 0.95 |
| Same provider, different models (e.g., GPT-4 vs GPT-3.5) | 0.5 - 0.7 |
| Different LLM providers (e.g., GPT-4 vs Claude) | 0.3 - 0.6 |
| Neural network vs rule-based | 0.1 - 0.3 |
| Neural network vs formal verification | 0.0 - 0.1 |
| Neural network vs human expert | 0.2 - 0.4 |

Example 1 (code review bot). Setup:

  • Layer 1: GPT-4 (90% effective)
  • Layer 2: Claude (90% effective)
  • Layer 3: Static analysis (90% effective)

Estimated correlations:

  • GPT-4 ↔ Claude: ρ ≈ 0.5
  • GPT-4 ↔ Static analysis: ρ ≈ 0.2
  • Claude ↔ Static analysis: ρ ≈ 0.2
  • Average: ~0.3

Result (from 3-layer table at ρ = 0.3):

  • You thought: 99.9% protection
  • You have: ~96.9% protection (P(all fail) ≈ 3.1%)
  • Entanglement tax: ~31×

Caveat on averaging pairwise ρs: The single shared-ρ model used here cannot represent asymmetric pairwise structure (GPT-4↔Claude at 0.5 is quite different from either’s 0.2 correlation with static analysis). Averaging (0.5 + 0.2 + 0.2) / 3 = 0.3 is a rough approximation that can understate the tax — the high GPT-4↔Claude correlation still creates a strong common-cause channel that averaging dilutes. For a conservative answer with asymmetric architectures, use the max pairwise ρ (here 0.5, giving ~50× tax from the 3-layer table) rather than the average. For a precise answer, model the pairwise matrix explicitly (see the limitation note in Formal Definitions).
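To see how much the choice between average and max pairwise ρ matters for this example, the two can be compared directly. This is a rough sketch using the pairwise estimates quoted above; the dictionary layout and the `tax` helper are illustrative, and the single shared-ρ formula is still only an approximation of the pairwise structure.

```python
from math import prod
from statistics import mean

# Pairwise rho estimates taken from the example in the text.
pairwise_rho = {
    ("GPT-4", "Claude"): 0.5,
    ("GPT-4", "Static analysis"): 0.2,
    ("Claude", "Static analysis"): 0.2,
}
failure_probs = [0.1, 0.1, 0.1]  # three 90%-effective layers

def tax(probs, rho):
    """Entanglement tax under the single shared-rho formula."""
    independent = prod(probs)
    return ((1 - rho) * independent + rho * min(probs)) / independent

rho_avg = mean(pairwise_rho.values())  # rough approximation, can understate the tax
rho_max = max(pairwise_rho.values())   # conservative choice for asymmetric setups

print(f"average rho = {rho_avg:.2f} -> tax {tax(failure_probs, rho_avg):.1f}x")
print(f"max rho     = {rho_max:.2f} -> tax {tax(failure_probs, rho_max):.1f}x")
```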

Example 2. Setup:

  • Layer 1: GPT-4 agent
  • Layer 2: GPT-4 safety checker
  • Layer 3: GPT-4 reviewer
  • All 90% effective

Correlation: ρ ≈ 0.9 (same model)

Result (from 3-layer table, interpolating between ρ = 0.7 and ρ = 1.0):

  • You thought: 99.9% protection
  • You have: ~91% protection (P(all fail) ≈ 9%)
  • Entanglement tax: ~90×

Your three layers are worth about 1.05 effective layers — barely more than one.

Example 3. Setup:

  • Layer 1: LLM (90% effective)
  • Layer 2: Rule-based checker (95% effective)
  • Layer 3: Human review (99% effective for reviewed items)

Average correlation: ~0.15 (paradigm diversity)

Result:

  • Independent assumption: P(all fail) = 0.1 × 0.05 × 0.01 = 0.005%
  • Actual: P(all fail) = 0.85 × 0.005% + 0.15 × 1% ≈ 0.15%
  • Entanglement tax: ~31×

The tax ratio is still large — with heterogeneous layers, the common-cause term ρ · min(pᵢ) dominates the tiny independent product, and the strongest layer (the 99% human review) sets the floor: P(all fail) ≥ ρ × 1%. But compare absolute risk: 0.15% here versus 3.1% for the code review bot in Example 1. Genuine diversity bought a ~20× lower actual failure rate. The remaining risk is almost entirely the shared-cause channel, so the next improvement is reducing ρ further — not adding layers.
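The split between the independent term and the common-cause term in this heterogeneous example can be checked with a few lines of Python. The layer failure rates and ρ are the ones given above; the variable names are illustrative.

```python
from math import prod

failure_probs = {"LLM": 0.10, "Rule-based checker": 0.05, "Human review": 0.01}
rho = 0.15

probs = list(failure_probs.values())
independent_term = (1 - rho) * prod(probs)  # all three fail on their own
common_cause_term = rho * min(probs)        # shared cause, bounded by the best layer
p_all_fail = independent_term + common_cause_term

print(f"independent term : {independent_term:.5%}")
print(f"common-cause term: {common_cause_term:.5%}")
print(f"P(all fail)      : {p_all_fail:.3%}")
print(f"tax              : {p_all_fail / prod(probs):.1f}x")
print(f"share from common cause: {common_cause_term / p_all_fail:.0%}")
```

The last line shows how much of the remaining risk comes from the shared-cause channel, which is why lowering ρ is the next lever here.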


With 90%-effective layers, the common-cause floor P(all fail) ≥ ρ × 10% caps your achievable protection no matter how many layers you stack. Working backwards from each target (assuming enough layers that the independent term is negligible: at least 4 for the first three rows, at least 5 for the Critical row):

| Stakes | Target Protection | Max Correlation (90% layers) |
|---|---|---|
| Low | 95% | ρ ≤ ~0.5 |
| Medium | 99% | ρ ≤ ~0.1 |
| High | 99.9% | ρ ≤ ~0.01 |
| Critical | 99.99% | Essentially zero (ρ ≤ ~0.001) |

If you can’t get ρ that low, the other lever is stronger individual layers: the floor is ρ × min(pᵢ), so improving your best layer lowers it proportionally.
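Working backwards from a target can also be done exactly by inverting the formula for ρ rather than dropping the independent term. The sketch below assumes the same beta-factor model; `max_rho` is a hypothetical helper, and the usage example assumes five 90% layers.

```python
from math import prod

def max_rho(target_protection, failure_probs):
    """Largest rho for which 1 - P(all fail) still meets the target.

    Inverts P = (1 - rho) * prod(p) + rho * min(p) for rho.
    Returns None if even fully independent layers miss the target,
    and 1.0 if the target is met at any correlation.
    """
    budget = 1 - target_protection    # allowed P(all fail)
    independent = prod(failure_probs)
    floor = min(failure_probs)
    if budget < independent:
        return None                   # unreachable even at rho = 0
    if budget >= floor:
        return 1.0                    # best single layer already suffices
    return (budget - independent) / (floor - independent)

for target in (0.95, 0.99, 0.999, 0.9999):
    print(f"target {target:.2%}: rho <= {max_rho(target, [0.1] * 5):.4f}")
```

Passing stronger layers, for example `[0.1, 0.05, 0.01]`, shows how improving the best layer relaxes the ρ requirement.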

| Strategy | Correlation Reduction | Trade-off |
|---|---|---|
| Different LLM providers | ρ drops ~0.2 | Higher complexity |
| Add rule-based layer | ρ drops ~0.3 | Development cost, rigidity |
| Add formal verification | ρ drops ~0.4 | High cost, limited scope |
| Add human review | ρ drops ~0.3 | Latency, cost |
| Different paradigm entirely | ρ drops ~0.5 | May not exist |

  1. The tax is linear in ρ, and the first step is the worst: for 3 layers at 90%, each +0.1 of ρ adds roughly +10× to the tax (about +1 percentage point of P(all fail)). Going from ρ = 0 to ρ = 0.1 alone multiplies the tax by ~11×.

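  2. Adding layers has diminishing returns: an extra layer shrinks only the independent term (1-ρ) × ∏pᵢ; the common-cause floor ρ × min(pᵢ) is unchanged, so once the floor dominates, further layers add almost nothing.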

  3. Paradigm diversity beats provider diversity: Different approaches reduce ρ by ~0.3-0.5; different providers only ~0.1-0.2

  4. Information flow increases correlation: If Layer A’s output influences Layer B, add ~0.1-0.2 to ρ

  5. Same model = almost no redundancy: ρ ≈ 0.9 means your 3 layers are worth ~1.05 layers
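The linearity in takeaway 1 can be read directly off the formula. As a short derivation, writing T(ρ) for the entanglement tax (a shorthand introduced here, not in the source) and dividing the formula by the independent product:

$$T(\rho) = \frac{(1-\rho)\prod_i p_i + \rho \cdot \min_i p_i}{\prod_i p_i} = 1 + \rho\left(\frac{\min_i p_i}{\prod_i p_i} - 1\right)$$

For n identical layers with failure rate p this becomes

$$T(\rho) = 1 + \rho\left(p^{\,1-n} - 1\right)$$

With three 90% layers (p = 0.1, n = 3) the slope is 99, so T(ρ) = 1 + 99ρ and each +0.1 of ρ adds 9.9× to the tax. Adding a layer shrinks only the product in the denominator, so the slope grows while the absolute floor ρ × min(pᵢ) stays where it was.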


Step 1: Count your verification layers

Step 2: Estimate average correlation:

  • All same provider/model? → ρ ≈ 0.8-0.9
  • All LLMs, different providers? → ρ ≈ 0.4-0.6
  • Mix of LLM + rule-based? → ρ ≈ 0.2-0.3
  • Mix of paradigms (neural + rules + formal)? → ρ ≈ 0.1-0.2

Step 3: Look up entanglement tax in tables above

Step 4: Is effective protection sufficient for your stakes?
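The four steps above can be sketched as a single function. The ρ values are midpoints of the rough ranges in Step 2 and are illustrative only; `RHO_BY_MIX`, its keys, and `quick_assessment` are hypothetical names, and the result is computed from the formula rather than looked up in the tables.

```python
from math import prod

# Midpoints of the rough rho ranges in Step 2 (illustrative, not measured values).
RHO_BY_MIX = {
    "same_model": 0.85,
    "different_llm_providers": 0.5,
    "llm_plus_rules": 0.25,
    "mixed_paradigms": 0.15,
}

def quick_assessment(failure_probs, mix, target_protection):
    """Run Steps 1-4 for a list of per-layer failure probabilities."""
    rho = RHO_BY_MIX[mix]                                   # Step 2
    independent = prod(failure_probs)
    p = (1 - rho) * independent + rho * min(failure_probs)  # Step 3
    return {
        "layers": len(failure_probs),                       # Step 1
        "rho": rho,
        "protection": 1 - p,
        "tax": p / independent,
        "sufficient": (1 - p) >= target_protection,         # Step 4
    }

# Three 90% LLM layers from different providers, with a 99% protection target
print(quick_assessment([0.1, 0.1, 0.1], "different_llm_providers", 0.99))
```

In practice an estimated ρ from your own failure data should replace the midpoint lookup.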

