Trust Calibration
This page explains how to convert observed track records into calibrated trust estimates using Bayesian updating.
The Trust Update Problem
You have:
- A prior belief about component reliability
- Observations of successes and failures
You need:
- A posterior belief to feed into risk calculations
Bayesian Framework
Model Setup
Let θ be the component's true reliability: the probability that it succeeds on any single task.
    // Prior: what we believe before observations.
    // The Beta distribution is the conjugate prior for the binomial likelihood.
    prior = beta(alpha_0, beta_0)

    // Observation model
    // P(success | θ) = θ
    // P(failure | θ) = 1 - θ

    // Posterior after n successes and m failures
    posterior = beta(alpha_0 + n, beta_0 + m)
Prior Selection
The prior encodes what we believe before seeing any data:
    // Uninformative prior (know nothing)
    uninformative = beta(1, 1)
    // Uniform over [0, 1]

    // Skeptical prior (expect ~50% reliability)
    skeptical = beta(5, 5)
    // Mean: 50%, concentrated around the middle

    // Optimistic prior (expect ~90% reliability)
    optimistic = beta(9, 1)
    // Mean: 90%, assumes competence

    // Pessimistic prior (expect ~10% reliability)
    pessimistic = beta(1, 9)
    // Mean: 10%, assumes unreliability

    // Informative priors based on component type
    prior_deterministicCode = beta(99, 1)  // ~99% reliable
    prior_narrowML = beta(19, 1)           // ~95% reliable
    prior_generalLLM = beta(9, 1)          // ~90% reliable
    prior_RLAgent = beta(4, 1)             // ~80% reliable
Prior Strength
The sum α₀ + β₀ acts as a count of pseudo-observations, so it determines how much data is needed to move the posterior:
| Prior Strength | α₀ + β₀ | Data to Move Significantly |
|---|---|---|
| Very weak | 2 | 5-10 observations |
| Weak | 10 | 20-50 observations |
| Moderate | 50 | 100-200 observations |
| Strong | 200 | 500+ observations |
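To make the role of prior strength concrete, here is a minimal Python sketch. It relies on a standard identity for the Beta-binomial model: the posterior mean is a weighted average of the prior mean and the observed success rate, where the prior's weight is (α₀ + β₀) / (α₀ + β₀ + number of observations). The function names and the 40-success, 10-failure track record are illustrative assumptions, not part of the framework.

```python
def posterior_mean(alpha0, beta0, successes, failures):
    """Mean of the Beta(alpha0 + successes, beta0 + failures) posterior."""
    return (alpha0 + successes) / (alpha0 + beta0 + successes + failures)


def prior_weight(alpha0, beta0, n_obs):
    """Fraction of the posterior mean that comes from the prior mean."""
    strength = alpha0 + beta0
    return strength / (strength + n_obs)


# Two priors with the same 90% mean but different strengths (illustrative values).
weak_prior = (9, 1)        # strength 10
strong_prior = (180, 20)   # strength 200

# Hypothetical track record: 40 successes and 10 failures (80% observed).
successes, failures = 40, 10

for alpha0, beta0 in (weak_prior, strong_prior):
    mean = posterior_mean(alpha0, beta0, successes, failures)
    weight = prior_weight(alpha0, beta0, successes + failures)
    print(f"beta({alpha0}, {beta0}): posterior mean {mean:.3f}, prior weight {weight:.2f}")
```

With the same 50 observations, the weak prior ends up close to the observed 80% while the strong prior stays close to its original 90%.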
Updating Examples
Example 1: New LLM-Based Component
    // Prior: General LLM, expect ~90% reliability
    prior = beta(9, 1)

    // After 50 successes, 3 failures
    posterior = beta(9 + 50, 1 + 3)
    // = beta(59, 4)
    // Mean: 93.7%
    // 90% CI: [86%, 98%]
Interpretation: The prior mean was 90% and the observed success rate was 94% (50 of 53), so the posterior mean rises to about 94% with tighter uncertainty.
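The update in this example can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the `Beta` class and its method names are hypothetical helpers written for illustration, and only the mean and standard deviation are computed (credible intervals would need a quantile function, for example from a statistics library).

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class Beta:
    """A Beta(alpha, beta) belief about a success probability."""
    alpha: float
    beta: float

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    @property
    def std(self) -> float:
        total = self.alpha + self.beta
        return math.sqrt(self.alpha * self.beta / (total ** 2 * (total + 1)))

    def update(self, successes: float, failures: float) -> "Beta":
        # Conjugate update: add successes to alpha and failures to beta.
        return Beta(self.alpha + successes, self.beta + failures)


prior = Beta(9, 1)
posterior = prior.update(successes=50, failures=3)

print(f"prior mean     {prior.mean:.3f}, std {prior.std:.3f}")
print(f"posterior mean {posterior.mean:.3f}, std {posterior.std:.3f}")
```

The posterior standard deviation is smaller than the prior's, which is the "tightened uncertainty" described above.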
Example 2: Critical Security Component
    // Prior: Skeptical for security-critical components (trust must be earned)
    prior = beta(5, 5)
    // Mean: 50%, wide uncertainty

    // After 100 successes, 0 failures
    posterior = beta(5 + 100, 5 + 0)
    // = beta(105, 5)
    // Mean: 95.5%
    // 90% CI: [90%, 98%]

    // After 100 successes, 2 failures
    posterior_with_failures = beta(5 + 100, 5 + 2)
    // = beta(105, 7)
    // Mean: 93.8%
    // 90% CI: [88%, 97%]
Example 3: Established Production System
    // Prior: Strong optimistic prior (long track record)
    prior = beta(900, 100)
    // Mean: 90%, narrow uncertainty

    // After 1000 new tasks: 990 successes, 10 failures
    posterior = beta(900 + 990, 100 + 10)
    // = beta(1890, 110)
    // Mean: 94.5%

    // The prior (strength 1000) and the 1000 new observations carry equal weight,
    // so the posterior mean lands halfway between the prior mean (90%) and the
    // observed rate (99%). A weaker prior would have moved much closer to 99%.
Converting Track Record to Risk Modifier
The framework uses a track record modifier in risk calculations:
    // Base risk from component type
    baseRisk = probabilityPrior * damageEstimate

    // Track record modifier (intended range 0 to 2, where 1 = no adjustment).
    // Note: the raw ratio below is at least priorMean and has no upper bound,
    // so clamp the result if the 0-to-2 range must be enforced.
    trackRecordModifier(posterior, prior) = {
      posteriorMean = posterior.alpha / (posterior.alpha + posterior.beta)
      priorMean = prior.alpha / (prior.alpha + prior.beta)

      // Ratio adjustment
      return priorMean / posteriorMean
      // > 1 if posterior is worse than prior (bad track record)
      // < 1 if posterior is better than prior (good track record)
    }

    // Adjusted risk
    adjustedRisk = baseRisk * trackRecordModifier(posterior, prior)
Example Track Record Modifiers
| Scenario | Prior | Observed | Modifier | Effect |
|---|---|---|---|---|
| New, untested | 90% | - | 1.0 | Baseline |
| Good track record | 90% | 98% | 0.92 | 8% reduction |
| Excellent record | 90% | 99.5% | 0.90 | 10% reduction |
| Mediocre record | 90% | 85% | 1.06 | 6% increase |
| Bad track record | 90% | 70% | 1.29 | 29% increase |
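The modifier column can be reproduced with a short Python sketch. It assumes, as the table appears to, that the observed rate stands in for the posterior mean (reasonable once the track record is long). The function name and the optional `max_modifier` cap are hypothetical additions for illustration.

```python
def track_record_modifier(prior_mean, posterior_mean, max_modifier=None):
    """Ratio of prior mean to posterior mean; above 1 means a worse-than-expected record."""
    if not 0.0 < posterior_mean <= 1.0:
        raise ValueError("posterior_mean must be in (0, 1]")
    modifier = prior_mean / posterior_mean
    if max_modifier is not None:
        # Optional cap (an assumption, not part of the original formula).
        modifier = min(modifier, max_modifier)
    return modifier


PRIOR_MEAN = 0.90
scenarios = {
    "Good track record": 0.98,
    "Excellent record": 0.995,
    "Mediocre record": 0.85,
    "Bad track record": 0.70,
}

for name, observed in scenarios.items():
    modifier = track_record_modifier(PRIOR_MEAN, observed)
    print(f"{name}: modifier {modifier:.2f}")
```

Passing `max_modifier=2.0` would keep the result inside a 0-to-2 range even for a component whose posterior mean collapses toward zero.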
Time-Weighted Observations
Recent observations should count more than old ones:
Exponential Decay
    // Weight observation by recency
    weight(age_days, halfLife_days) = 0.5^(age_days / halfLife_days)

    // Example: 30-day half-life
    // Today's observation: weight = 1.0
    // 30 days ago: weight = 0.5
    // 60 days ago: weight = 0.25

    // Effective observation counts
    effectiveSuccesses = sum(weight(s.age_days, halfLife_days) for s in successes)
    effectiveFailures = sum(weight(f.age_days, halfLife_days) for f in failures)

    posterior = beta(
      alpha_0 + effectiveSuccesses,
      beta_0 + effectiveFailures
    )
Sliding Window
    // Only consider the last N observations
    windowSize = 100

    recentSuccesses = count(successes in last windowSize)
    recentFailures = count(failures in last windowSize)

    posterior = beta(
      alpha_0 + recentSuccesses,
      beta_0 + recentFailures
    )
Observation Quality Adjustments
Not all observations are equally informative:
Task Complexity Weighting
    // Complex tasks are more informative
    complexityWeight(complexity) = {
      if complexity <= 3: 0.5        // Easy tasks tell us less
      else if complexity <= 6: 1.0   // Standard weight
      else if complexity <= 9: 1.5   // Hard tasks are more informative
      else: 2.0                      // Very hard tasks are most informative
    }

    weightedSuccesses = sum(complexityWeight(task.complexity) for task in successes)
Adversarial vs. Random Errors
    // Adversarial failures are more concerning
    // Use different update weights for adversarial vs. benign failures

    // Standard failure
    standardFailure_weight = 1.0

    // Adversarial failure (e.g., red team found exploit)
    adversarialFailure_weight = 3.0  // Counts as 3 failures

    // Random/benign error
    benignFailure_weight = 0.7  // Less concerning
Multi-Component Trust
When multiple components form a pipeline:
Independent Components
    // Pipeline reliability = product of component reliabilities
    // (assumes failures are independent and every component must succeed)
    pipelineReliability = product(componentReliabilities)

    // Update each component independently based on its observations
Correlated Components
    // Components may share failure modes
    // Use a hierarchical model

    // Global factor affecting all components
    globalReliability = beta(alpha_global, beta_global)

    // Component-specific deviation
    componentReliability[i] = globalReliability * componentFactor[i]

    // Observations from any component inform the global estimate
Calibration Checks
Section titled “Calibration Checks”Prediction Intervals
After updating, check whether predictions match reality:
    // The posterior predicts that the next 100 tasks will have X failures,
    // where X ~ BetaBinomial(100, posterior.beta, posterior.alpha)
    // (parameters are swapped because X counts failures; the number of
    // successes follows BetaBinomial(100, posterior.alpha, posterior.beta))

    // If actual failures fall outside the 90% prediction interval,
    // the model may be miscalibrated
Brier Score
    // Track prediction accuracy over time
    brierScore = mean((predicted_prob - actual_outcome)^2)

    // If the true success rate equals a constant forecast p, the expected
    // Brier score is p * (1 - p); a score well above that suggests miscalibration
Practical Recommendations
For New Components
- Start with an informative prior based on component type
- Use weak prior strength (α₀ + β₀ ≈ 10) to allow quick updates
- Monitor first 50-100 tasks carefully
- Tighten prior as track record accumulates
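One way to follow the first two recommendations is to build the prior from a target mean and a chosen strength. The helper below is a hypothetical sketch: it sets α₀ = mean × strength and β₀ = (1 − mean) × strength, so the prior mean matches the component type while α₀ + β₀ controls how quickly data overrides it.

```python
from typing import Tuple


def prior_from_mean(mean: float, strength: float) -> Tuple[float, float]:
    """Return (alpha0, beta0) for a Beta prior with the given mean and alpha0 + beta0."""
    if not 0.0 < mean < 1.0:
        raise ValueError("mean must be strictly between 0 and 1")
    if strength <= 0:
        raise ValueError("strength must be positive")
    return mean * strength, (1.0 - mean) * strength


# General LLM component: 90% expected reliability, weak strength of 10.
alpha0, beta0 = prior_from_mean(0.90, 10)
print(f"weak prior: beta({alpha0:.1f}, {beta0:.1f})")

# Same mean with moderate strength, for a component with more history.
alpha_mod, beta_mod = prior_from_mean(0.90, 50)
print(f"moderate prior: beta({alpha_mod:.1f}, {beta_mod:.1f})")
```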
For Established Components
- Use moderate prior strength based on historical volume
- Apply time-weighting (30-90 day half-life)
- Weight complex tasks higher
- Review calibration quarterly
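The time-weighting and complexity-weighting recommendations can be combined in a single pass over the observation log. The sketch below is a minimal illustration: the `(age_days, complexity, succeeded)` tuple layout is an assumed data format, and multiplying the recency and complexity weights together is one possible design choice rather than something the framework prescribes.

```python
def recency_weight(age_days, half_life_days):
    """Exponential decay: an observation loses half its weight every half-life."""
    return 0.5 ** (age_days / half_life_days)


def complexity_weight(complexity):
    """Complexity tiers on a 1-10 scale, matching the thresholds used on this page."""
    if complexity <= 3:
        return 0.5
    if complexity <= 6:
        return 1.0
    if complexity <= 9:
        return 1.5
    return 2.0


def effective_counts(observations, half_life_days):
    """Sum weighted successes and failures from (age_days, complexity, succeeded) tuples."""
    successes = failures = 0.0
    for age_days, complexity, succeeded in observations:
        w = recency_weight(age_days, half_life_days) * complexity_weight(complexity)
        if succeeded:
            successes += w
        else:
            failures += w
    return successes, failures


# Hypothetical log: a standard task today, a hard task 30 days ago, an easy failure 60 days ago.
observations = [(0, 5, True), (30, 8, True), (60, 2, False)]
eff_successes, eff_failures = effective_counts(observations, half_life_days=30)

alpha0, beta0 = 9, 1  # General LLM prior
posterior_alpha = alpha0 + eff_successes
posterior_beta = beta0 + eff_failures
print(f"posterior: beta({posterior_alpha}, {posterior_beta})")
```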
For Critical Components
- Use skeptical prior (must earn trust)
- Weight adversarial failures heavily
- Consider worst-case bounds, not just mean
- Require statistical significance before trust increases
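The recommendations for critical components can be sketched as follows, using only the Python standard library. The failure weights come from the adversarial-versus-benign values given earlier on this page; the Monte Carlo lower bound via `random.betavariate`, the 5% tail, and the 0.90 threshold are assumptions chosen for illustration, not values the framework specifies.

```python
import random

# Failure weights from the adversarial vs. random errors section.
FAILURE_WEIGHTS = {"standard": 1.0, "adversarial": 3.0, "benign": 0.7}


def effective_failures(failure_kinds):
    """Sum of weighted failures for a list of failure kinds."""
    return sum(FAILURE_WEIGHTS[kind] for kind in failure_kinds)


def lower_credible_bound(alpha, beta, tail=0.05, draws=20_000, seed=0):
    """Monte Carlo estimate of the `tail` quantile of Beta(alpha, beta)."""
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(alpha, beta) for _ in range(draws))
    return samples[int(tail * draws)]


# Skeptical prior, 100 successes, one adversarial and one benign failure.
alpha0, beta0 = 5, 5
successes = 100
failures = ["adversarial", "benign"]

alpha = alpha0 + successes
beta = beta0 + effective_failures(failures)
mean = alpha / (alpha + beta)
lower = lower_credible_bound(alpha, beta)

# Hypothetical gate: only raise trust when the lower bound clears the bar.
REQUIRED_LOWER_BOUND = 0.90
trust_increase_allowed = lower >= REQUIRED_LOWER_BOUND
print(f"mean {mean:.3f}, lower bound {lower:.3f}, allowed: {trust_increase_allowed}")
```

Gating on the lower bound rather than the mean means a single adversarial failure can block a trust increase even when the mean still looks healthy.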
Reference Tables
Quick Prior Lookup
| Component Type | Prior | Mean | 90% CI |
|---|---|---|---|
| Deterministic code | beta(99, 1) | 99% | [97%, 100%] |
| Narrow ML | beta(19, 1) | 95% | [85%, 100%] |
| General LLM | beta(9, 1) | 90% | [72%, 99%] |
| RL/Agentic | beta(4, 1) | 80% | [47%, 99%] |
| New/Unknown | beta(1, 1) | 50% | [5%, 95%] |
| Pessimistic | beta(1, 9) | 10% | [1%, 28%] |
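Every prior in this lookup has α = 1 or β = 1, and for those shapes the Beta quantile has a closed form: the CDF of Beta(α, 1) is x^α, and the CDF of Beta(1, β) is 1 − (1 − x)^β. The sketch below uses that fact to compute equal-tailed credible intervals without any statistics library; the function name is a hypothetical helper, and it deliberately rejects other shapes.

```python
def equal_tailed_interval(alpha, beta, mass=0.90):
    """Equal-tailed credible interval for Beta(alpha, beta) when alpha == 1 or beta == 1."""
    lo_p = (1.0 - mass) / 2.0
    hi_p = (1.0 + mass) / 2.0
    if beta == 1:
        # CDF is x**alpha, so the p-quantile is p**(1/alpha).
        return lo_p ** (1.0 / alpha), hi_p ** (1.0 / alpha)
    if alpha == 1:
        # CDF is 1 - (1 - x)**beta, so the p-quantile is 1 - (1 - p)**(1/beta).
        return 1.0 - (1.0 - lo_p) ** (1.0 / beta), 1.0 - (1.0 - hi_p) ** (1.0 / beta)
    raise ValueError("closed form only covers alpha == 1 or beta == 1")


for a, b in [(99, 1), (19, 1), (9, 1), (4, 1), (1, 1), (1, 9)]:
    lo, hi = equal_tailed_interval(a, b)
    print(f"beta({a}, {b}): mean {a / (a + b):.0%}, 90% interval [{lo:.1%}, {hi:.1%}]")
```

For general shapes, a quantile function from a statistics library would be needed instead.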
Observations Needed for Confidence
| Starting Prior | Target Confidence | Successes Needed (0 failures) |
|---|---|---|
| beta(1, 1) | 90% ± 5% | ~35 |
| beta(1, 1) | 95% ± 3% | ~60 |
| beta(1, 1) | 99% ± 1% | ~200 |
| beta(9, 1) | 95% ± 3% | ~30 |
| beta(9, 1) | 99% ± 1% | ~100 |
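The table does not state the exact criterion behind each target, so the numbers depend on how "confidence" is defined. As one possible reading, the sketch below assumes the goal is for the lower credible bound (at a chosen tail probability) to reach the target reliability after a run of consecutive successes. With zero failures and β₀ = 1, the posterior is Beta(α₀ + n, 1), whose quantile is p^(1/(α₀ + n)), which gives a closed-form answer. The function name and the default 5% tail are assumptions; other criteria will give different counts.

```python
import math


def successes_needed(alpha0, target, tail_prob=0.05):
    """Consecutive successes (zero failures) so that the `tail_prob` quantile of
    Beta(alpha0 + n, 1) is at least `target`. Assumes the prior has beta0 == 1."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    if not 0.0 < tail_prob < 1.0:
        raise ValueError("tail_prob must be strictly between 0 and 1")
    # tail_prob ** (1 / (alpha0 + n)) >= target  <=>  alpha0 + n >= ln(tail_prob) / ln(target)
    required_alpha = math.log(tail_prob) / math.log(target)
    return max(0, math.ceil(required_alpha - alpha0))


for alpha0 in (1, 9):
    for target in (0.90, 0.95, 0.99):
        n = successes_needed(alpha0, target)
        print(f"beta({alpha0}, 1), lower bound >= {target:.0%}: {n} successes")
```

A stricter tail (for example `tail_prob=0.025`) or a two-sided width requirement raises the counts, which is why the criterion should be stated alongside any table of this kind.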