voidly

Adaptive Conformal Inference: forecast calibration that updates itself

The forecast model now ships with Adaptive Conformal Inference (ACI), an online update from Gibbs and Candès (2021) that keeps the coverage of our 90 percent intervals close to nominal under distribution shift. No retraining is required, just a daily cron job over observed outcomes.

#ml #forecast #calibration #conformal #aci #online-learning #transparency

The treadmill we were on

Every forecast we publish at /v1/forecast/{cc}/7day carries a 90 percent conformal interval. That interval's width is set by a single number: q90, the 90th-percentile absolute residual on a held-out test set. Until today, q90 was fit once at train time and never moved.
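
As a rough illustration of how a static q90 turns into an interval, here is a minimal split-conformal sketch. It is not the production code: the function names are hypothetical, the finite-sample rank ceil((n + 1) * coverage) is the textbook conformal choice rather than something stated above, and clipping to [0, 1] assumes the forecast is a probability.

```python
import math


def conformal_q(abs_residuals, coverage=0.90):
    """Conformal quantile of held-out absolute residuals (hypothetical helper)."""
    scores = sorted(abs_residuals)
    n = len(scores)
    if n == 0:
        raise ValueError("need at least one held-out residual")
    # Finite-sample rank, clamped so it never runs past the largest residual.
    rank = min(n, math.ceil((n + 1) * coverage))
    return scores[rank - 1]


def interval_from_q(point, q):
    """Symmetric interval around the point forecast, clipped to [0, 1]."""
    return (max(0.0, point - q), min(1.0, point + q))


# Example with 20 made-up residuals expressed in hundredths.
q90 = conformal_q([i / 100 for i in range(1, 21)])
lower, upper = interval_from_q(0.30, q90)
```

Because q90 is computed once from a fixed residual set, nothing in this sketch reacts to new outcomes, which is the limitation the rest of this post addresses.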

Reality is not stationary. On 2026-05-20 the live Brier score we computed from observed outcomes was 0.59, against the in-sample baseline of 0.22 — a brutal drift. We manually refit isotonic calibration the same day, but that fix is a treadmill: every time the world moves, someone has to remember to retrain.
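
For readers who want to reproduce this kind of drift check, the sketch below computes a Brier score from forecast probabilities and observed outcomes. It assumes the standard binary form (mean squared difference between probability and 0/1 outcome); the function name is hypothetical and the inputs are made up.

```python
def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("probs and outcomes must be non-empty and equal length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


# Reference point: a forecast that always says 0.5 scores 0.25.
coin_flip = brier_score([0.5, 0.5, 0.5, 0.5], [0, 1, 0, 1])
```

Under that binary definition, a constant forecast of 0.5 always scores 0.25, which gives a sense of scale for the 0.22 and 0.59 figures above.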

What ACI does

Adaptive Conformal Inference (Gibbs & Candès, arXiv:2106.00170, NeurIPS 2021) is one of the simplest and most-cited online calibration methods you can layer on a forecast. After each observation, you nudge a working miscoverage level alpha_t, which sets the width of the next interval, so that realized miscoverage tracks the target alpha_target:

alpha_{t+1} = alpha_t + gamma * (alpha_target - 1{y_t not in interval_t})

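The interval at step t is built from the (1 - alpha_t) quantile of recent conformity scores, so a smaller alpha means a wider interval. Each miss lowers alpha by gamma * (1 - alpha_target), and each covered outcome raises it by gamma * alpha_target. If recent intervals miss more often than the nominal 10 percent, alpha shrinks and the next interval widens. If they always cover, alpha grows and the next interval tightens. Proposition 4.1 in Gibbs & Candès shows that the long-run miscoverage frequency converges to alpha_target with no distributional assumptions. We run with alpha_target = 0.10, gamma = 0.01, and a 90-day rolling window.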
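
The update rule and its effect on interval width can be sketched in a few lines of Python. This is a minimal illustration rather than the production code: the names aci_step and aci_halfwidth are hypothetical, conformity scores are assumed to be absolute residuals of a forecast on [0, 1], and the handling of alpha outside (0, 1) is one common convention.

```python
import math


def aci_step(alpha, covered, alpha_target=0.10, gamma=0.01):
    """One ACI update: alpha_{t+1} = alpha_t + gamma * (alpha_target - err_t)."""
    err = 0.0 if covered else 1.0  # err_t = 1 when y_t fell outside the interval
    return alpha + gamma * (alpha_target - err)


def aci_halfwidth(scores, alpha):
    """Empirical (1 - alpha) quantile of recent absolute residuals."""
    if alpha >= 1.0:
        return 0.0  # no coverage requested: degenerate interval
    if alpha <= 0.0 or not scores:
        return 1.0  # full coverage requested: widest halfwidth on [0, 1]
    ordered = sorted(scores)
    rank = min(len(ordered), max(1, math.ceil((1.0 - alpha) * len(ordered))))
    return ordered[rank - 1]


# Five misses in a row push alpha down, which raises the quantile level
# and therefore widens the next interval.
alpha = 0.10
for _ in range(5):
    alpha = aci_step(alpha, covered=False)
```

Note the sign convention: alpha is a miscoverage level, so a falling alpha means a wider interval and a rising alpha means a tighter one.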

Current state (warm-up phase)

After replaying the existing 840 alert-outcome pairs (April 17 to May 14, 2026) through the update rule, ACI settled at alpha = 0.21, raw q = 0.96, and empirical coverage of 91.3 percent (target: 90 percent). The raw conformity quantile sitting near 1.0 is a separate, honest signal: the underlying model still under-predicts in the long tail, so a true 90 percent interval would have to span almost the entire [0, 1] range. To keep the published interval informative, we cap the halfwidth applied to API responses at 0.35. The uncapped value (aci.q_raw) is exposed in the /v1/sentinel/health response, so anyone auditing us can see the truth.
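
The sketch below illustrates the cap and sanity-checks the replay numbers. It rests on several assumptions that are not stated above: the dictionary keys other than q_raw are hypothetical, the replay is assumed to start at alpha = 0.10, and 73 misses is inferred from 91.3 percent coverage on 840 pairs.

```python
HALFWIDTH_CAP = 0.35  # cap applied to API responses, per the text above


def published_interval(point, q_raw, cap=HALFWIDTH_CAP):
    """Interval after capping the halfwidth and clipping to [0, 1]."""
    halfwidth = min(q_raw, cap)
    return {
        "lower": max(0.0, point - halfwidth),
        "upper": min(1.0, point + halfwidth),
        "halfwidth": halfwidth,
        "q_raw": q_raw,            # uncapped value kept for auditing
        "capped": q_raw > cap,
    }


def alpha_after_replay(n, misses, alpha0=0.10, alpha_target=0.10, gamma=0.01):
    """Closed form of n ACI steps; valid because the update is linear in err_t."""
    return alpha0 + gamma * (n * alpha_target - misses)


capped = published_interval(0.5, 0.96)             # roughly [0.15, 0.85]
uncapped = published_interval(0.5, 0.96, cap=1.0)  # [0.0, 1.0], uninformative
alpha_check = alpha_after_replay(840, 73)          # approximately 0.21
```

Under those assumptions the arithmetic lands on the reported alpha of 0.21: coverage slightly above target means fewer misses than 10 percent, so alpha drifts upward from its starting value.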

How it plumbs in

  • scripts/aci-update.py — daily cron at 03:45 UTC, reads sentinel.db, writes ml-deploy/forecast_aci_state.json.
  • sentinel_trust._interval_90() — re-reads the JSON every 5 minutes and applies the (capped) ACI halfwidth instead of the static q90.
  • /v1/forecast/{cc}/7day — new aci_alpha + aci fields. The interval itself is unchanged in shape.
  • The original isotonic calibration is not removed. ACI runs on top of isotonic, adjusting only the interval halfwidth.
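
The read side of this plumbing could look roughly like the sketch below: a small cache that re-reads the state file at most once every 5 minutes and applies the cap. Everything here is illustrative. The class name, the "q_raw" key inside the JSON file, and the choice to keep the previous halfwidth (initially the static q90) when the file is missing or malformed are assumptions, not a description of sentinel_trust._interval_90().

```python
import json
import time


class AciHalfwidthCache:
    """Re-reads an ACI state file at most once per TTL and caps the halfwidth."""

    def __init__(self, path, static_q90, cap=0.35, ttl_seconds=300.0,
                 clock=time.monotonic):
        self._path = path
        self._cap = cap
        self._ttl = ttl_seconds
        self._clock = clock
        self._halfwidth = static_q90  # used until a valid state file is read
        self._loaded_at = None

    def halfwidth(self):
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._ttl:
            self._reload()
            self._loaded_at = now
        return self._halfwidth

    def _reload(self):
        try:
            with open(self._path, encoding="utf-8") as fh:
                state = json.load(fh)
            # "q_raw" is a hypothetical key name for the uncapped quantile.
            self._halfwidth = min(float(state["q_raw"]), self._cap)
        except (OSError, ValueError, KeyError, TypeError):
            pass  # keep the previous value rather than fail the request
```

Injecting the clock keeps the refresh logic testable without sleeping, and falling back to the last good value means a failed cron run degrades to stale intervals instead of errors.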

Honest caveats

The first ~30 days of ACI will be noisy because we are starting from a static baseline that drifted hard. Expect alpha to bounce a bit before it settles. The published interval is also capped, so until model recall improves on the rare positive class we are publishing narrower intervals than ACI literally asks for. We chose “informative but slightly under-covered” over “technically correct but useless”, and the uncapped truth is one click away in the health endpoint.

Raw data