Following the 2026-05-20 isotonic recalibration, the next priority was making the Sentinel + classifier models auditable. Trust in an ML model isn't just calibration: it's the ability for a journalist or researcher to ask “why does the model say 67% for Iran?” and get an answer.
On 2026-05-21 we shipped six new transparency surfaces in a single session. The full set:
Backend (Vultr, intelligence.voidly.ai:8443)
- GET /v1/forecast/{cc}/7day: now returns top_features (top-3 SHAP contributions) plus interval_90 (90% conformal interval). Iran today: block_rate_roll30_mean +0.183 ↑, incident_count_7d +0.091 ↑, month -0.082 ↓, interval [0.82, 0.92]. US today: month -0.048 ↓, recent_shutdown +0.022 ↑, interval [0.04, 0.14]. <20ms warm via permutation explainer on the unwrapped XGBoost.
- GET /v1/classifier/info: v3 GradientBoosting bundle metadata: version, training date, full feature list, LOCO eval breakdown (median F1 0.857, per-country IR AUC 0.953).
- GET /v1/classifier/feature-importance: sorted importance + share + a top3_share metric. v3 distribution: rate_count_interaction 40.6%, measurement_count 21.6%, rate_spike_interaction 11.2%. No single feature dominates, in contrast with v2's pathological 85% on the leaky country_risk_tier.
- GET /v1/sentinel/movers?days=N: biggest forecast deltas vs N days ago. Joins today's sentinel_forecasts snapshot against the closest snapshot N days back. Returns movers_up + movers_down arrays with prior_risk + today_risk + delta. Query params: days, direction, limit, min_abs.
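The movers join is simple enough to sketch. The version below is a minimal illustration, not the production handler: it assumes each snapshot has already been reduced to a {country_code: risk} dict, and the function name, the Mover class, and the accepted direction values ("both", "up", "down") are our assumptions for the example. The risk values in the usage lines are made up.

```python
from dataclasses import dataclass


@dataclass
class Mover:
    country: str
    prior_risk: float
    today_risk: float
    delta: float


def compute_movers(today, prior, direction="both", limit=10, min_abs=0.0):
    """Compare two {country_code: risk} snapshots and rank the biggest changes.

    Hypothetical helper: parameter names mirror the query params
    (direction, limit, min_abs); the accepted values are assumptions.
    """
    movers = []
    for cc, today_risk in today.items():
        if cc not in prior:
            continue  # no earlier snapshot to compare against
        delta = round(today_risk - prior[cc], 4)
        if delta != 0 and abs(delta) >= min_abs:
            movers.append(Mover(cc, prior[cc], today_risk, delta))

    # Largest increases first, then largest decreases first.
    up = sorted((m for m in movers if m.delta > 0), key=lambda m: -m.delta)
    down = sorted((m for m in movers if m.delta < 0), key=lambda m: m.delta)
    return {
        "movers_up": up[:limit] if direction in ("both", "up") else [],
        "movers_down": down[:limit] if direction in ("both", "down") else [],
    }


# Made-up snapshots, for illustration only.
today = {"IR": 0.87, "US": 0.09, "ET": 0.88}
prior = {"IR": 0.67, "US": 0.12, "ET": 0.88}
result = compute_movers(today, prior, min_abs=0.01)
```

Countries missing from the earlier snapshot are skipped rather than treated as a jump from zero, so a newly watched country does not show up as a false mover.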
Frontend
- /atlas/probe-coverage — admits the inside-country probe gap. The 37+ Voidly nodes are mostly in non-censoring countries (US, GB, JP, DE). High-censorship countries (IR, CN, RU, VE, EG, etc.) are verified primarily via OONI volunteer probes + CensoredPlanet remote measurement. Coverage matrix + community recruitment ask.
- /atlas/forecast — global forecast index. All 30 watched countries sorted by 7-day max calibrated risk, with a biggest-movers section (today's surface dominated by the May 20 recalibration jump — will normalize in 3 days).
- /sentinel/backtest — the actual reliability diagram (the predicted-mean vs observed-rate scatter that turns calibration into a picture). Plus a confusion matrix at threshold 0.5 and a per-country backtest table sorted by worst Brier.
- /methodology refresh — replaced the static “country_risk_tier 85%” v2 feature-importance section with live v3 numbers + an amber callout explaining the leakage fix.
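For readers who want to reproduce the backtest numbers from raw forecasts, the three quantities on /sentinel/backtest can be sketched in a few lines. This is an illustration under assumptions, not the page's actual code: function names, the equal-width binning, and the bin count are ours, and the inputs are plain lists of predicted probabilities and 0/1 outcomes.

```python
def reliability_bins(probs, outcomes, n_bins=10):
    """Group forecasts into equal-width probability bins.

    Returns one (mean predicted, observed rate, count) tuple per non-empty bin;
    plotting the first two against each other gives the reliability diagram.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))
    points = []
    for members in bins:
        if members:
            n = len(members)
            mean_pred = sum(p for p, _ in members) / n
            observed = sum(y for _, y in members) / n
            points.append((mean_pred, observed, n))
    return points


def brier_score(probs, outcomes):
    """Mean squared error between forecast probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def confusion_at(probs, outcomes, threshold=0.5):
    """Count TP/FP/TN/FN when forecasts >= threshold are treated as alerts."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for p, y in zip(probs, outcomes):
        if p >= threshold:
            counts["tp" if y else "fp"] += 1
        else:
            counts["fn" if y else "tn"] += 1
    return counts
```

A perfectly calibrated model puts every reliability point on the diagonal; a lower Brier score is better, which is why the per-country table sorts by worst Brier first.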
Why this matters
A calibrated forecast is necessary but not sufficient. Without SHAP, when a journalist asked “why is Ethiopia 88%?”, the best we could answer was “the model said so.” With SHAP, the answer is “30-day rolling block rate is unusually high (+0.18 contribution) and the 7-day incident count is up (+0.09).” That's the difference between black-box intelligence and citable intelligence.
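Turning top_features into a quotable sentence is a small formatting step. The sketch below assumes each entry is a dict with "feature" and "contribution" keys; that shape is our assumption for illustration, as is the function name, so check the live response before relying on it. The sample values are the Iran contributions quoted above.

```python
def describe_drivers(top_features):
    """Render SHAP contributions as a short, citable phrase.

    Assumes a list of {"feature": str, "contribution": float} dicts
    (hypothetical shape). Largest absolute contribution comes first.
    """
    ranked = sorted(top_features, key=lambda f: abs(f["contribution"]), reverse=True)
    parts = []
    for item in ranked:
        direction = "up" if item["contribution"] > 0 else "down"
        parts.append(
            f'{item["feature"]} pushes risk {direction} by {abs(item["contribution"]):.3f}'
        )
    return "; ".join(parts)


iran = [
    {"feature": "block_rate_roll30_mean", "contribution": 0.183},
    {"feature": "incident_count_7d", "contribution": 0.091},
    {"feature": "month", "contribution": -0.082},
]
summary = describe_drivers(iran)
```

The sign matters as much as the size: a negative contribution such as month means the feature is pulling the forecast below the model's baseline, not that it is unimportant.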
The v3 classifier is on disk but not yet swapped into the production prediction paths. That swap requires feature-vector alignment work, which was out of scope for this session. The endpoints deliberately read v3 directly from disk so that v3's honest numbers are public before the risky production swap.
What's still on the queue
- v3 classifier promotion into create-voidly-incidents.py — needs feature schema alignment first.
- Per-day SHAP variation on the 7day endpoint: currently the same vector is returned for all 8 days; daily-event-aware features would change this.
- Model registry page consolidating all 4 deployed models (v2 serving, v3 promoted, forecast, anomaly detector) with linked transparency endpoints.
Reproducibility
Backend patches are in:
- scripts/patch-forecast-shap.py
- scripts/patch-classifier-info-endpoints.py
- scripts/patch-classifier-metrics-schema.py
- scripts/patch-forecast-movers-endpoint.py
Each is idempotent and writes a backup before modifying production code. Worker proxy routes were added in worker/src/index.ts.
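For anyone writing a similar patch script, the idempotent-with-backup pattern can be sketched as follows. This is a generic illustration, not the contents of the scripts listed above: the function name, the marker-comment convention, and the .bak suffix are all assumptions.

```python
import shutil
from pathlib import Path


def apply_patch(target: Path, marker: str, patch_text: str) -> bool:
    """Append patch_text to target unless marker is already present.

    Returns True if the file was modified, False if it was already patched.
    Hypothetical helper: marker is a comment line written with the patch
    so that a later run can detect it.
    """
    source = target.read_text()
    if marker in source:
        return False  # already patched: a second run is a no-op

    backup = target.with_name(target.name + ".bak")
    shutil.copy2(target, backup)  # keep the pre-patch file for rollback
    target.write_text(source + "\n" + marker + "\n" + patch_text)
    return True
```

Checking for the marker before touching the file is what makes reruns safe; it also means the backup always holds the last unpatched version instead of being overwritten by an already-patched copy.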