voidly

Six new ML transparency surfaces shipped in one session

Every Sentinel forecast now ships with SHAP contributions + a conformal interval. The v3 classifier has public feature-importance and metadata endpoints. /sentinel/backtest renders the reliability diagram, /atlas/forecast lists every watched country, and /v1/sentinel/movers surfaces 7-day deltas.

#methodology #ml #transparency #shap #classifier #forecast

Following the 2026-05-20 isotonic recalibration, the next priority was making the Sentinel and classifier models auditable. Trust in an ML model isn't just calibration; it's the ability for a journalist or researcher to ask “why does the model say 67% for Iran?” and get an answer.

On 2026-05-21 we shipped six new transparency surfaces in a single session. The full set:

Backend (Vultr, intelligence.voidly.ai:8443)

  • GET /v1/forecast/{cc}/7day — now returns top_features (top-3 SHAP contributions) plus interval_90 (90% conformal interval). Iran today: block_rate_roll30_mean +0.183 ↑, incident_count_7d +0.091 ↑, month -0.082 ↓, interval [0.82, 0.92]. US today: month -0.048 ↓, recent_shutdown +0.022 ↑, interval [0.04, 0.14]. <20ms warm via permutation explainer on the unwrapped XGBoost.
  • GET /v1/classifier/info — v3 GradientBoosting bundle metadata: version, training date, full feature list, LOCO eval breakdown (median F1 0.857, per-country IR AUC 0.953).
  • GET /v1/classifier/feature-importance — sorted importance + share + a top3_share metric. v3 distribution: rate_count_interaction 40.6%, measurement_count 21.6%, rate_spike_interaction 11.2%. No single feature dominates — contrast with v2's pathological 85% on the leaky country_risk_tier.
  • GET /v1/sentinel/movers?days=N — biggest forecast deltas vs N days ago. Joins today's sentinel_forecasts snapshot against the closest snapshot N days back. Returns movers_up + movers_down arrays with prior_risk + today_risk + delta. Query params: days, direction, limit, min_abs.
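The snapshot join behind the movers endpoint can be sketched in a few lines of Python. This is a minimal illustration, not the production code: the snapshot layout (a dict keyed by date, mapping country code to calibrated risk), the function names, the accepted values for direction, and the tie-breaking rule are all assumptions. Only the parameter names (days, direction, limit, min_abs) and the output fields come from the endpoint description above.

```python
from datetime import date, timedelta


def closest_snapshot(candidates, target):
    """Pick the snapshot date nearest to target (ties go to the earlier date)."""
    return min(candidates, key=lambda d: (abs((d - target).days), d))


def compute_movers(snapshots, today, days=7, direction="both", limit=10, min_abs=0.0):
    """Compare today's risks with the snapshot closest to `days` ago.

    `snapshots` is a hypothetical in-memory stand-in for sentinel_forecasts:
    {snapshot_date: {country_code: calibrated_risk}}.
    """
    earlier = [d for d in snapshots if d < today]
    if today not in snapshots or not earlier:
        return {"movers_up": [], "movers_down": []}

    prior_date = closest_snapshot(earlier, today - timedelta(days=days))
    rows = []
    for cc, today_risk in snapshots[today].items():
        prior_risk = snapshots[prior_date].get(cc)
        if prior_risk is None:
            continue  # country was not watched in the prior snapshot
        delta = round(today_risk - prior_risk, 4)
        if abs(delta) >= min_abs:
            rows.append({"country": cc, "prior_risk": prior_risk,
                         "today_risk": today_risk, "delta": delta})

    # Largest increases first, then largest decreases first.
    up = sorted((r for r in rows if r["delta"] > 0), key=lambda r: -r["delta"])[:limit]
    down = sorted((r for r in rows if r["delta"] < 0), key=lambda r: r["delta"])[:limit]
    return {
        "movers_up": up if direction in ("both", "up") else [],
        "movers_down": down if direction in ("both", "down") else [],
    }


# Made-up risks, for illustration only.
example_snapshots = {
    date(2026, 5, 14): {"IR": 0.70, "US": 0.10, "EG": 0.50},
    date(2026, 5, 21): {"IR": 0.87, "US": 0.09, "EG": 0.50},
}
example_movers = compute_movers(example_snapshots, today=date(2026, 5, 21), days=7)
```

Countries with a zero delta land in neither array here; whether the real endpoint reports them is not stated in this post.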

Frontend

  • /atlas/probe-coverage — admits the inside-country probe gap. The 37+ Voidly nodes are mostly in non-censoring countries (US, GB, JP, DE). High-censorship countries (IR, CN, RU, VE, EG, etc.) are verified primarily via OONI volunteer probes + CensoredPlanet remote measurement. Coverage matrix + community recruitment ask.
  • /atlas/forecast — global forecast index. All 30 watched countries sorted by 7-day max calibrated risk, with a biggest-movers section (today's surface dominated by the May 20 recalibration jump — will normalize in 3 days).
  • /sentinel/backtest — the actual reliability diagram (the predicted-mean vs observed-rate scatter that turns calibration into a picture). Plus a confusion matrix at threshold 0.5 and a per-country backtest table sorted by worst Brier.
  • /methodology refresh — replaced the static “country_risk_tier 85%” v2 feature-importance section with live v3 numbers + an amber callout explaining the leakage fix.
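For readers who want to reproduce the three backtest views from their own predictions, here is a rough Python sketch of the underlying arithmetic: the Brier score, the binned predicted-mean versus observed-rate points that make up a reliability diagram, and a confusion matrix at threshold 0.5. The function names, the equal-width binning, and the choice to treat a probability exactly at the threshold as a positive are assumptions; the page itself may bin or threshold differently.

```python
def brier_score(probs, outcomes):
    """Mean squared error between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def reliability_points(probs, outcomes, n_bins=10):
    """Group predictions into equal-width bins and compare means to observed rates."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))

    points = []
    for members in bins:
        if not members:
            continue  # empty bins contribute no point to the diagram
        n = len(members)
        points.append({
            "predicted_mean": sum(p for p, _ in members) / n,
            "observed_rate": sum(y for _, y in members) / n,
            "count": n,
        })
    return points


def confusion_matrix(probs, outcomes, threshold=0.5):
    """Count true/false positives/negatives, calling p >= threshold a positive."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for p, y in zip(probs, outcomes):
        predicted = p >= threshold
        if predicted and y:
            counts["tp"] += 1
        elif predicted and not y:
            counts["fp"] += 1
        elif not predicted and y:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts
```

A perfectly calibrated model puts every reliability point on the diagonal, where predicted_mean equals observed_rate; sorting countries by brier_score descending gives a worst-first table like the one on the page.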

Why this matters

A calibrated forecast is necessary but not sufficient. Without SHAP, when a journalist asked “why is Ethiopia 88%?”, the best answer we could give was “the model said so.” With SHAP, the answer is “30-day rolling block rate is unusually high (+0.18 contribution) and the 7-day incident count is up (+0.09).” That's the difference between black-box intelligence and citable intelligence.
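As a rough illustration of how per-feature contributions become a citable answer, the Python sketch below ranks already-computed contributions by absolute size, keeps the top three, and renders them as one line. It does not compute SHAP values. The function names and output format are assumptions; the first three values are the Iran figures quoted for the 7day endpoint, and the fourth is a made-up entry added only so the top-3 cut has something to drop.

```python
def top_features(contributions, k=3):
    """Keep the k contributions with the largest absolute value."""
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return [
        {"feature": name, "contribution": value,
         "direction": "up" if value > 0 else "down"}
        for name, value in ranked[:k]
    ]


def explain(country, features):
    """Render the ranked contributions as a single quotable line."""
    parts = [
        f"{f['feature']} {f['contribution']:+.3f} ({f['direction']})"
        for f in features
    ]
    return f"{country}: " + ", ".join(parts)


iran_contributions = {
    "block_rate_roll30_mean": 0.183,
    "incident_count_7d": 0.091,
    "month": -0.082,
    "example_minor_feature": 0.010,  # hypothetical, not from the real model
}
iran_line = explain("IR", top_features(iran_contributions))
```

Ranking by absolute value rather than signed value is what lets a strong downward push, such as month here, appear alongside the upward drivers.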

The v3 classifier is on disk but not yet swapped into the production prediction paths. That swap requires feature-vector alignment work, which was out of scope for this session. The endpoints deliberately read v3 directly from disk so that v3's honest numbers are public before the risky production swap.

What's still on the queue

  • v3 classifier promotion into create-voidly-incidents.py — needs feature schema alignment first.
  • Per-day SHAP variation on the 7day endpoint — currently same vector across 8 days; daily-event-aware features would change this.
  • Model registry page consolidating all 4 deployed models (v2 serving, v3 promoted, forecast, anomaly detector) with linked transparency endpoints.

Reproducibility

Backend patches are in scripts/patch-forecast-shap.py, scripts/patch-classifier-info-endpoints.py, scripts/patch-classifier-metrics-schema.py, scripts/patch-forecast-movers-endpoint.py. Each is idempotent and writes a backup before modifying production code. Worker proxy routes added in worker/src/index.ts.
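The "idempotent, backup first" pattern those scripts follow can be sketched generically in Python. This is not the contents of any of the scripts listed above: the marker string, the .bak suffix, the anchor-based insertion, and all function names are assumptions chosen for illustration.

```python
import shutil
import tempfile
from pathlib import Path

MARKER = "# patched: example-marker"  # hypothetical sentinel comment


def apply_patch(target: Path, anchor: str, addition: str) -> bool:
    """Insert `addition` after `anchor` once; return False if already patched."""
    source = target.read_text()
    if MARKER in source:
        return False  # idempotent: a second run changes nothing
    if anchor not in source:
        raise ValueError(f"anchor not found in {target}")

    # Write the backup before touching the original file.
    backup = target.with_name(target.name + ".bak")
    shutil.copy2(target, backup)

    patched = source.replace(anchor, f"{anchor}\n{MARKER}\n{addition}", 1)
    target.write_text(patched)
    return True


def demo() -> dict:
    """Run the patch twice against a throwaway file to show idempotency."""
    original = "def forecast():\n    return {}\n"
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "server.py"  # stand-in for production code
        target.write_text(original)
        first = apply_patch(target, "def forecast():", "    # new behaviour would go here")
        second = apply_patch(target, "def forecast():", "    # new behaviour would go here")
        return {
            "first": first,
            "second": second,
            "marker_count": target.read_text().count(MARKER),
            "backup_is_original": (Path(tmp) / "server.py.bak").read_text() == original,
        }
```

Checking for the marker before writing the backup matters: otherwise a second run would overwrite the pristine backup with the already-patched file.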

Raw data