EG UZ PK RU NG IR TM BR CU MM
2026-05-21

Alert lead-time retrospective: did Sentinel actually warn early? (the accountability number)

Voidly Sentinel fires a forecast_threshold alert when a country's 7-day censorship-risk forecast crosses the alert threshold, and the pitch is "early warning." This finding is the honest audit of that pitch. For every forecast-threshold alert that fired in the last 90 days (154 alerts across 30 countries), it measures whether a confirmed censorship/mixed incident actually followed within 14 days (a true positive) or never came (a false alarm) and, if one did, how many days of lead time the alert gave (incident first_seen minus alert issued_at).

The headline numbers, deliberately unsmoothed: 30 true positives (true-positive rate 19.5%), 122 false alarms (false-alarm rate 79.2%), and 2 lagging alerts where the forecast reacted to a shutdown already underway. For the 30 true positives the lead-time distribution is median 4.2 days, mean 5.8 days, IQR 2.2-8.9 days, full range 0.9-13.9 days. The 79.2% false-alarm rate is high, and the endpoint says so prominently in a headline_warning field rather than burying it: a single Sentinel alert is a watch signal, not a prediction, and the genuine early-warning value is in the aggregate lead-time distribution.

The per-country split is the real story. Where Sentinel works, it works: Egypt 3/3 true positives (100%, median lead 10.9 days), Uzbekistan 7/7 (100%, 1.9 days), Pakistan 5/6 (83%, 2.9 days). Where it does not, it does not: Iran 0/7 (100% false alarms, the worst country), and Turkmenistan, Brazil, Cuba, and Myanmar each 0-for-7.

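
The scoring described above can be sketched in a few lines of Python. This is a minimal illustration, not the code in scripts/build-alert-lead-time-retrospective.py: the function names are hypothetical, the rule that an incident detected at or before the alert counts as "lagging" is an assumption (the finding does not spell out how lagging alerts are identified), and the quartile method may differ from the one behind the published IQR. Only the 14-day horizon, the lead-time definition (first_seen minus issued_at), and the headline counts come from the text.

```python
from datetime import datetime, timedelta
from statistics import mean, median, quantiles
from typing import Optional

HORIZON = timedelta(days=14)  # match window stated in the finding


def score_alert(issued_at: datetime, first_seen: Optional[datetime]):
    """Return (outcome, lead_days) for one alert.

    first_seen is the detection time of the matched confirmed incident,
    or None when no confirmed censorship/mixed incident was matched.
    """
    if first_seen is None:
        return ("false_alarm", None)
    lead = first_seen - issued_at
    if lead <= timedelta(0):
        # Assumption: incident already detected when the alert fired.
        return ("lagging", None)
    if lead > HORIZON:
        # Incident arrived after the horizon, so it is scored as a false alarm.
        return ("false_alarm", None)
    return ("true_positive", lead.total_seconds() / 86400)


def rate(count: int, total: int) -> float:
    """Percentage of alerts, rounded to one decimal place."""
    return round(100 * count / total, 1)


def summarize(lead_days):
    """Median, mean, and (Q1, Q3) of the true-positive lead times."""
    q1, _, q3 = quantiles(lead_days, n=4)  # default 'exclusive' method
    return {"median": median(lead_days), "mean": mean(lead_days), "iqr": (q1, q3)}


# Headline rates recomputed from the counts in the finding (154 alerts).
print(rate(30, 154), rate(122, 154), rate(2, 154))  # 19.5 79.2 1.3
```

The three outcome counts (30 + 122 + 2) add up to the 154 alerts, so the rates cover every alert that fired.
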
Honest caveats baked into every response:
(1) "Lead time" is alert-issued vs incident-DETECTION, not alert vs the real-world shutdown start. Incident detection itself lags (OONI/IODA ingest + the 30-min incident builder), so a positive lead time is a LOWER bound on the true early-warning margin; it does not over-state.
(2) The matched incident is a TEMPORAL match (next confirmed incident within the horizon), not a causal one: the alert did not necessarily predict that specific incident.
(3) The 14-day horizon means a real incident the alert correctly anticipated but that arrived on day 16 is scored as a false alarm here, so 79.2% is an UPPER bound on true mis-fires.
(4) Only confirmed censorship/mixed incidents count. IODA disruption rows (real outages never confirmed as censorship) are excluded, which raises the measured false-alarm rate.
(5) Lagging alerts are reported separately and excluded from the median-lead-time headline.
(6) Countries with <2 alerts are excluded from best/worst ranking.

Live at GET /v1/sentinel/alert-lead-time, /v1/sentinel/alert-lead-time/{cc}, and /v1/sentinel/alert-lead-time/info. Rebuilt daily at 04:50 UTC. Paired with /v1/sentinel/accuracy: accuracy grades every forecast; this grades every alert that actually fired. Implementation: scripts/build-alert-lead-time-retrospective.py + patch-alert-lead-time-endpoint.py.
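
A rough sketch of calling the endpoint from Python, using only the standard library. The paths and the headline_warning field name come from the text above; everything else is an assumption: the host is not given here, so base_url is a caller-supplied placeholder, the helper names are hypothetical, the country code is passed through unchanged (for example "EG"), and headline_warning is assumed to sit at the top level of the JSON response.

```python
import json
from typing import Optional
from urllib.request import urlopen

PATH = "/v1/sentinel/alert-lead-time"  # path given in the finding


def lead_time_url(base_url: str, cc: Optional[str] = None) -> str:
    """Build the global URL, or the per-country URL when cc is given."""
    url = base_url.rstrip("/") + PATH
    return f"{url}/{cc}" if cc else url


def fetch_lead_time(base_url: str, cc: Optional[str] = None) -> dict:
    """GET the retrospective and decode the JSON body."""
    with urlopen(lead_time_url(base_url, cc), timeout=10) as resp:
        return json.load(resp)


def headline_warning(payload: dict) -> Optional[str]:
    """Return the warning text if present (top-level location is assumed)."""
    return payload.get("headline_warning")
```

Surfacing headline_warning next to any single alert keeps the false-alarm caveat attached to the number, which is the point of the endpoint putting it in a dedicated field.
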

#sentinel #forecast #early-warning #accountability #lead-time #false-alarm-rate #retrospective #ml-honesty #transparency #api
