2026-05-22

The 7-day shutdown forecast does not beat persistence — an honest re-evaluation

Voidly’s production 7-day shutdown forecast (forecast v1, XGBoost + isotonic) reports ROC AUC 0.954. A weakness audit found that the number is inflated by data leakage: it comes from a shuffled train_test_split in scripts/train-forecast.py, and the shuffle scatters rows of the same country across the train and test sets. Because the target is time-autocorrelated, a test row almost always has a near-identical neighboring day in the training set, which hands the model a near-free score. This finding is the honest re-evaluation plus a genuine attempt to fix it.

forecast v2 momentum keeps all 39 v1 features and adds 30 forward/change-oriented features across six families: momentum (block-rate deltas, rising/falling run length), acceleration (2nd difference), volatility (rolling std, coefficient of variation), event anticipation (days-until-election, days-since-incident, incident-anniversary proximity), cross-country leading indicators (corr-weighted neighbor momentum), and contagion-chain score. Everything is evaluated on a forward-temporal split only: train up to a cutoff, test on the strictly future 60-day window (1,260 rows, 198 positive), never shuffled. The comparison baseline is persistence (predict tomorrow = today’s label).

The honest numbers: under the forward-temporal split, v1 scores AUC 0.589 / F1 0.132, and v2 scores AUC 0.685 / F1 0.404. v2 beats v1 by +9.6pp AUC, so the new features genuinely help relative to v1, but v2 still loses to the persistence baseline by −27.2pp AUC and −51.9pp F1. The promote gate required v2 to beat persistence by ≥ 8pp F1; it missed by ~60pp (the 51.9pp deficit plus the 8pp margin). v2 is NOT promoted and v1 stays in production unchanged.

Why does persistence score an F1 of about 0.92 (v2’s 0.404 plus the 51.9pp gap)? Because target_7day is a sliding 7-day window: adjacent days share 6 of 7 lookahead days, so the label is identical from one day to the next 98.9% of the time, with only 172 transitions across 15,330 country-day rows. Predict-yesterday wins by construction, not by forecasting. The same autocorrelation is exactly what the shuffled split leaks into v1’s 0.954.
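A minimal sketch of the two mechanisms described above, on toy data rather than Voidly’s. It builds a sliding 7-day target from a daily event series, then compares how often a test day has an adjacent day in the training set under a shuffled split versus a forward-temporal split. The function names, the single synthetic series, the three 10-day shutdowns, and the "any event in the next 7 days" definition of the target are all assumptions for illustration; the real logic in scripts/train-forecast.py is not shown in this finding.

```python
import random

def sliding_target(events, horizon=7):
    # Assumed definition: label for day t is 1 if any event falls in days t+1 .. t+horizon.
    return [int(any(events[t + 1 : t + 1 + horizon]))
            for t in range(len(events) - horizon)]

def agreement(labels):
    # Share of consecutive day pairs whose label is identical.
    pairs = list(zip(labels, labels[1:]))
    return sum(a == b for a, b in pairs) / len(pairs)

def neighbour_leak(test_days, train_days):
    # Share of test days whose previous or next day sits in the training set.
    train = set(train_days)
    hits = sum((d - 1 in train) or (d + 1 in train) for d in test_days)
    return hits / len(test_days)

random.seed(0)

# Hypothetical single-country series: 730 days with three 10-day shutdowns.
n_days = 730
events = [0] * n_days
for start in (100, 400, 650):
    for d in range(start, start + 10):
        events[d] = 1

labels = sliding_target(events)
days = list(range(len(labels)))

# Shuffled 80/20 split: adjacent days land on both sides of the split.
shuffled = days[:]
random.shuffle(shuffled)
cut = int(0.8 * len(days))
shuf_train, shuf_test = shuffled[:cut], shuffled[cut:]

# Forward-temporal split: the last 60 days are test, everything earlier is train.
fwd_train, fwd_test = days[:-60], days[-60:]

label_agreement = agreement(labels)
leak_shuffled = neighbour_leak(shuf_test, shuf_train)
leak_forward = neighbour_leak(fwd_test, fwd_train)

print(f"day-to-day label agreement: {label_agreement:.3f}")
print(f"test days with a train neighbor (shuffled): {leak_shuffled:.3f}")
print(f"test days with a train neighbor (forward):  {leak_forward:.3f}")
```

On this toy series the labels agree day-to-day more than 98% of the time, most shuffled test days have a training neighbor, and in the forward split only the first test day does. Since a persistence prediction is correct exactly when two consecutive labels agree, the agreement rate is also the accuracy that predict-yesterday gets for free.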
The deepest honest cut: restricted to the 31 transition rows where the label actually moves, v2’s AUC is 0.328 — below a coin flip. On the days that matter (shutdown onset, block lift) the forecast has no skill, arguably negative skill. The honest conclusion: with the data Voidly currently has, the 7-day censorship target is persistence-dominated — what is blocked stays blocked, and the rare transitions are not anticipated by momentum, calendar proximity, or neighbor contagion. The production forecast’s value is calibration + explanation (SHAP drivers, conformal intervals), not predictive lift. The leaky 0.954 was a real bug; replacing it with an honest ~0.59 and publishing the no-promote is the fix.
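The transition-only cut described above can be sketched as follows. This is an illustration under stated assumptions, not Voidly’s evaluation code: a row counts as a transition when its label differs from the previous day’s label in the same series, AUC is computed with a plain pairwise (Mann-Whitney) formula, and the labels and the lagging score are made up. The toy score simply follows yesterday’s label, which shows why a persistence-like model can look acceptable overall and still fall below 0.5 on transition rows.

```python
def roc_auc(y_true, scores):
    # Pairwise AUC: share of (positive, negative) pairs ranked correctly, ties count half.
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative row")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def transition_mask(labels):
    # Assumed definition: True where the label differs from the previous day.
    # The first day has no predecessor and is never a transition.
    return [False] + [a != b for a, b in zip(labels, labels[1:])]

# Hypothetical single-series labels with two onsets and two lifts.
labels = [0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0]

# Hypothetical lagging score: high if yesterday was positive, low otherwise.
scores = [0.1] + [0.9 if y else 0.1 for y in labels[:-1]]

mask = transition_mask(labels)
t_labels = [y for y, m in zip(labels, mask) if m]
t_scores = [s for s, m in zip(scores, mask) if m]

overall_auc = roc_auc(labels, scores)
transition_auc = roc_auc(t_labels, t_scores)

print(f"overall AUC:         {overall_auc:.3f}")
print(f"transition-only AUC: {transition_auc:.3f}")
```

With several countries in one table, the mask would need to be computed per country so that the first row of one country is not compared with the last row of another.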

#methodology #ml #forecast #honest-negative #data-leakage #temporal-cv #persistence-baseline #accountability #atlas #api
