Shutdown onset is not predictable 7 days out — a definitive honest negative
Voidly's production 7-day forecast measures "is this country currently censored," not "will a new shutdown start." Its sliding-window target is 98.9% autocorrelated, so it scores AUC ~0.95 by reproducing a near-constant label but ~0.33 (below chance) on the rows where a shutdown actually begins. This finding builds a dedicated shutdown-ONSET predictor on a clean onset label and evaluates it honestly.
The label: for each eligible country-day t with no shutdown active at t (active = a confirmed censorship/mixed incident over [first_seen−3d, last_seen+14d]), onset_7d=1 iff a NEW incident's first_seen lands in (t, t+7]; IODA disruption incidents (fiber cuts, DDoS, BGP leaks) are excluded as non-censorship. Over 40 countries × 2 years this yields 21,462 eligible rows and 297 onset events, a 1.38% base rate.
Under a strict forward-temporal split (train on the past, test on the strictly-future tail, never shuffled), three model families (XGBoost, gradient boosting, and a scaled balanced logistic regression) all land at AUC 0.45–0.49 on strictly-future onset events. A 3-fold expanding walk-forward averages 0.495 (folds 0.529 / 0.440 / 0.517). Every fold sits at chance, which is what makes the negative defensible: a tuning artifact would spike one fold. The best model never reaches 25% precision.
Verdict: shutdown onset is NOT predictable 7 days out from the features Voidly has. Censorship momentum, acceleration, volatility, election proximity, protest/unrest signals, and cross-country contagion do not separate the country-days that precede a new shutdown from those that don't. This is plausible: a new shutdown is a political decision whose precursors live in situation rooms, not network telemetry, and 297 events across 40 countries are too few to learn a country-specific onset model.
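The onset label defined above can be sketched in a few lines of Python. This is a minimal illustration of the stated rule, not the code in scripts/build-onset-features.py: the Incident record, its field names kind and confirmed, and the helpers is_active and onset_7d are all hypothetical, and day-level granularity is assumed.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterable, Optional

LEAD_PAD = timedelta(days=3)    # active window opens at first_seen - 3d
TAIL_PAD = timedelta(days=14)   # active window closes at last_seen + 14d
HORIZON = timedelta(days=7)     # onset must land in (t, t + 7d]
CENSORSHIP_KINDS = {"censorship", "mixed"}  # disruption incidents are excluded


@dataclass(frozen=True)
class Incident:
    """Hypothetical incident record for one country (field names are assumptions)."""
    first_seen: date
    last_seen: date
    kind: str
    confirmed: bool = True


def is_active(t: date, incidents: Iterable[Incident]) -> bool:
    """True if day t falls inside any incident's padded active window."""
    return any(i.first_seen - LEAD_PAD <= t <= i.last_seen + TAIL_PAD
               for i in incidents)


def onset_7d(t: date, incidents: Iterable[Incident]) -> Optional[int]:
    """Return None for ineligible days, else 1 if a new incident starts in (t, t+7]."""
    relevant = [i for i in incidents
                if i.confirmed and i.kind in CENSORSHIP_KINDS]
    if is_active(t, relevant):
        return None  # a shutdown is already active: row is dropped, not labeled 0
    return int(any(t < i.first_seen <= t + HORIZON for i in relevant))


incidents = [
    Incident(date(2024, 3, 10), date(2024, 3, 12), "censorship"),
    Incident(date(2024, 1, 5), date(2024, 1, 6), "disruption"),
]
print(onset_7d(date(2024, 3, 5), incidents))  # 1
print(onset_7d(date(2024, 3, 8), incidents))  # None
print(onset_7d(date(2024, 1, 1), incidents))  # 0
```

Read literally, the 3-day lead pad makes the three days before first_seen (and first_seen − 3d itself) part of the active window, so under this sketch positive rows come only from days 4 to 7 before an incident starts. Whether the production script applies the pad this way is not stated here.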
Nothing is promoted. The onset model is exposed as a transparency artifact at GET /v1/forecast/{cc}/onset and GET /v1/forecast/onset/info, with model_promoted:false, the forward-temporal AUC, and honest caveats inline in every response. (An earlier run reached the same verdict for the WRONG reason: its logistic regression was unscaled and never converged. The fix was a standard-scaling Pipeline, after which lbfgs converges in 53 iterations and agrees with both scale-free tree models, all three at ~0.45–0.49.) Reproduce with scripts/build-onset-features.py and scripts/train-onset-model.py.
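The evaluation protocol (expanding walk-forward folds, each scored by AUC on a strictly-future block) can be sanity-checked independently of the repository scripts. The sketch below is an assumption-laden illustration, not the contents of scripts/train-onset-model.py: expanding_folds and roc_auc are hypothetical names, folds are assumed to be equal-sized blocks of distinct calendar days, and AUC is computed with the rank-based (Mann-Whitney) definition using only the standard library.

```python
from typing import Iterator, Sequence, Tuple


def expanding_folds(n_days: int, n_folds: int = 3) -> Iterator[Tuple[range, range]]:
    """Yield (train_days, test_days) index ranges over sorted distinct dates.

    Each fold trains on everything before its test block, so the training
    window expands and no test day ever precedes a training day.
    """
    block = n_days // (n_folds + 1)
    if block == 0:
        raise ValueError("not enough days for the requested number of folds")
    for k in range(1, n_folds + 1):
        test_end = (k + 1) * block if k < n_folds else n_days
        yield range(0, k * block), range(k * block, test_end)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative row")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


for train, test in expanding_folds(8, n_folds=3):
    print(list(train), list(test))
# [0, 1] [2, 3]
# [0, 1, 2, 3] [4, 5]
# [0, 1, 2, 3, 4, 5] [6, 7]

print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Splitting on distinct dates rather than on raw rows keeps every country's rows for a given day on the same side of the boundary. A model with no signal should score near 0.5 in every fold, which is the pattern the reported folds (0.529 / 0.440 / 0.517, mean 0.495) show.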