2026-05-29

The censorship classifier generalizes across countries but degrades across time — a forward-temporal audit

The production censorship classifier v3.3 (GradientBoosting, 16 country-day features) reports stratified 5-fold F1 0.729 / AUC 0.899 and leave-one-country-out (LOCO) median F1 0.870. We reproduced the stratified number closely (AUC 0.895 / F1 0.725 with a fresh GB on the same features), then re-split the 4,237 samples by time: train on the oldest 70% of distinct days, test on the newest 30% (strict past→future).

Forward-temporal skill drops materially: AUC 0.669 (−0.226 from the reproduced 0.895), F1 0.474, precision 0.34 / recall 0.80 at threshold 0.5, and PR-AUC 0.52 at a 0.27 base rate (1.9x lift). So the classifier is not broken: forward in time it still beats chance by ~2x on PR-AUC and recovers 80% of incidents. But the random-split AUC overstates forward-deployment accuracy.

The three splits answer different questions. Stratified 5-fold (shuffled rows, near-duplicates in both folds) is the easiest. LOCO (hold out whole countries) is a genuinely hard cross-country test that v3.3 passes well, and it remains the honest headline for country generalization. Forward-temporal (hold out the future) is the deployment question, and there v3.3 degrades because what an incident looks like drifts over time.

Honest one-liner: v3.3 generalizes across COUNTRIES but degrades across TIME, which is exactly why it is retrained weekly (the cadence is load-bearing, not hygiene). The degradation is milder than in the same-week shutdown-risk forward audit (whose within-country 7-day forecast fell BELOW chance at ~0.36) because this is same-day detection, not forecasting.

No model change: v3.3 stays live and its LOCO F1 0.87 is real. What changed is disclosure: the live /v1/classifier/info evaluation now carries the forward-temporal block plus a note that the random-split number is not forward-deployment accuracy. Reproduce with scripts/audit-classifier-v3.3-temporal.py.
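For readers who want the shape of the forward-temporal split without opening the audit script, here is a minimal sketch. It is not the contents of scripts/audit-classifier-v3.3-temporal.py. The function names, the toy dates, and the choice to round the 70% cut to a whole number of days are all assumptions made for illustration. The one property it shares with the audit as described is that the cut falls on distinct days, so every training row is strictly older than every test row.

```python
from datetime import date, timedelta


def forward_temporal_split(days, train_frac=0.7):
    """Split row indices so every training day precedes every test day.

    `days` holds one date per sample (hypothetical input; the real feature
    table layout is not shown in this post). The cut is made on DISTINCT
    days, so all rows from a given day land on the same side.
    """
    distinct = sorted(set(days))
    # Assumption: round the cut to a whole number of distinct days.
    n_train_days = round(len(distinct) * train_frac)
    if n_train_days <= 0 or n_train_days >= len(distinct):
        raise ValueError("need at least one distinct day on each side of the cut")
    first_test_day = distinct[n_train_days]
    train_idx = [i for i, d in enumerate(days) if d < first_test_day]
    test_idx = [i for i, d in enumerate(days) if d >= first_test_day]
    return train_idx, test_idx, first_test_day


def pr_auc_lift(pr_auc, base_rate):
    """PR-AUC relative to a no-skill ranker, whose expected PR-AUC is the base rate."""
    return pr_auc / base_rate


# Toy data: 10 distinct days, 3 samples per day, in deliberately shuffled order.
start = date(2026, 1, 1)
sample_days = [start + timedelta(days=(i * 7) % 10) for i in range(30)]

train_idx, test_idx, first_test_day = forward_temporal_split(sample_days)
newest_train = max(sample_days[i] for i in train_idx)
oldest_test = min(sample_days[i] for i in test_idx)

print(len(train_idx), len(test_idx), newest_train < oldest_test)
print(round(pr_auc_lift(0.52, 0.27), 1))  # lift figure quoted in the post
```

A shuffled or stratified split would put rows from the same day, or from neighbouring days, on both sides. That is the near-duplicate leakage the random-split number benefits from. Cutting on distinct days removes it, at the cost of a test set whose incident mix may differ from the training period.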

#ml #classifier #temporal-cv #honest-scoping #generalization #leave-one-country-out #accountability #atlas #api
