After the 2026-05-20 v3 leakage fix shipped, the classifier had honest accuracy numbers but hit a serious sample-scarcity ceiling: 314 hand-labeled samples, only 18 of them positive, and only 7 countries with ground truth. The LOCO (leave-one-country-out) median F1 of 0.86 looked great, but it was computed across just those 7 countries, so it was not the generalization story it appeared to be.
On 2026-05-21 we expanded the training data by mining the live incidents table (2,636 citable incidents) joined to evidence and probe_metrics aggregates. Each (country, day) pair with ≥5 evidence measurements becomes one training sample. A sample is labeled positive if there is an incident in that country on that day, and negative otherwise.
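The labeling rule can be sketched in a few lines. This is a minimal illustration, not the actual build script: the row shapes (plain `(country, day)` tuples) and the `build_samples` helper are assumptions made for the example, and the toy rows are invented.

```python
from collections import Counter
from datetime import date

MIN_MEASUREMENTS = 5  # threshold from the text: at least 5 evidence measurements

def build_samples(evidence_rows, incident_rows, min_measurements=MIN_MEASUREMENTS):
    """Return one labeled sample per (country, day) with enough evidence.

    evidence_rows: iterable of (country, day) tuples, one per measurement
    incident_rows: iterable of (country, day) tuples, one per incident
    Both shapes are hypothetical stand-ins for the real tables.
    """
    counts = Counter(evidence_rows)
    incident_keys = set(incident_rows)
    samples = []
    for (country, day), n in sorted(counts.items()):
        if n < min_measurements:
            continue  # too little evidence to form a sample
        samples.append({
            "country": country,
            "day": day,
            "measurement_count": n,
            "label": 1 if (country, day) in incident_keys else 0,
        })
    return samples

# Toy rows, invented for illustration only.
day = date(2026, 5, 1)
evidence = [("IR", day)] * 6 + [("SG", day)] * 5 + [("TH", day)] * 2
incidents = [("IR", day)]
samples = build_samples(evidence, incidents)
```

In the toy data, TH has only 2 measurements and is dropped, IR becomes a positive sample, and SG becomes a negative one. One consequence of this rule is that an incident on a day with fewer than 5 measurements produces no sample at all.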
The numbers
| Metric | v3 | v3.1 | Δ |
|---|---|---|---|
| Total samples | 314 | 4,237 | +13.5× |
| Positive samples | 18 | 1,116 | +62× |
| Unique countries | 7 | 131 | +18.7× |
| Stratified F1 | 0.464 | 0.673 | +45% |
| Stratified AUC | 0.904 | 0.868 | −4% |
| LOCO mean F1 | 0.566 | 0.710 | +25% |
| LOCO median F1 | 0.857 | 0.818 | −5% |
| LOCO countries evaluated | 7 | 127 | +18× |
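For readers checking the Δ column: the count rows are multiplicative ratios and the score rows are relative percentage changes. A small sketch of that arithmetic, using values from the table (the helper names `ratio` and `pct_change` are ours):

```python
def ratio(old, new):
    """Multiplicative growth, used for the count rows."""
    return new / old

def pct_change(old, new):
    """Relative change in percent, used for the score rows."""
    return (new - old) / old * 100.0

samples_ratio = round(ratio(314, 4237), 1)        # Total samples
positives_ratio = round(ratio(18, 1116))          # Positive samples
strat_f1_change = round(pct_change(0.464, 0.673)) # Stratified F1
loco_median_change = round(pct_change(0.857, 0.818))  # LOCO median F1
```

Note that the percentages are relative, not absolute: the LOCO median drop of about 5% corresponds to 0.039 F1 points.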
Why the LOCO median dropped slightly
v3's LOCO median was computed over 7 countries — mostly easy cases where evidence patterns were stark. v3.1's median is over 127 countries, including tail cases like Pakistan (F1 0.39, AUC 0.64) and Thailand (F1 0.10, AUC 0.54) where incident patterns are harder to learn. The model is genuinely better — the comparison floor just moved.
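To make the mean/median behavior concrete, here is a minimal leave-one-country-out sketch in plain Python. It assumes LOCO means holding out all samples from one country per fold; the `fit_predict` callback, the threshold "model", and the toy data are invented for illustration and are not the real training code (scikit-learn's `LeaveOneGroupOut` provides the same kind of split).

```python
from statistics import mean, median

def loco_splits(countries):
    """Yield (held_out, train_idx, test_idx), holding out one country at a time."""
    for held_out in sorted(set(countries)):
        train_idx = [i for i, c in enumerate(countries) if c != held_out]
        test_idx = [i for i, c in enumerate(countries) if c == held_out]
        yield held_out, train_idx, test_idx

def f1_score(y_true, y_pred):
    """Binary F1 for the positive class; 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)

def loco_f1(X, y, countries, fit_predict):
    """Per-country F1; fit_predict(X_train, y_train, X_test) returns predictions."""
    scores = {}
    for held_out, train_idx, test_idx in loco_splits(countries):
        preds = fit_predict(
            [X[i] for i in train_idx],
            [y[i] for i in train_idx],
            [X[i] for i in test_idx],
        )
        scores[held_out] = f1_score([y[i] for i in test_idx], preds)
    return scores

def threshold_model(X_train, y_train, X_test):
    """Toy stand-in model: threshold halfway between the class means."""
    pos = [x for x, t in zip(X_train, y_train) if t == 1]
    neg = [x for x, t in zip(X_train, y_train) if t == 0]
    threshold = (mean(pos) + mean(neg)) / 2
    return [1 if x > threshold else 0 for x in X_test]

# Invented toy data: one feature per sample; country "C" does not fit the pattern.
countries = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
X = [0.9, 0.1, 0.2, 0.8, 0.2, 0.1, 0.3, 0.6, 0.2]
y = [1, 0, 0, 1, 0, 0, 1, 0, 0]

scores = loco_f1(X, y, countries, threshold_model)
mean_f1 = mean(list(scores.values()))
median_f1 = median(list(scores.values()))
```

With two easy countries and one hard one, the median stays at the easy-country score while the mean is pulled down by the tail. The more hard countries enter the evaluation, the more the median itself has to reflect them, which is the effect described above.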
Where v3.1 wins clearly
- Iran (IR): F1 0.86 → 0.86, AUC 0.95 → 0.95 (held)
- Venezuela (VE): F1 0.82, AUC 0.92 (new — wasn't in v3 eval)
- Egypt (EG): F1 0.80, AUC 0.94 (new)
- Russia (RU): F1 0.63, AUC 0.79 (new)
- Ukraine (UA): F1 0.96, AUC 0.99 (new)
- Nigeria (NG): F1 0.90, AUC 0.97 (new)
- Philippines (PH): F1 0.83, AUC 0.95 (new)
Where v3.1 is still weak (honest)
- Thailand (TH): F1 0.10, AUC 0.54 — incident patterns don't cluster well
- Singapore (SG): F1 0.07, AUC 0.36 — minimal censorship, model overfits to noise
- Pakistan (PK): F1 0.39, AUC 0.64 — political volatility makes patterns inconsistent
These tail cases are the new frontier. Adding cross-country contagion features (planned next) should help PK by exploiting regional signal from Iran/India.
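The contagion features are not built yet, so the following is only one possible shape for a "neighbor risk" feature, not the planned implementation. The `NEIGHBORS` map, the `neighbor_risk` helper, and the example rates are all hypothetical.

```python
from statistics import mean

# Hypothetical adjacency map; the real neighbor definition is not given in the text.
NEIGHBORS = {"PK": ["IR", "IN"], "IR": ["PK"], "IN": ["PK"]}

def neighbor_risk(country, day_rates, neighbors=NEIGHBORS, default=0.0):
    """Mean anomaly_rate over a country's neighbors for one day.

    day_rates maps country code -> anomaly_rate for that day. Neighbors with
    no data are skipped, and `default` is returned when none have data.
    """
    rates = [day_rates[n] for n in neighbors.get(country, []) if n in day_rates]
    return mean(rates) if rates else default

# Invented example values for the previous day.
rates_yesterday = {"IR": 0.6, "IN": 0.2}
pk_risk = neighbor_risk("PK", rates_yesterday)
unknown_risk = neighbor_risk("SG", rates_yesterday)
```

Feeding the function the previous day's rates, rather than same-day rates, is a deliberate choice in this sketch: it keeps the feature limited to information that would be available at prediction time.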
Feature importance — well-distributed
Top 3 features sum to 71.9% (vs v3's 73.5%):
- anomaly_rate: 28.4%
- month: 23.3% (seasonal pattern, election cycles)
- measurement_count: 20.3%
- rate_count_interaction: 8.5%
- spike_magnitude: 7.2%
No leakage. No single dominator. The signal is honest.
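The concentration check is simple to reproduce. The sketch below uses the rounded percentages listed above; the `top_k_share` helper is ours. The three rounded values sum to 72.0, which agrees with the quoted 71.9% to within rounding.

```python
# Rounded importances (percent) as listed above; the remaining features are not shown.
importances = {
    "anomaly_rate": 28.4,
    "month": 23.3,
    "measurement_count": 20.3,
    "rate_count_interaction": 8.5,
    "spike_magnitude": 7.2,
}

def top_k_share(importances, k):
    """Return the k most important (name, value) pairs and their combined share."""
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k], sum(value for _, value in ranked[:k])

top3, top3_share = top_k_share(importances, 3)
largest_name, largest_share = top3[0]
```

A quick leakage smell test in the same spirit is to flag any single feature whose share dwarfs the rest; here the largest feature holds 28.4%.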
What changed in the pipeline
- scripts/build-classifier-v3.1-expanded-dataset.py: mines incidents + evidence + probe_metrics, outputs labeled_incidents_v3.1.json
- scripts/train-classifier-v3.1.py: same model architecture (GradientBoosting), trained on the expanded JSON
- Promoted to /opt/voidly-ai/models/censorship_classifier_v3_promoted.pkl
- /v1/classifier/info and /v1/classifier/feature-importance automatically picked it up, no code changes needed
- Backup at ...promoted.pkl.bak.v3.0-2026-05-21 for instant rollback
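The promote-with-backup pattern can be sketched as follows. This is an illustration under assumptions, not the actual deployment script: `promote` and `rollback` are hypothetical helpers, and the demo runs against throwaway files in a temporary directory rather than the real model path.

```python
import os
import shutil
import tempfile

def promote(candidate_path, promoted_path, backup_tag):
    """Back up the current promoted model, then swap in the candidate.

    Returns the backup path, or None when nothing was promoted before.
    """
    backup_path = None
    if os.path.exists(promoted_path):
        backup_path = f"{promoted_path}.bak.{backup_tag}"
        shutil.copy2(promoted_path, backup_path)
    tmp_path = promoted_path + ".tmp"
    shutil.copy2(candidate_path, tmp_path)
    os.replace(tmp_path, promoted_path)  # atomic rename on the same filesystem
    return backup_path

def rollback(promoted_path, backup_path):
    """Restore the backed-up model over the promoted path."""
    tmp_path = promoted_path + ".tmp"
    shutil.copy2(backup_path, tmp_path)
    os.replace(tmp_path, promoted_path)

def read_bytes(path):
    with open(path, "rb") as f:
        return f.read()

# Demo with placeholder files (hypothetical names and contents).
with tempfile.TemporaryDirectory() as workdir:
    promoted = os.path.join(workdir, "model_promoted.pkl")
    candidate = os.path.join(workdir, "model_candidate.pkl")
    with open(promoted, "wb") as f:
        f.write(b"v3.0")
    with open(candidate, "wb") as f:
        f.write(b"v3.1")

    backup = promote(candidate, promoted, "v3.0-2026-05-21")
    backup_name = os.path.basename(backup)
    after_promote = read_bytes(promoted)

    rollback(promoted, backup)
    after_rollback = read_bytes(promoted)
```

Writing to a temporary file and renaming it means a reader of the promoted path sees either the old model or the new one, never a half-written file.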
Next
v3.2 will add cross-country contagion features (neighbor risk + regional aggregates) to help the weak-tail countries. v4 will add probabilistic calibration via CalibratedClassifierCV for trustworthy confidence numbers downstream.