Our v2 censorship classifier was reporting 99.8% F1 on stratified
5-fold cross-validation. The deep audit found why: 85% of
the model's signal came from one feature,
country_risk_tier, which was hardcoded based on
knowing which countries censor — pure label leakage. If China is
labeled "tier 4" because it has incidents, then "tier 4 ⇒ predict
incident" is just memorizing the label.
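To make the leakage concrete, here is a toy sketch. The six samples are invented for illustration, and `tier` is a simplified stand-in for country_risk_tier, not the real v2 feature. Because the tier is assigned by looking at which countries have incidents, a one-line rule on the tier reproduces the labels exactly.

```python
# Toy data, invented for illustration: (country, has_incident).
samples = [("A", 1), ("A", 1), ("B", 0), ("B", 0), ("C", 1), ("D", 0)]

# Leaky feature: the tier is derived from the labels themselves.
flagged = {country for country, label in samples if label == 1}
tier = {country: (4 if country in flagged else 1) for country, _ in samples}


def f1(y_true, y_pred):
    """Binary F1 for 0/1 labels; returns 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


y_true = [label for _, label in samples]
# The "model" is just the rule: tier 4 => predict incident.
y_pred = [1 if tier[country] == 4 else 0 for country, _ in samples]
leaky_f1 = f1(y_true, y_pred)
print(leaky_f1)
```

In this toy every sample from a flagged country is positive, so the rule is perfect. Real data is messier, but the mechanism is the same: the feature encodes the answer.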
The fix: v3 drops the leaky feature
Trained 2026-05-21. Dropped country_risk_tier
entirely. Added three new interaction features:
- rate_count_interaction: anomaly_rate × measurement_count
- rate_spike_interaction: anomaly_rate × spike_magnitude
- high_evidence: 1 if measurement_count ≥ 50 else 0
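The three features are simple enough to sketch in a few lines. This is a minimal illustration, not the code in scripts/train-classifier-v3.py: the function name `add_interaction_features`, the dict-per-row layout, and the example values are assumptions; the formulas and the threshold of 50 come from the definitions above.

```python
def add_interaction_features(row, evidence_threshold=50):
    """Return a copy of `row` with the three v3 interaction features added.

    `row` is assumed to be a dict with anomaly_rate, measurement_count,
    and spike_magnitude already present (hypothetical layout).
    """
    out = dict(row)
    out["rate_count_interaction"] = row["anomaly_rate"] * row["measurement_count"]
    out["rate_spike_interaction"] = row["anomaly_rate"] * row["spike_magnitude"]
    out["high_evidence"] = 1 if row["measurement_count"] >= evidence_threshold else 0
    return out


# Invented example row, for illustration only.
example = {"anomaly_rate": 0.5, "measurement_count": 120, "spike_magnitude": 2.5}
features = add_interaction_features(example)
print(features)
```

Note that country_risk_tier does not appear anywhere: every input is a measurement-level quantity rather than something derived from the labels.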
New feature importance — no single dominant feature
| Feature | v3 importance |
|---|---|
| rate_count_interaction | 40.6% |
| measurement_count | 21.6% |
| rate_spike_interaction | 11.2% |
| spike_magnitude | 10.5% |
| anomaly_rate | 6.3% |
Top-3 sum: 73% (was 85% from a single feature in v2). Top-5: 90%. Healthy distribution — no single feature is doing all the work.
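The top-k sums can be checked directly from the table. A quick sketch (the helper name `top_k_share` is ours; the percentages are the five rows listed above):

```python
# v3 importances from the table above, in percent.
importances = {
    "rate_count_interaction": 40.6,
    "measurement_count": 21.6,
    "rate_spike_interaction": 11.2,
    "spike_magnitude": 10.5,
    "anomaly_rate": 6.3,
}


def top_k_share(importances, k):
    """Sum of the k largest importances, rounded to one decimal place."""
    ranked = sorted(importances.values(), reverse=True)
    return round(sum(ranked[:k]), 1)


top1 = top_k_share(importances, 1)
top3 = top_k_share(importances, 3)
top5 = top_k_share(importances, 5)
print(top1, top3, top5)
```

The five listed features sum to 90.2%, so, assuming importances are normalized to 100%, the remaining share sits in features not shown in the table.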
Honest evaluation: leave-country-out
Stratified 5-fold gives AUC 0.904 ± 0.092, F1 0.464 ± 0.153. Both are lower than v2's inflated 1.0 AUC and 0.998 F1, and that's the point: now the numbers are honest.
The real test is leave-country-out (LOCO): for each country with labeled positives, train on the other countries' data and test on the held-out country. The model never sees the held-out country during training, so it cannot score well by memorizing which countries have incidents. Median LOCO F1 across the seven held-out countries below was 0.857:
- Iran (20 samples, 4 positives): AUC 0.953, F1 0.857 ✓
- Russia (8 samples, 3 positives): F1 0.857 ✓
- India (9 samples, 1 positive): AUC 0.875, F1 0.000
- Myanmar (2 samples): F1 1.000
- Tanzania (2 samples): F1 1.000
- China (7 samples — all positive, no AUC computable): F1 0.250
- Kazakhstan (3 samples, single positive): AUC 0.500, F1 0.000
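The LOCO procedure can be sketched in plain Python. Everything here except the reported F1 list is an assumption for illustration: the toy samples are invented, and `fit_threshold` is a deliberately trivial stand-in for model training, not the v3 classifier. The last lines recompute the median from the seven per-country F1 values listed above.

```python
from statistics import median


def loco_splits(samples):
    """Yield (held_out, train, test) for each country with a labeled positive."""
    positive_countries = sorted({s["country"] for s in samples if s["label"] == 1})
    for held_out in positive_countries:
        train = [s for s in samples if s["country"] != held_out]
        test = [s for s in samples if s["country"] == held_out]
        yield held_out, train, test


def f1(y_true, y_pred):
    """Binary F1 for 0/1 labels; returns 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def fit_threshold(train):
    """Toy stand-in for training: midpoint of the class means of anomaly_rate."""
    pos = [s["anomaly_rate"] for s in train if s["label"] == 1]
    neg = [s["anomaly_rate"] for s in train if s["label"] == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


# Invented toy samples; country C has no positives, so it is never held out.
samples = [
    {"country": "A", "anomaly_rate": 0.9, "label": 1},
    {"country": "A", "anomaly_rate": 0.1, "label": 0},
    {"country": "B", "anomaly_rate": 0.8, "label": 1},
    {"country": "B", "anomaly_rate": 0.2, "label": 0},
    {"country": "C", "anomaly_rate": 0.3, "label": 0},
]

per_country_f1 = {}
for held_out, train, test in loco_splits(samples):
    threshold = fit_threshold(train)  # fitted without any held-out rows
    y_true = [s["label"] for s in test]
    y_pred = [1 if s["anomaly_rate"] > threshold else 0 for s in test]
    per_country_f1[held_out] = f1(y_true, y_pred)

# Per-country F1 values as reported in the list above.
reported_f1 = [0.857, 0.857, 0.000, 1.000, 1.000, 0.250, 0.000]
reported_median = median(reported_f1)
print(per_country_f1, reported_median)
```

The median is robust to the two zeros and the 0.250, which is why it lands on 0.857 even though three of the seven countries score poorly.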
What v3 is good for
Strong: Iran-class events (high-volume, high-anomaly-rate, well-evidenced). v3 finds these reliably without needing the country-tier crutch.
Weak: Sparse-evidence countries with 1-2 labeled positives. The model has trouble generalizing to countries it barely saw.
Honest deployment recommendation: use v3 to gate candidate incidents for review, not to auto-publish. The median LOCO F1 of 0.857 is a good signal for "this anomaly probably matters", but it hides F1 0.000 on India and Kazakhstan, so it is not good enough for "publish this as confirmed without human review" in low-positive-data countries.
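As a rough sketch of what "gate, don't publish" could look like in code: the function name `route_candidate`, the queue labels, and the 0.5 threshold are all hypothetical and not part of the current pipeline. The point is only that no branch leads to automatic publication.

```python
def route_candidate(score, review_threshold=0.5):
    """Route a classifier score to human review; never auto-publish.

    `score` is assumed to be the model's predicted probability of an incident.
    The threshold is a placeholder and would need tuning on real data.
    """
    if score >= review_threshold:
        return "needs_review"
    return "no_action"


decisions = [route_candidate(s) for s in (0.95, 0.50, 0.10)]
print(decisions)
```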
Status
v3 model artifact lives at
/opt/voidly-ai/models/censorship_classifier_v3_promoted.pkl
on Vultr. The training script is in the public repo at
scripts/train-classifier-v3.py. Metrics JSON at
scripts/classifier-v3-metrics.json.
Next step: wire v3 into create-voidly-incidents.py
so incident creation actually USES the classifier instead of
rule-based heuristics. That's a separate ship.