voidly

Classifier v3: removed the 85% leakage feature, got 0.86 LOCO F1

The v2 classifier hit 99.8% F1, but country_risk_tier, a hardcoded feature that leaked the label, carried 85% of that signal. v3 drops it. Honest leave-country-out median F1: 0.86 (Iran AUC 0.95).

#methodology #ml #classifier #transparency #no-leakage

Our v2 censorship classifier was reporting 99.8% F1 on stratified 5-fold cross-validation. The deep audit found why: 85% of the model's signal came from one feature, country_risk_tier, which was hardcoded based on knowing which countries censor — pure label leakage. If China is labeled "tier 4" because it has incidents, then "tier 4 ⇒ predict incident" is just memorizing the label.

The fix: v3 drops the leaky feature

Trained 2026-05-21. Dropped country_risk_tier entirely. Added three new interaction features:

  • rate_count_interaction: anomaly_rate × measurement_count
  • rate_spike_interaction: anomaly_rate × spike_magnitude
  • high_evidence: 1 if measurement_count ≥ 50 else 0
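
The three definitions above can be sketched in a few lines of Python. This is a minimal illustration rather than the code in scripts/train-classifier-v3.py: the Sample container, the add_interactions helper, and the example values are all hypothetical, and the units of the base features are assumed.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    # Base features named in the post; units and scaling are assumptions.
    anomaly_rate: float
    measurement_count: int
    spike_magnitude: float


def add_interactions(s: Sample) -> dict:
    """Compute the three v3 interaction features for one sample."""
    return {
        "rate_count_interaction": s.anomaly_rate * s.measurement_count,
        "rate_spike_interaction": s.anomaly_rate * s.spike_magnitude,
        # Binary flag: 1 once a sample has at least 50 measurements.
        "high_evidence": 1 if s.measurement_count >= 50 else 0,
    }


# Made-up example values, for illustration only.
features = add_interactions(
    Sample(anomaly_rate=0.5, measurement_count=80, spike_magnitude=3.0)
)
print(features)
```

None of these features encodes which country a sample came from, which is what lets them replace country_risk_tier without reintroducing the leak.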

New feature importance — no single dominant feature

Feature                    v3 importance
rate_count_interaction     40.6%
measurement_count          21.6%
rate_spike_interaction     11.2%
spike_magnitude            10.5%
anomaly_rate                6.3%

Top-3 sum: 73% (was 85% from a single feature in v2). Top-5: 90%. Healthy distribution — no single feature is doing all the work.
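
The top-3 and top-5 sums can be checked directly from the reported importances. The snippet below is only an arithmetic check; top_k_share is a hypothetical helper name.

```python
# v3 feature importances as reported in the post, in percent.
importance = {
    "rate_count_interaction": 40.6,
    "measurement_count": 21.6,
    "rate_spike_interaction": 11.2,
    "spike_magnitude": 10.5,
    "anomaly_rate": 6.3,
}


def top_k_share(imp: dict, k: int) -> float:
    """Sum of the k largest importance values."""
    return sum(sorted(imp.values(), reverse=True)[:k])


top3 = top_k_share(importance, 3)
top5 = top_k_share(importance, 5)
print(f"top-3: {top3:.1f}%, top-5: {top5:.1f}%")
```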

Honest evaluation: leave-country-out

Stratified 5-fold gives AUC 0.904 ± 0.092, F1 0.464 ± 0.153. Both are lower than v2's inflated AUC of 1.0 and F1 of 0.998, and that's the point. Now the numbers are honest.

The real test is leave-country-out (LOCO): for each country with labeled positives, train on the other countries' data, test on the held-out country. The model never sees the held-out country during training, so there's no label leakage. LOCO median F1 was 0.857.

  • Iran (20 samples, 4 positives): AUC 0.953, F1 0.857
  • Russia (8 samples, 3 positives): F1 0.857
  • India (9 samples, 1 positive): AUC 0.875, F1 0.000
  • Myanmar (2 samples): F1 1.000
  • Tanzania (2 samples): F1 1.000
  • China (7 samples — all positive, no AUC computable): F1 0.250
  • Kazakhstan (3 samples, single positive): AUC 0.500, F1 0.000
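
As a rough sketch of the LOCO procedure described above: hold out one country at a time, fit on the rest, score on the held-out country, then take the median. Everything model-related here is an assumption for illustration. The toy rows, the fit_threshold stand-in model, and the helper names are hypothetical and are not the v3 training code.

```python
from statistics import median


def f1_score(y_true, y_pred):
    """Binary F1; returns 0.0 when there are no true or predicted positives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def loco_splits(rows):
    """Yield (country, train, test) for each country with a labeled positive."""
    for country in sorted({r["country"] for r in rows if r["label"] == 1}):
        train = [r for r in rows if r["country"] != country]
        test = [r for r in rows if r["country"] == country]
        yield country, train, test


def fit_threshold(train):
    """Toy stand-in model: midpoint between the class means of anomaly_rate."""
    pos = [r["anomaly_rate"] for r in train if r["label"] == 1]
    neg = [r["anomaly_rate"] for r in train if r["label"] == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


# Made-up rows for three placeholder countries.
toy = [("A", 0.9, 1), ("A", 0.8, 1), ("A", 0.1, 0),
       ("B", 0.7, 1), ("B", 0.2, 0), ("B", 0.3, 0),
       ("C", 0.4, 1), ("C", 0.1, 0)]
rows = [{"country": c, "anomaly_rate": a, "label": y} for c, a, y in toy]

results = {}
for country, train, test in loco_splits(rows):
    threshold = fit_threshold(train)  # fitted without the held-out country
    preds = [1 if r["anomaly_rate"] > threshold else 0 for r in test]
    results[country] = f1_score([r["label"] for r in test], preds)
print(results, median(results.values()))

# Per-country F1 values reported in the post.
reported_f1 = {"Iran": 0.857, "Russia": 0.857, "India": 0.0, "Myanmar": 1.0,
               "Tanzania": 1.0, "China": 0.25, "Kazakhstan": 0.0}
print(median(reported_f1.values()))
```

Applied to the seven reported values, the median is 0.857, which matches the headline figure. Note that a median over countries with as few as two samples each is a coarse summary, so the per-country list is worth reading alongside it.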

What v3 is good for

Strong: Iran-class events (high-volume, high-anomaly-rate, well-evidenced). v3 finds these reliably without needing the country-tier crutch.

Weak: Sparse-evidence countries with 1-2 labeled positives. The model has trouble generalizing to countries it barely saw.

Honest deployment recommendation: use v3 to gate candidate incidents for review, not to auto-publish. The 0.86 median LOCO F1 is a good signal for "this anomaly probably matters" but not good enough for "publish this as confirmed without human review" in low-positive-data countries.
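
One way to express that recommendation in code is a triage step whose only outcomes are "send to a human" or "do nothing". This is a hypothetical sketch: the triage function, the outcome names, and the 0.5 threshold are illustrative assumptions, not values from the v3 pipeline.

```python
# Deliberately no "publish" outcome: the classifier can only queue for review.
OUTCOMES = ("needs_review", "no_action")


def triage(score: float, review_threshold: float = 0.5) -> str:
    """Route a candidate incident based on the classifier's score.

    review_threshold is a placeholder; a real value would need tuning
    against the review team's capacity and tolerance for missed events.
    """
    return "needs_review" if score >= review_threshold else "no_action"


for score in (0.92, 0.35):
    print(score, triage(score))
```

Keeping publication out of the function's return values means a later threshold change cannot turn the classifier into an auto-publisher by accident.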

Status

v3 model artifact lives at /opt/voidly-ai/models/censorship_classifier_v3_promoted.pkl on Vultr. The training script is in the public repo at scripts/train-classifier-v3.py. Metrics JSON at scripts/classifier-v3-metrics.json.

Next step: wire v3 into create-voidly-incidents.py so incident creation actually USES the classifier instead of rule-based heuristics. That's a separate ship.
