voidly

Cross-country contagion features: wins on the tail, regresses on EG. Held back.

We added 3 neighbor-risk features to v3.1 and retrained as v3.2. Stratified F1 jumped 0.673 → 0.712 and the targeted weak countries (PK, TH, SG) improved 3-9 points. But EG regressed 13.5 points and Western Europe got worse too. Net LOCO neutral. Holding v3.2 back; iterating the adjacency map.

#methodology#ml#classifier#contagion#experiment#honest-failure

v3.1 fixed the data-scarcity problem (314 → 4,237 samples), but the per-country breakdown showed a long tail of weak performers, notably Pakistan (F1 0.39), Thailand (0.10), and Singapore (0.07). Hypothesis: the model had no neighbor-aware signal for these countries, yet regional contagion is a real driver (Iran's censorship spilling into Iraq and Pakistan, China affecting its neighbors, etc.).

We added 3 cross-country features:

  • neighbor_block_rate_7d: mean anomaly_rate of regional neighbors over the previous 7 days
  • neighbor_incident_count_7d: total incidents in neighbor countries over the previous 7 days
  • neighbor_max_anomaly_7d: max anomaly_rate among neighbors over the previous 7 days
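A minimal sketch of how the three features above could be computed for one (country, day) sample. The input shapes (`adjacency`, `anomaly_rate`, `incidents` as plain dicts) and the choice to pool every neighbor-day observation before taking the mean and max are assumptions for illustration; the real logic lives in the build script and may aggregate differently.

```python
from datetime import date, timedelta
from statistics import mean


def neighbor_features(country, day, adjacency, anomaly_rate, incidents, window=7):
    """Compute the three contagion features for `country` on `day`.

    Hypothetical input shapes (not taken from the build script):
      adjacency:    country code -> set of neighbor country codes
      anomaly_rate: (country code, date) -> daily anomaly rate, only for days with data
      incidents:    (country code, date) -> incident count
    The window covers the `window` days strictly before `day` (no same-day leak).
    Returns None when no neighbor has any anomaly_rate data in the window.
    """
    window_days = [day - timedelta(days=k) for k in range(1, window + 1)]
    neighbors = adjacency.get(country, set())

    # Pool every (neighbor, day) anomaly-rate observation inside the window.
    rates = [
        anomaly_rate[(nb, d)]
        for nb in neighbors
        for d in window_days
        if (nb, d) in anomaly_rate
    ]
    if not rates:
        return None  # no neighbor with data: this sample gets no contagion signal

    return {
        "neighbor_block_rate_7d": mean(rates),
        "neighbor_incident_count_7d": sum(
            incidents.get((nb, d), 0) for nb in neighbors for d in window_days
        ),
        "neighbor_max_anomaly_7d": max(rates),
    }


if __name__ == "__main__":
    adjacency = {"PK": {"IR", "AF"}}  # toy adjacency, not the real map
    rates = {("IR", date(2024, 3, 7)): 0.4, ("AF", date(2024, 3, 6)): 0.6}
    counts = {("IR", date(2024, 3, 7)): 3, ("AF", date(2024, 3, 1)): 1}
    print(neighbor_features("PK", date(2024, 3, 8), adjacency, rates, counts))
```

Returning None (rather than zeros) keeps the "no neighbors with data" samples distinguishable; how the actual pipeline fills those rows is not stated in the post.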

Adjacency was built from UN M49 subregions plus hand-added cross-region land-border links (IR↔PK, EG↔SD, etc.) and covers all 131 countries. The features fire on ~76% of samples; the other 24% have no neighbors with data.
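A sketch of the adjacency construction described above, assuming every pair of countries sharing a subregion becomes neighbors and each hand-added link is applied in both directions. The group names and memberships below are illustrative placeholders, not the script's real M49 table; only the two cross-region links are the post's own examples.

```python
from itertools import combinations

# Illustrative placeholders only: these groups are NOT the real UN M49 table,
# which lives at the top of the build script and covers all 131 countries.
EXAMPLE_SUBREGIONS = {
    "group_a": ["IR", "IQ", "TR"],
    "group_b": ["PK", "AF"],
    "group_c": ["EG", "LY"],
    "group_d": ["SD"],
}
# Hand-added cross-region land-border links (the post's own two examples).
EXAMPLE_CROSS_LINKS = [("IR", "PK"), ("EG", "SD")]


def build_adjacency(subregions, cross_links):
    """Return a symmetric map: country code -> set of neighbor codes."""
    adjacency = {}
    for members in subregions.values():
        for code in members:
            adjacency.setdefault(code, set())  # singletons still get an entry
        # Every pair inside a subregion becomes mutual neighbors.
        for a, b in combinations(members, 2):
            adjacency[a].add(b)
            adjacency[b].add(a)
    # Cross-region links are added in both directions.
    for a, b in cross_links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


if __name__ == "__main__":
    adjacency = build_adjacency(EXAMPLE_SUBREGIONS, EXAMPLE_CROSS_LINKS)
    for code in sorted(adjacency):
        print(code, sorted(adjacency[code]))
```

Keeping the table as plain data at the top of the script (as the post describes) is what makes the grouping calls auditable; the function itself is just a symmetric closure over it.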

What worked

All three targeted tail countries improved:

  • Pakistan (PK): 0.385 → 0.475 (+9.0 pts)
  • Singapore (SG): 0.074 → 0.167 (+9.3 pts)
  • Thailand (TH): 0.095 → 0.125 (+3.0 pts)

And key censorship cases held or improved:

  • Iran (IR): 0.795 → 0.846 (+5.1 pts)
  • Venezuela (VE): 0.818 → 0.852 (+3.4 pts)
  • Azerbaijan (AZ): +0.397 F1 (+39.7 pts, the biggest single gain)
  • Yemen (YE): +0.300 F1 (+30.0 pts)
  • China (CN): +0.127 F1 (+12.7 pts)

Stratified 5-fold metrics jumped: F1 0.673 → 0.712 (+3.9 pts, +5.8% relative) and AUC 0.868 → 0.892 (+2.4 pts, +2.8% relative). The contagion features account for 20.1% of total feature importance in v3.2.
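Because the per-country numbers are absolute points (F1 × 100) while headline gains are easy to read as relative percent, here is a small helper (hypothetical name `gain`) that reports both for the same before/after pair.

```python
def gain(before, after):
    """Return (absolute gain in points, relative gain in percent), 1 decimal each."""
    delta = after - before
    return round(delta * 100, 1), round(delta / before * 100, 1)


if __name__ == "__main__":
    print("F1 ", gain(0.673, 0.712))
    print("AUC", gain(0.868, 0.892))
```

Read this way, the stratified F1 jump is +3.9 points in the same units as the per-country bullets.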

What broke

Egypt (EG) regressed sharply: 0.548 → 0.413 (−13.5 pts). The adjacency map (UN subregions plus hand-added border links) paired EG with IL/PS/LY/SD, countries with wildly different regime types and censorship patterns. The "shared region = shared censorship" assumption broke down.

Western European democracies (ES, IT, DE) also regressed, and geographically isolated countries (JP, with only 2 mapped neighbors) picked up noise. Net: 28 countries improved, 24 regressed, and LOCO median F1 fell 0.818 → 0.800 (a slight loss).
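A sketch of how a per-country tally like the one above can be produced, assuming LOCO means leave-one-country-out (train on every other country, score the held-out one). The sample tuple shape, the pluggable `fit`/`predict` callables, and scoring countries with no positives as F1 0.0 are all assumptions of this sketch, not details of the actual train script.

```python
from statistics import median


def f1_score(y_true, y_pred):
    """Binary F1 = 2TP / (2TP + FP + FN); 0.0 when there are no positives at all."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def loco_f1(samples, fit, predict):
    """Leave-one-country-out F1 per country.

    samples: list of (country, features, label) tuples (hypothetical shape).
    fit(train_pairs) -> model; predict(model, features) -> 0/1.
    """
    scores = {}
    for held_out in sorted({c for c, _, _ in samples}):
        train = [(x, y) for c, x, y in samples if c != held_out]
        test = [(x, y) for c, x, y in samples if c == held_out]
        model = fit(train)
        preds = [predict(model, x) for x, _ in test]
        scores[held_out] = f1_score([y for _, y in test], preds)
    return scores


def compare_versions(old, new):
    """Count improved / regressed countries and report each model's median F1."""
    improved = sum(1 for c in old if new[c] > old[c])
    regressed = sum(1 for c in old if new[c] < old[c])
    return improved, regressed, median(old.values()), median(new.values())
```

Comparing medians rather than means keeps a few extreme swings (such as AZ or EG) from dominating the net verdict.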

The honest call

Mixed bag. Promoting v3.2 would help the targeted tail (PK, TH, SG) and key censorship cases (IR, VE) but hurt EG, which genuinely censors, along with several democracies. Net LOCO is a wash.

Two cheap iterations to try before promotion:

  1. Regime-similarity weighting: instead of weighting neighbors equally, weight each by its historical censorship correlation with the country being scored. This keeps democracies from being dragged down by noisy signal from authoritarian neighbors.
  2. Drop the feature for low-cohort countries: countries with <3 mapped neighbors (JP, GA, ME, AF) likely just get noise. Mask their contagion features to 0.
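A minimal sketch combining both iterations for the block-rate feature. It assumes "historical censorship correlation" can be approximated by the Pearson correlation of day-aligned historical anomaly-rate series, and it clips negative correlations to zero; both choices, and the names `pearson` and `weighted_neighbor_rate`, are hypothetical. The `min_neighbors=3` mask follows option 2.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length series; 0.0 if either is flat."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a flat series carries no similarity signal
    return cov / (vx * vy) ** 0.5


def weighted_neighbor_rate(country, neighbors, history, current_rate, min_neighbors=3):
    """Similarity-weighted neighbor anomaly rate, masked for low-cohort countries.

    history:      country -> day-aligned historical anomaly rates (training window
                  only, to avoid leaking future data into the weights)
    current_rate: neighbor -> its 7-day mean anomaly rate for this sample
    """
    if len(neighbors) < min_neighbors:
        return 0.0  # option 2: mask contagion for countries with <3 mapped neighbors
    # Option 1: weight neighbors by positive correlation with the scored country.
    weights = {nb: max(pearson(history[country], history[nb]), 0.0) for nb in neighbors}
    total = sum(weights.values())
    if total == 0:
        return 0.0  # no neighbor behaves like this country historically
    return sum(w * current_rate[nb] for nb, w in weights.items()) / total
```

Clipping at zero means an anti-correlated neighbor simply drops out instead of pushing the feature in the opposite direction; a signed weighting is another option worth testing.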

v3.2 remains in /opt/voidly-ai/models/experimental/. Production model is still v3.1. The contagion signal is real (20% importance, targeted wins land), but the cohort definition needs work. Iteration v3.3 will land soon.

Reproducibility

Build script: scripts/build-classifier-v3.2-contagion.py. Train script: scripts/train-classifier-v3.2.py. Both in the public repo. The UN-subregion adjacency map is at the top of the build script — anyone can audit our country-grouping calls.

Raw data