2026-05-22

More labels can't fix the tail — a 155-positive data experiment that failed the gate honestly

Classifier v3.3 has a split personality: its leave-one-country-out (LOCO) median F1 is 0.870 but its mean is only 0.711, dragged down by ~16 MENA/former-Soviet countries scoring LOCO F1 between 0.00 and 0.36. A prior finding (empirical-Bayes partial pooling) pinned the cause: those countries are not row-poor but POSITIVE-poor (1–33 confirmed-censorship days apiece). It prescribed targeted data labeling, not modeling. This finding executes that prescription and measures it honestly.

We mined 155 new high-confidence positive labels for the tail under a strict, documented evidence bar: anomaly_rate ≥ 0.60 plus one of (a) ≥2 independent sources, (b) OONI-critical with a spike guard, or (c) an incident-backed ±3-day window. IODA disruption never counted, and no label was flipped 1→0. Four countries (TN, YE, AM, GE) yielded ZERO positives because defensible corroborating evidence genuinely does not exist. The corpus grew from 1,116 to 1,271 positives. We retrained a v3.4 candidate with identical architecture and hyperparameters under the same LOCO protocol; the v3.3 baseline re-run reproduced to six decimals (LOCO mean 0.710936, median 0.869565).

v3.4 FAILED the promote gate decisively. LOCO mean went 0.7109→0.6907 (−2.0pp; the gate needed +1.0pp) and median went 0.870→0.800 (−7.0pp; the gate needed no regression). Only the third gate criterion (more tail countries usable, 4→6) passed.

The mechanism is a trap for anyone who assumes "more correct labels is always better". The new positives all sit in the high-anomaly-rate region, so under balanced class weights they slide the global decision boundary down. Because LOCO hides country identity, the model cannot tell "UZ at anomaly_rate 0.7" (real censorship) from "Spain at 0.7" (measurement noise) using the 16 shared features. Clean head-country days then cross the boundary: on IR true-negative days the mean predicted probability rose 0.260→0.365 (6→16 false positives), on IN it rose 0.210→0.317 (3→14 false positives), and the number of countries with perfect LOCO F1 fell 46→41.
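
The boundary shift described above can be reproduced on a toy problem. The sketch below is not the v3.3/v3.4 classifier: it swaps in a one-feature threshold rule that minimises class-balanced error (a stand-in for balanced class weights), and every number in it is synthetic. The names balanced_error, best_threshold, and false_positives are hypothetical helpers introduced only for this illustration.

```python
# Toy illustration (synthetic data, NOT the production model): adding correct
# high-anomaly positives pulls a class-balanced decision threshold downward.

def balanced_error(threshold, negatives, positives):
    """Average of false-positive rate and false-negative rate (classes weighted equally)."""
    fpr = sum(x >= threshold for x in negatives) / len(negatives)
    fnr = sum(x < threshold for x in positives) / len(positives)
    return 0.5 * (fpr + fnr)

def best_threshold(negatives, positives):
    """Pick the observed value that minimises balanced error; ties go to the higher value."""
    candidates = sorted(set(negatives) | set(positives))
    return min(candidates, key=lambda t: (balanced_error(t, negatives, positives), -t))

def false_positives(threshold, negatives):
    """Count negatives that the rule 'x >= threshold' would flag as positive."""
    return sum(x >= threshold for x in negatives)

# Synthetic anomaly_rate values. The two 0.7 negatives play the role of
# noisy-but-clean head-country days ("Spain at 0.7").
head_negatives = [0.1, 0.2, 0.3, 0.4, 0.7, 0.7]
head_positives = [0.9, 0.9, 0.95, 1.0]
# New, correct tail positives that all sit at 0.7 ("UZ at 0.7").
tail_positives = [0.7, 0.7, 0.7, 0.7]

before = best_threshold(head_negatives, head_positives)
after = best_threshold(head_negatives, head_positives + tail_positives)
fp_before = false_positives(before, head_negatives)
fp_after = false_positives(after, head_negatives)

print(f"threshold before={before}, after={after}")
print(f"head false positives before={fp_before}, after={fp_after}")
```

With country identity hidden, the only way this rule can catch the new tail positives is to lower the threshold, and the head-country negatives at the same anomaly rate are flagged along with them.
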
Eight tail countries improved (including JO +0.31, LY +0.22, AZ +0.12, OM +0.11), but the typical head country got slightly worse to pay for it. The deeper lesson is sharper than "label more": the tail's censorship signal is NOT feature-separable from head-country negatives in the 16-feature LOCO space, so a single global model with a single global threshold can only trade head accuracy for tail accuracy.

v3.4 is NOT promoted; v3.3 stays in production unchanged. The honestly labeled corpus is kept for the next attempt, which should be structurally different (per-country or per-region thresholds, or a tail-specialised model scored separately). This is the fourth Atlas experiment to hit the same wall from a new direction. Naming it precisely is the contribution: the tail is not separable under LOCO with these features. Reproduce with scripts/train-classifier-v3.4-tail.py.
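
For readers who want the promote decision in executable form, here is a minimal sketch of the three-part gate as this finding describes it. The names LocoSummary and promote_gate are hypothetical and are not taken from scripts/train-classifier-v3.4-tail.py. Reading "more tail countries usable" as a strictly larger count is also an assumption, since the definition of "usable" is not given here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LocoSummary:
    """LOCO results for one model version (hypothetical container)."""
    mean_f1: float
    median_f1: float
    usable_tail_countries: int

def promote_gate(baseline, candidate, min_mean_gain=0.010):
    """Evaluate the three gate criteria; promote only if all of them pass."""
    checks = {
        # Criterion 1: LOCO mean F1 must improve by at least +1.0pp.
        "mean_gain": candidate.mean_f1 - baseline.mean_f1 >= min_mean_gain,
        # Criterion 2: LOCO median F1 must not regress.
        "median_no_regression": candidate.median_f1 >= baseline.median_f1,
        # Criterion 3 (assumed strict): more tail countries usable than before.
        "more_tail_usable": candidate.usable_tail_countries > baseline.usable_tail_countries,
    }
    return checks, all(checks.values())

# Figures reported in this finding.
v33 = LocoSummary(mean_f1=0.7109, median_f1=0.870, usable_tail_countries=4)
v34 = LocoSummary(mean_f1=0.6907, median_f1=0.800, usable_tail_countries=6)

checks, promoted = promote_gate(v33, v34)
for name, passed in checks.items():
    print(f"{name}: {'PASS' if passed else 'FAIL'}")
print("promote v3.4:", promoted)
```

Requiring all three checks to pass is what keeps the tail gain (4→6 usable countries) from overriding the mean and median regressions.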

#methodology #ml #classifier #honest-negative #leave-one-country-out #data-labeling #class-imbalance #long-tail #accountability #atlas #api
