2026-05-21

Classifier v3.4: regime-cluster fine-tuning didn't fix the tail. Held.

Tried per-regime-cluster fine-tuning heads (MENA, post-Soviet, East Asia, SE Asia, LATAM, Sub-Sahara) stacked on top of v3.3 to recover the 16 countries that regressed under v3.3. The stacking head learns to mostly ignore the cluster heads (base coef 9.8 vs cluster coefs in [-0.83, +0.64]). LOCO median F1 drops 0.870 to 0.833, only 1 of 16 regression countries improves by ≥3pp (UZ +7pp), and 2 countries regress further (GE -9pp, SY -5pp). Both promotion gates fail. v3.3 stays in production. Documented as a real negative result.

#methodology#ml#classifier#regime-cluster#fine-tuning#negative-result#honest-no-promote

Raw data

Live: classifier metadata (still v3.3)
v3.4 training script
v3.3 finding (the predecessor that v3.4 tried to improve)
Model registry