v3.8 cross-model meta-ensemble — fused 10 base classifiers, +8.4pp stratified F1 (PASS) but LOCO flat (FAIL), 7th honest negative
Calibrated Bayesian fusion: LogisticRegression + Isotonic over 10 base classifier outputs (v3.3 GBM, DBSCAN anomaly, Bayes corroboration, per-measurement XGB, per-method http+tls, per-category NEWS/ANON/GRP/COMT, STL seasonal z). Stratified 5-fold F1 = 0.8279 vs the v3.3 baseline of 0.7435 (+8.44pp, PASS against the +2pp gate). LOCO median F1 = 0.8750 vs a gate of 0.88 (FAIL by 0.5pp). Coefficient ranking: p_classifier dominates (β=+1.56); p_measurement gets a negative weight (β=−0.98, a redundancy flip); p_method_tls (β=+0.81) and p_cat_GRP (β=+0.65) carry genuine new signal; STL adds essentially nothing (β=−0.01). v3.8 is not promoted because cross-country generalization didn't clear the gate: v3.3 stays the default at /v1/classifier/score/{cc}, and v3.8 is served additively at /v1/classifier/meta-ensemble/{cc} for transparency. This is the seventh honest negative result in the Atlas series. It documents that stacking 10 weakly diverse base models helps in-sample but does NOT meaningfully improve cross-country LOCO.
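The fusion and the two gates described above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the Atlas pipeline: it assumes scikit-learn, treats "LogisticRegression + Isotonic" as a logistic meta-learner wrapped in isotonic calibration, and assumes LOCO means leave-one-country-out over a per-row country label. The names `make_synthetic`, `build_meta`, `stratified_f1`, `loco_median_f1`, and `gate_report` are hypothetical, as is the synthetic data; only the gate thresholds (+2pp stratified gain, 0.88 LOCO median F1) come from the text.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import (
    LeaveOneGroupOut,
    StratifiedKFold,
    cross_val_predict,
)


def make_synthetic(n=600, n_base=10, n_countries=6, seed=0):
    """Synthetic stand-in: n_base noisy probability-like scores per row."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n)
    noise = rng.normal(0.0, 0.35, size=(n, n_base))
    # Each column is weakly correlated with the label, clipped to [0, 1].
    X = np.clip(0.5 + 0.25 * (2 * y[:, None] - 1) + noise, 0.0, 1.0)
    groups = rng.integers(0, n_countries, size=n)  # hypothetical country ids
    return X, y, groups


def build_meta():
    """Logistic meta-learner with isotonic calibration (assumed setup)."""
    return CalibratedClassifierCV(
        LogisticRegression(max_iter=1000), method="isotonic", cv=3
    )


def stratified_f1(X, y, seed=0):
    """F1 of out-of-fold predictions under stratified 5-fold CV."""
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    pred = cross_val_predict(build_meta(), X, y, cv=cv)
    return float(f1_score(y, pred))


def loco_median_f1(X, y, groups):
    """Median per-country F1 when each country is held out in turn."""
    scores = []
    for train, test in LeaveOneGroupOut().split(X, y, groups):
        model = build_meta().fit(X[train], y[train])
        scores.append(f1_score(y[test], model.predict(X[test])))
    return float(np.median(scores))


def gate_report(strat_f1, baseline_f1, loco_f1,
                strat_gain_gate=0.02, loco_gate=0.88):
    """Apply both gates; promotion is assumed to require both to pass."""
    strat_pass = (strat_f1 - baseline_f1) >= strat_gain_gate
    loco_pass = loco_f1 >= loco_gate
    return {
        "stratified_pass": strat_pass,
        "loco_pass": loco_pass,
        "promote": strat_pass and loco_pass,
    }


if __name__ == "__main__":
    X, y, groups = make_synthetic()
    print("synthetic stratified F1:", round(stratified_f1(X, y), 4))
    print("synthetic LOCO median F1:", round(loco_median_f1(X, y, groups), 4))
    # Gate check using the figures reported above.
    print(gate_report(0.8279, 0.7435, 0.8750))
```

Applied to the reported figures, `gate_report(0.8279, 0.7435, 0.8750)` passes the stratified gate and fails the LOCO gate, which matches the decision to keep v3.3 as the default. Holding out whole countries rather than random rows is what makes the second gate harder: a meta-learner can exploit country-specific patterns shared across stratified folds, but not patterns from a country it has never seen.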
Raw data
- Live: meta-ensemble info + coefficients + gates
- Live: per-country super-score (Iran)
- Live: per-country super-score (Russia)
- Predecessor: v3.7 stacking (4 models, +1.1pp)
- Default classifier: v3.3 GBM
- Base: DBSCAN anomaly v1
- Base: Bayes corroboration v1
- Base: per-measurement classifier v1
- Base: per-method classifiers (http, tls)
- Base: per-category classifiers (NEWS, ANON, GRP, COMT)
- Wolpert 1992 — Stacked Generalization
- Zadrozny & Elkan 2002 — Isotonic Calibration