2026-05-21

v3.8 cross-model meta-ensemble — fused 10 base classifiers, +8.4pp stratified F1 (PASS) but LOCO flat (FAIL), 7th honest negative

Calibrated Bayesian fusion: LogisticRegression + Isotonic over 10 base classifier outputs (v3.3 GBM, DBSCAN anomaly, Bayes corroboration, per-measurement XGB, per-method http+tls, per-category NEWS/ANON/GRP/COMT, STL seasonal z). Stratified 5-fold F1 = 0.8279 vs v3.3 baseline 0.7435 (+8.44pp, PASS against the +2pp gate). LOCO (leave-one-country-out) median F1 = 0.8750 vs gate 0.88 (FAIL by 0.5pp). Coefficient ranking: p_classifier dominates (β=+1.56); p_measurement gets a negative weight (β=−0.98, a redundancy flip); p_method_tls (β=+0.81) and p_cat_GRP (β=+0.65) carry genuine new signal; STL adds essentially nothing (β=−0.01). Not promoted because cross-country generalization didn't clear the gate: v3.3 stays the default at /v1/classifier/score/{cc}, and v3.8 is served additively at /v1/classifier/meta-ensemble/{cc} for transparency. Seventh honest negative result in the Atlas series: stacking 10 weakly diverse base models helps in-distribution (stratified CV) but does NOT meaningfully improve cross-country LOCO.
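The gate arithmetic above can be reproduced with a minimal sketch. It is an illustration only, not the Atlas evaluation code: the helper names (`f1_score`, `loco_median_f1`, `promotion_decision`), the `>=` comparison against each gate, the rule that both gates must pass for promotion, and the toy country codes are all assumptions. Only the four numbers passed in at the end (0.8279, 0.7435, 0.8750, and the 0.88 / +2pp gates) come from the entry.

```python
from statistics import median


def f1_score(y_true, y_pred):
    """Binary F1 for 0/1 labels; returns 0.0 when there are no positives at all."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def loco_median_f1(per_country):
    """Median of per-country F1, where each country was held out in turn.

    per_country maps a country code to (y_true, y_pred) for that held-out fold.
    """
    return median(f1_score(y, p) for y, p in per_country.values())


def promotion_decision(strat_f1, baseline_f1, loco_f1,
                       strat_gate_pp=2.0, loco_gate=0.88):
    """Assumed rule: promote only if BOTH the stratified and LOCO gates pass."""
    gain_pp = (strat_f1 - baseline_f1) * 100
    strat_pass = gain_pp >= strat_gate_pp
    loco_pass = loco_f1 >= loco_gate
    return {
        "gain_pp": round(gain_pp, 2),
        "strat_pass": strat_pass,
        "loco_pass": loco_pass,
        "loco_shortfall_pp": round(max(0.0, loco_gate - loco_f1) * 100, 2),
        "promote": strat_pass and loco_pass,
    }


# Toy held-out folds with hypothetical country codes, just to exercise the helper.
toy_folds = {
    "AA": ([1, 1, 0], [1, 1, 0]),
    "BB": ([1, 0], [0, 0]),
    "CC": ([1, 1, 0, 0], [1, 0, 1, 0]),
}
toy_median = loco_median_f1(toy_folds)

# Numbers reported in this entry for v3.8 vs the v3.3 baseline.
decision = promotion_decision(strat_f1=0.8279, baseline_f1=0.7435, loco_f1=0.8750)
print(decision)
```

Under these assumptions the stratified gate passes and the LOCO gate does not, so `promote` comes out false, matching the decision to keep v3.3 as the default.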

#ml #classifier #meta-ensemble #stacking #logistic-regression #isotonic-calibration #negative-result #honest #transparency #shipped-not-promoted

Raw data