2026-05-22

The multi-source Bayesian corroboration classifier rides circular features — an honest audit

Voidly's multi-source Bayesian corroboration classifier (corroboration_v1) was reported at ROC AUC 0.92. It fuses four sensor networks (OONI, IODA, CensoredPlanet and the Voidly probe network) into one naive-Bayes posterior, and it feeds the auto-incident-watchdog as the "does an independent source agree?" gate. A platform-wide audit this week caught several Voidly models inflating metrics via shuffled train/test splits that leak temporal autocorrelation; the corroboration model had never been individually audited. This finding is that audit.

The split is honest: scripts/train-bayesian-corroboration.py uses a forward-temporal split (train oldest 60d, test newest 30d, never shuffled). We reproduced its number: temporal test AUC 0.9157, which rounds to the reported 0.92. A leave-country-out CV also held at median AUC 0.866. So the leakage is NOT autocorrelated rows bleeding across folds.

The leakage is circular features. The label is_censorship is taken from the incidents table, and an incident in Voidly is minted FROM anomalous evidence: a join through incident_evidence shows 343 of 344 confirmed censorship/mixed incidents have a linked elevated/warning/critical evidence row on the exact same country-day as first_seen. The model's features ("did source X emit an anomalous evidence row on this country-day") are computed from those same rows, so the feature partially IS the label.

265 of 343 censorship incidents are sourced from CensoredPlanet alone, so censoredplanet_present (a single raw binary feature, no model) scores AUC 0.8997, almost matching the full 4-source model. The Bayesian fusion adds only +1.6pp AUC; a 2,000-sample bootstrap puts the 95% CI on that lift at [0.0pp, 3.3pp], all but touching zero. At threshold 0.5 the model's F1/precision/recall are all 0.0: the posterior never crosses 0.5 even on true positives, so it is a ranker of "did CensoredPlanet flag this day", not a classifier. AUC is the one metric the circular feature inflates.
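
The single-feature comparison and the bootstrap interval on the lift can be sketched in a few lines. This is a minimal illustration, not the audit's actual code: roc_auc and bootstrap_lift_ci are hypothetical helper names, the paired percentile bootstrap is an assumption about how such an interval might be computed, and the toy arrays are invented for the example.

```python
import random

def roc_auc(labels, scores):
    """Rank-based (Mann-Whitney) AUC; a tied positive/negative pair counts as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def bootstrap_lift_ci(labels, model_scores, baseline_scores,
                      n_boot=2000, alpha=0.05, seed=0):
    """Percentile CI on AUC(model) - AUC(baseline), resampling rows with replacement.

    Assumption: both scorers are evaluated on the same resampled rows (paired).
    """
    rng = random.Random(seed)
    n = len(labels)
    lifts = []
    while len(lifts) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample has only one class, AUC is undefined
        lift = (roc_auc(ys, [model_scores[i] for i in idx])
                - roc_auc(ys, [baseline_scores[i] for i in idx]))
        lifts.append(lift)
    lifts.sort()
    return lifts[int(alpha / 2 * n_boot)], lifts[int((1 - alpha / 2) * n_boot) - 1]

# Toy data, invented for illustration only.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
baseline = [1, 1, 0, 0, 0, 0, 0, 1]  # stands in for a raw binary presence flag
model = [0.40, 0.35, 0.20, 0.10, 0.05, 0.10, 0.02, 0.30]  # posterior-like scores

lift = roc_auc(labels, model) - roc_auc(labels, baseline)
lo, hi = bootstrap_lift_ci(labels, model, baseline, n_boot=500)
print(f"lift={lift:+.3f}  95% CI=[{lo:+.3f}, {hi:+.3f}]")

# Every toy score is below 0.5, so thresholding at 0.5 recovers no positives
# even though the ranking, and therefore the AUC, is high.
recall_at_half = sum(s >= 0.5 for y, s in zip(labels, model) if y == 1) / sum(labels)
print(f"recall@0.5={recall_at_half}")
```

If the lower end of the interval sits at or near zero, as reported for corroboration_v1, the fused model is not distinguishable from the single flag on that test set. The last lines show how a scorer can rank well while never crossing a 0.5 threshold.
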
On a leakage-free target (predict NEXT-day censorship from TODAY's source presence, forward-temporal split) AUC falls to 0.7348, and even that is generous because a CensoredPlanet block today often recurs tomorrow and mints another CP-sourced incident.

Verdict: honest negative. The headline 0.92 is real arithmetic but near-tautological and is not a measure of censorship-detection skill. There is no model change to promote: any "improvement" measured against a circular label would be just as fake as the 0.92.

The fix is disclosure plus a pipeline change. The metrics JSON now carries a full leakage_audit block and seven rewritten honest_caveats with promoted=false (served live at /v1/classifier/corroborate/info), the model registry gains a corroboration-v1-bayesian-reeval entry, and the auto-incident-watchdog docstring is corrected: its corroboration gate is a conservative near-veto, not the independent confirmation it was presented as.

Two correlated signals derived from the same evidence are not corroboration. Real corroboration would require source-held-out labels, a genuine time gap, or independent editorial ground truth.
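
One way to build the next-day target with a forward-temporal split is sketched below. It is an assumption-laden illustration rather than the pipeline's code: next_day_rows and forward_temporal_split are hypothetical names, the row layout and the "IR" country code are invented, and dropping rows whose label day crosses the cutoff is a design choice of this sketch, not something the audit states.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)

def next_day_rows(presence, censored_days):
    """Pair TODAY's features with a NEXT-day label.

    presence: {(country, day): feature dict}, one entry per observed country-day.
    censored_days: set of (country, day) pairs with a confirmed incident.
    The last observed day is dropped because its next-day label is unknown.
    """
    last_day = max(day for _, day in presence)
    rows = []
    for (country, day), feats in sorted(presence.items()):
        if day >= last_day:
            continue
        label = int((country, day + ONE_DAY) in censored_days)
        rows.append((country, day, feats, label))
    return rows

def forward_temporal_split(rows, cutoff):
    """Train strictly on the past, test on the future, never shuffled.

    Training rows whose label day (day + 1) reaches the cutoff are dropped so
    that no training label is read from the test period.
    """
    train = [r for r in rows if r[1] + ONE_DAY < cutoff]
    test = [r for r in rows if r[1] >= cutoff]
    return train, test

# Toy data, invented for illustration only.
d0 = date(2026, 1, 1)
presence = {
    ("IR", d0 + timedelta(days=i)): {"censoredplanet_present": i % 2}
    for i in range(6)
}
censored_days = {("IR", d0 + timedelta(days=1)), ("IR", d0 + timedelta(days=4))}

rows = next_day_rows(presence, censored_days)
train, test = forward_temporal_split(rows, cutoff=d0 + timedelta(days=3))
print(len(rows), len(train), len(test))
```

Shifting the label by a day removes the same-row overlap between feature and label, but as noted above it does not remove the recurrence effect: a block that persists into tomorrow still produces a label that today's flag predicts.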

#methodology #ml #corroboration #bayesian #honest-negative #data-leakage #circular-features #temporal-cv #accountability #atlas #api
