Multi-source Bayesian corroboration

One probability per country-day: given what OONI, IODA, CensoredPlanet, and Voidly probes observed, what is the chance this is real censorship? The model is a naive-Bayes fusion with empirical likelihoods. It answers the journalist's question: “is this just one source's false positive?”

Trained 2026-05-21 · 183 confirmed censorship incidents

AUC (30d test): 0.916
Brier score: 0.0253
ECE: 0.0211

Promoted
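For readers unfamiliar with the three headline metrics, here is a minimal sketch of how they are conventionally computed. The page does not say how its evaluation code works, so the function names, the 10 equal-width ECE bins, and the toy data are all assumptions for illustration.

```python
from typing import List, Sequence, Tuple


def auc(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier_score(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Mean squared error between the predicted probability and the 0/1 label."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def expected_calibration_error(
    probs: Sequence[float], labels: Sequence[int], n_bins: int = 10
) -> float:
    """Weighted mean gap between average prediction and observed rate per bin."""
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_p = sum(p for p, _ in members) / len(members)
        rate = sum(y for _, y in members) / len(members)
        ece += len(members) / len(probs) * abs(mean_p - rate)
    return ece


# Toy predictions, NOT the model's real output.
toy_probs = [0.05, 0.05, 0.37, 0.37]
toy_labels = [0, 0, 0, 1]
print(auc(toy_probs, toy_labels))
print(brier_score(toy_probs, toy_labels))
print(expected_calibration_error(toy_probs, toy_labels))
```

Lower is better for Brier score and ECE, higher is better for AUC.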

Per-source likelihoods

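How often each source “fires” on labeled censorship days versus background days. LR present is the likelihood ratio when the source signals: values above 1 push the posterior up, values below 1 push it down. Δ AUC if removed is the test AUC lost when the source is dropped from the fusion, so a positive value means the source helps and a negative value means the model scores better without it.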

Source	P(present | censorship)	P(present | not)	LR present	Δ AUC if removed
OONI	54.1%	30.7%	1.76	+1.1pp
IODA	14.8%	51.3%	0.29	-0.4pp
CensoredPlanet	99.2%	18.8%	5.29	+23.0pp
Voidly probes	0.8%	0.0%	27.34	-0.0pp
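The likelihood ratios in the table can be recomputed from its two rate columns. The sketch below uses the rounded percentages as displayed, so results can differ from the published LR in the last digit (CensoredPlanet, for instance). Voidly is left out because its background rate is displayed as 0.0% and cannot be divided by. The helper names are illustrative, not part of the service.

```python
# (P(present | censorship), P(present | not)) copied from the table above.
RATES = {
    "OONI": (0.541, 0.307),
    "IODA": (0.148, 0.513),
    "CensoredPlanet": (0.992, 0.188),
}


def lr_present(p_c1: float, p_c0: float) -> float:
    """Likelihood ratio applied when the source fires."""
    return p_c1 / p_c0


def lr_absent(p_c1: float, p_c0: float) -> float:
    """Likelihood ratio applied when the source stays silent."""
    return (1 - p_c1) / (1 - p_c0)


for name, (p1, p0) in RATES.items():
    print(f"{name}: LR present {lr_present(p1, p0):.2f}, "
          f"LR absent {lr_absent(p1, p0):.2f}")
```

A silent source is evidence too. Because IODA fires more often on background days, its silence has a ratio above 1 and nudges the posterior up, while a silent CensoredPlanet has a ratio far below 1 and pulls it down sharply.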

Top country-days, last 30 days (posterior > 0.2)

#	Country	Date	Posterior	Sources corroborating
1	United Arab Emirates (AE)	2026-05-21	36.8%	2/4
2	United Arab Emirates (AE)	2026-05-20	36.8%	2/4
3	United Arab Emirates (AE)	2026-05-14	36.8%	2/4
4	United Arab Emirates (AE)	2026-05-11	36.8%	2/4
5	United Arab Emirates (AE)	2026-05-10	36.8%	2/4
6	United Arab Emirates (AE)	2026-05-08	36.8%	2/4
7	United Arab Emirates (AE)	2026-05-07	36.8%	2/4
8	United Arab Emirates (AE)	2026-05-06	36.8%	2/4
9	United Arab Emirates (AE)	2026-05-05	36.8%	2/4
10	United Arab Emirates (AE)	2026-05-04	36.8%	2/4
11	United Arab Emirates (AE)	2026-04-30	36.8%	2/4
12	United Arab Emirates (AE)	2026-04-29	36.8%	2/4
13	United Arab Emirates (AE)	2026-04-28	36.8%	2/4
14	United Arab Emirates (AE)	2026-04-27	36.8%	2/4
15	United Arab Emirates (AE)	2026-04-22	36.8%	2/4
16	Azerbaijan (AZ)	2026-05-21	36.8%	2/4
17	Azerbaijan (AZ)	2026-05-20	36.8%	2/4
18	Azerbaijan (AZ)	2026-05-18	36.8%	2/4
19	Azerbaijan (AZ)	2026-05-14	36.8%	2/4
20	Azerbaijan (AZ)	2026-05-11	36.8%	2/4

Scanned 2,473 country-days, 191 above threshold.

Methodology

Each country-day is one observation. Per source s, we compute the presence indicator: did s emit any elevated/warning/critical-level signal on that day? We then estimate two likelihoods on the training window:

P(s present | C=1): how often s fires on labeled censorship days
P(s present | C=0): how often s fires on background days
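A minimal sketch of estimating these two likelihoods for one source from labeled country-days, with add-one (Laplace, α=1) smoothing. The input shape and function names are assumptions, and the smoothing formula (k + α) / (n + 2α) is the standard binary form rather than something the page spells out.

```python
from typing import Iterable, Tuple


def smoothed_rate(days_present: int, days_total: int, alpha: float = 1.0) -> float:
    """Add-alpha estimate of a firing rate; never exactly 0 or 1."""
    return (days_present + alpha) / (days_total + 2 * alpha)


def estimate_likelihoods(
    observations: Iterable[Tuple[bool, bool]], alpha: float = 1.0
) -> Tuple[float, float]:
    """Return (P(present | C=1), P(present | C=0)) for one source.

    Each observation is (source_present, is_censorship) for one country-day.
    """
    counts = {True: [0, 0], False: [0, 0]}  # label -> [days present, days total]
    for present, censorship in observations:
        counts[censorship][0] += int(present)
        counts[censorship][1] += 1
    return (
        smoothed_rate(counts[True][0], counts[True][1], alpha),
        smoothed_rate(counts[False][0], counts[False][1], alpha),
    )


# Toy data: 2 censorship days (source fired on 1), 3 background days (fired on 0).
toy = [(True, True), (False, True), (False, False), (False, False), (False, False)]
p_c1, p_c0 = estimate_likelihoods(toy)
print(p_c1, p_c0)
```

Without smoothing, the background rate in the toy data would be exactly 0 and the likelihood ratio would be undefined.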

We use Laplace smoothing (α=1) on both branches so no source produces a zero or infinite likelihood. The posterior is computed in log-odds space for numerical stability:

log_odds(C=1) = log(prior/(1-prior))
              + Σ_s log(LR(s = observed))

LR(s=present) = P(s present | C=1) / P(s present | C=0)
LR(s=absent)  = P(s absent  | C=1) / P(s absent  | C=0)

posterior = sigmoid(log_odds)
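The formulas above can be sketched in a few lines. The inputs are the rounded values shown on this page, so this is an approximation, not the production code. Three things are assumptions: the 3.5% prior is the empirical prior quoted in the caveats, Voidly's background rate is back-computed from its published LR because the table rounds it to 0.0%, and the firing pattern (OONI and CensoredPlanet present, IODA and Voidly absent) is chosen for illustration, since the page does not say which sources corroborate each listed country-day.

```python
import math
from typing import Dict, Tuple

PRIOR = 0.035  # empirical prior quoted in the caveats

# (P(present | C=1), P(present | C=0)) per source, from the rounded table values.
LIKELIHOODS: Dict[str, Tuple[float, float]] = {
    "OONI": (0.541, 0.307),
    "IODA": (0.148, 0.513),
    "CensoredPlanet": (0.992, 0.188),
    # Assumption: background rate back-computed from the published LR (27.34).
    "Voidly": (0.008, 0.008 / 27.34),
}


def posterior(
    prior: float,
    likelihoods: Dict[str, Tuple[float, float]],
    observed: Dict[str, bool],
) -> float:
    """Naive-Bayes posterior P(C=1 | observations), accumulated in log-odds space."""
    log_odds = math.log(prior / (1 - prior))
    for source, (p_c1, p_c0) in likelihoods.items():
        if observed[source]:
            log_odds += math.log(p_c1 / p_c0)              # LR(s = present)
        else:
            log_odds += math.log((1 - p_c1) / (1 - p_c0))  # LR(s = absent)
    return 1 / (1 + math.exp(-log_odds))                   # sigmoid


# Assumed firing pattern, for illustration only.
pattern = {"OONI": True, "IODA": False, "CensoredPlanet": True, "Voidly": False}
p = posterior(PRIOR, LIKELIHOODS, pattern)
print(f"posterior = {p:.3f}")
```

With these rounded inputs the result lands near 0.37, in line with the 2/4 rows on the leaderboard.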

Training window: 2026-02-20 to 2026-04-21. Held-out test: last 30 days (63 positives, 2,485 rows total).

Honest caveats

Naive-Bayes independence is violated. OONI and CensoredPlanet both probe DNS resolvers — when one fires, the other often does too. This inflates the joint likelihood (we'd need a Bayesian network or copula model to fix it). We chose interpretability over correctness here.
Posterior caps around 0.37 because the empirical prior is only 3.5%. At that prior the combined likelihood ratio across sources has to exceed roughly 28 before the posterior reaches 0.5, and no country-day in the last 30 days gets there: the leaderboard tops out at 36.8%. That is the math, not a bug. Treat posterior ≥ 0.2 as “multi-source corroborated” in this regime.
CensoredPlanet dominates AUC (+23pp). Of the other three sources, only OONI adds measurable signal (+1.1pp). Without CensoredPlanet the AUC drops to 0.69: useful, but not a near-perfect detector.
IODA actively subtracts (−0.4pp AUC). IODA fires more often on non-censorship days than on labeled-censorship days, because our labels exclude IODA-only disruptions (those are connectivity events, not confirmed censorship). Including IODA in the fusion is a deliberate honesty signal — we report what every source observed, even when it pushes against the label.
Voidly probes contribute near-zero (Δ AUC ≈ 0). Most Voidly probes run from open-net countries (US, UK, NL, DE, FR) for safety, so they rarely fire on the same country-days that produce confirmed-censorship labels from heavily censored countries. Coverage growth in censored regions would flip this.
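The independence caveat at the top of this list can be made concrete with a toy calculation. If two sources are perfectly correlated, the second one carries no new information, yet naive Bayes multiplies its likelihood ratio in again. The sketch below uses OONI's published LR present (1.76) and the 3.5% prior purely as illustrative numbers. The perfectly correlated copy is a hypothetical worst case, not a measurement of the real OONI and CensoredPlanet overlap.

```python
import math
from typing import Sequence


def posterior_from_lrs(prior: float, lrs: Sequence[float]) -> float:
    """Posterior after multiplying the prior odds by each likelihood ratio."""
    log_odds = math.log(prior / (1 - prior)) + sum(math.log(lr) for lr in lrs)
    return 1 / (1 + math.exp(-log_odds))


PRIOR = 0.035
LR = 1.76  # illustrative strength, taken from OONI's LR present

once = posterior_from_lrs(PRIOR, [LR])        # the evidence counted once
twice = posterior_from_lrs(PRIOR, [LR, LR])   # a redundant copy counted as new evidence

print(f"counted once:  {once:.3f}")
print(f"counted twice: {twice:.3f}")
```

The gap between the two numbers is the upper bound on how far double counting can inflate the posterior for a pair of sources of this strength. Real sources are only partly correlated, so the actual inflation is smaller.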

GET /v1/classifier/corroborate

Top-N leaderboard JSON

GET /v1/classifier/corroborate/info

Model metadata + likelihoods
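A minimal sketch of calling the two endpoints from Python using only the standard library. The page does not state the API host, so `BASE_URL` is a hypothetical placeholder to replace with the real one. The response schema is not documented here either, so the helper returns the parsed JSON as-is without assuming any field names.

```python
import json
from typing import Any
from urllib.parse import urljoin
from urllib.request import urlopen

BASE_URL = "https://api.example.org"  # hypothetical placeholder host

LEADERBOARD_PATH = "/v1/classifier/corroborate"   # Top-N leaderboard JSON
INFO_PATH = "/v1/classifier/corroborate/info"     # Model metadata + likelihoods


def endpoint_url(path: str, base_url: str = BASE_URL) -> str:
    """Join the host and an endpoint path into a full URL."""
    return urljoin(base_url, path)


def fetch_json(path: str, base_url: str = BASE_URL, timeout: float = 10.0) -> Any:
    """GET an endpoint and return the decoded JSON body."""
    with urlopen(endpoint_url(path, base_url), timeout=timeout) as response:
        return json.load(response)


print(endpoint_url(LEADERBOARD_PATH))
print(endpoint_url(INFO_PATH))
# leaderboard = fetch_json(LEADERBOARD_PATH)  # uncomment once BASE_URL is real
```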

Unsupervised anomaly

CenDTect-style DBSCAN — complementary lens

Generated: 2026-05-22T19:10:26.763633Z