voidly

Multi-source Bayesian corroboration

One probability per country-day: given what OONI, IODA, CensoredPlanet, and Voidly probes observed, what is the chance this is real censorship? The model is a naive-Bayes fusion with empirical likelihoods. It answers the journalist's question: “is this just one source's false positive?”

Trained 2026-05-21 · 183 confirmed censorship incidents

AUC (30d test): 0.916
Brier score: 0.0253
ECE: 0.0211
Promoted: NO
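
For readers unfamiliar with the two calibration metrics above, here is a minimal sketch of how a Brier score and an expected calibration error (ECE) are commonly computed. The page does not state the binning scheme used for ECE, so the ten equal-width bins are an assumption, and the toy inputs are made up purely for illustration.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Mean squared gap between predicted probability and the 0/1 label."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def expected_calibration_error(
    probs: Sequence[float], labels: Sequence[int], n_bins: int = 10
) -> float:
    """ECE with equal-width bins (assumed scheme, not confirmed by the page)."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))

    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_p = sum(p for p, _ in members) / len(members)
        frac_pos = sum(y for _, y in members) / len(members)
        # Each bin contributes its |observed rate - mean prediction| gap,
        # weighted by the share of rows that landed in it.
        ece += len(members) / total * abs(frac_pos - mean_p)
    return ece


# Toy inputs, hypothetical and unrelated to the real test set.
toy_probs = [0.1, 0.1, 0.4, 0.4]
toy_labels = [0, 0, 0, 1]
toy_brier = brier_score(toy_probs, toy_labels)
toy_ece = expected_calibration_error(toy_probs, toy_labels)
```

Lower is better for both: a Brier score near 0 means probabilities sit close to the outcomes, and a small ECE means predicted probabilities match observed frequencies within each bin.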

Per-source likelihoods

How often each source “fires” on labeled censorship days vs background days. LR present is the likelihood ratio when the source signals; values above 1 push the posterior up, below 1 push it down.

Source          P(present | censorship)  P(present | not)  LR present  Δ AUC if removed
OONI            54.1%                    30.7%             1.76        +1.1pp
IODA            14.8%                    51.3%             0.29        -0.4pp
CensoredPlanet  99.2%                    18.8%             5.29        +23.0pp
Voidly probes   0.8%                     0.0%              27.34       -0.0pp
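
The LR column can be roughly reproduced from the two probability columns. The sketch below uses the rounded percentages shown on this page, so small differences from the published ratios are expected. Voidly probes are left out because their background rate is displayed as 0.0%, which is too coarse to divide by. The helper name is illustrative and not part of any Voidly API.

```python
# (P(present | censorship), P(present | not)) as displayed, rounded to 0.1%.
DISPLAYED_RATES = {
    "OONI": (0.541, 0.307),
    "IODA": (0.148, 0.513),
    "CensoredPlanet": (0.992, 0.188),
}


def lr_present(p_given_censorship: float, p_given_background: float) -> float:
    """Likelihood ratio applied when the source fires."""
    return p_given_censorship / p_given_background


recomputed = {name: lr_present(p1, p0) for name, (p1, p0) in DISPLAYED_RATES.items()}

for name, lr in recomputed.items():
    direction = "raises" if lr > 1 else "lowers"
    print(f"{name}: LR present ~ {lr:.2f} ({direction} the posterior)")
```

IODA is the only source here whose ratio falls below 1, so an IODA signal on its own lowers the posterior.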

Top country-days, last 30 days (posterior > 0.2)

#   Country                    Date        Posterior  Sources corroborating
1   United Arab Emirates (AE)  2026-05-21  36.8%      2/4
2   United Arab Emirates (AE)  2026-05-20  36.8%      2/4
3   United Arab Emirates (AE)  2026-05-14  36.8%      2/4
4   United Arab Emirates (AE)  2026-05-11  36.8%      2/4
5   United Arab Emirates (AE)  2026-05-10  36.8%      2/4
6   United Arab Emirates (AE)  2026-05-08  36.8%      2/4
7   United Arab Emirates (AE)  2026-05-07  36.8%      2/4
8   United Arab Emirates (AE)  2026-05-06  36.8%      2/4
9   United Arab Emirates (AE)  2026-05-05  36.8%      2/4
10  United Arab Emirates (AE)  2026-05-04  36.8%      2/4
11  United Arab Emirates (AE)  2026-04-30  36.8%      2/4
12  United Arab Emirates (AE)  2026-04-29  36.8%      2/4
13  United Arab Emirates (AE)  2026-04-28  36.8%      2/4
14  United Arab Emirates (AE)  2026-04-27  36.8%      2/4
15  United Arab Emirates (AE)  2026-04-22  36.8%      2/4
16  Azerbaijan (AZ)            2026-05-21  36.8%      2/4
17  Azerbaijan (AZ)            2026-05-20  36.8%      2/4
18  Azerbaijan (AZ)            2026-05-18  36.8%      2/4
19  Azerbaijan (AZ)            2026-05-14  36.8%      2/4
20  Azerbaijan (AZ)            2026-05-11  36.8%      2/4

Scanned 2,473 country-days, 191 above threshold.

Methodology

Each country-day is one observation. Per source s, we compute the presence indicator: did s emit any elevated/warning/critical-level signal on that day? We then estimate two likelihoods on the training window:

  • P(s present | C=1): how often s fires on labeled censorship days
  • P(s present | C=0): how often s fires on background days

We use Laplace smoothing (α=1) on both branches so no source produces a zero or infinite likelihood. The posterior is computed in log-odds space for numerical stability:

log_odds(C=1) = log(prior/(1-prior))
              + Σ_s log(LR(s = observed))

LR(s=present) = P(s present | C=1) / P(s present | C=0)
LR(s=absent)  = P(s absent  | C=1) / P(s absent  | C=0)

posterior = sigmoid(log_odds)
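
The formulas above can be sketched in a few lines of Python. The numbers are the rounded values from the per-source table and the 3.5% prior quoted in the caveats, so the result is approximate. The observation pattern (OONI and CensoredPlanet firing, IODA and Voidly probes silent) is an assumption chosen for illustration: the leaderboard reports "2/4" without naming the two sources. The names `SOURCES`, `log_lr`, and `posterior` are hypothetical.

```python
import math

# name -> (P(present | C=1), P(present | C=0), published LR present).
# The published LR is used on the present side because Voidly's background
# rate is displayed as 0.0%, which cannot be divided by.
SOURCES = {
    "OONI": (0.541, 0.307, 1.76),
    "IODA": (0.148, 0.513, 0.29),
    "CensoredPlanet": (0.992, 0.188, 5.29),
    "Voidly probes": (0.008, 0.000, 27.34),
}


def log_lr(name: str, present: bool) -> float:
    """log LR(s = observed) for one source."""
    p1, p0, lr_present = SOURCES[name]
    lr = lr_present if present else (1 - p1) / (1 - p0)
    return math.log(lr)


def posterior(prior: float, observed: dict) -> float:
    """Prior log-odds plus the summed log-LRs, mapped back through a sigmoid."""
    log_odds = math.log(prior / (1 - prior))
    log_odds += sum(log_lr(name, fired) for name, fired in observed.items())
    return 1 / (1 + math.exp(-log_odds))


# Assumed pattern: two of four sources fire.
example = posterior(
    0.035,
    {"OONI": True, "IODA": False, "CensoredPlanet": True, "Voidly probes": False},
)
print(f"posterior ~ {example:.3f}")
```

Under these assumptions the result lands close to the 36.8% shown in the leaderboard. Note that a silent IODA pushes the posterior up here, because its absent-side ratio is above 1.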

Training window: 2026-02-20 to 2026-04-21. Held-out test: last 30 days (63 positives, 2,485 rows total).
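
As a sketch of how the smoothed likelihoods might be estimated on the training window: the text specifies Laplace smoothing with α=1, and the form below (α added to the count, 2α added to the total for a binary indicator) is the usual convention, assumed here rather than confirmed by the page. The counts in the example are hypothetical.

```python
def smoothed_rate(n_present: int, n_total: int, alpha: float = 1.0) -> float:
    """Laplace-smoothed P(present) for a binary indicator.

    Adds alpha pseudo-observations to each of the two outcomes
    (present, absent), so the estimate is never exactly 0 or 1.
    """
    return (n_present + alpha) / (n_total + 2 * alpha)


# Hypothetical counts: a source that never fired on 1,000 background days.
p_background = smoothed_rate(0, 1000)
# Hypothetical counts: the same source fired on 2 of 120 labeled days.
p_censorship = smoothed_rate(2, 120)

# Without smoothing this ratio would divide by zero.
lr_present = p_censorship / p_background
```

With very few firings the ratio becomes large but stays finite, which is why a rarely firing source can show a high LR while contributing little to AUC.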

Honest caveats

  • Naive-Bayes independence is violated. OONI and CensoredPlanet both probe DNS resolvers — when one fires, the other often does too. This inflates the joint likelihood (we'd need a Bayesian network or copula model to fix it). We chose interpretability over correctness here.
  • Posteriors top out around 0.37 because the empirical prior is only 3.5%. The strongest pattern in the leaderboard above (2 of 4 sources corroborating) lands at 36.8%, so no listed country-day crosses 0.5. That's the math, not a bug. Treat posterior ≥ 0.2 as “multi-source corroborated” in this regime.
  • CensoredPlanet dominates AUC (+23pp). The other three sources add modest signal. Without CensoredPlanet the AUC drops to 0.69 — useful, but not a near-perfect detector.
  • IODA actively subtracts (−0.4pp AUC). IODA fires more often on non-censorship days than on labeled-censorship days, because our labels exclude IODA-only disruptions (those are connectivity events, not confirmed censorship). Including IODA in the fusion is a deliberate honesty signal — we report what every source observed, even when it pushes against the label.
  • Voidly probes contribute near-zero (Δ AUC ≈ 0). Most Voidly probes run from open-net countries (US, UK, NL, DE, FR) for safety, so they rarely fire on the same country-days that produce confirmed-censorship labels from heavily censored countries. Coverage growth in censored regions would flip this.
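
The “Δ AUC if removed” figures quoted in these caveats come from a leave-one-source-out comparison. A minimal sketch of that kind of ablation follows, assuming the sign convention implied by the CensoredPlanet row (full AUC minus AUC without the source, so a positive value means the source helps). The two sources, their likelihood ratios, and the four rows are toy values invented for illustration.

```python
import math


def auc(scores, labels):
    """Rank-based AUC: P(positive outranks negative), with ties counted as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def score(row, lrs, skip=None):
    """Summed log-LRs, optionally leaving one source out.

    The prior is a constant offset and does not change the ranking,
    so it is omitted here.
    """
    total = 0.0
    for name, (lr_present, lr_absent) in lrs.items():
        if name != skip:
            total += math.log(lr_present if row[name] else lr_absent)
    return total


def delta_auc_if_removed(rows, labels, lrs, name):
    full = auc([score(r, lrs) for r in rows], labels)
    ablated = auc([score(r, lrs, skip=name) for r in rows], labels)
    return full - ablated


# Toy setup: source "A" is informative, source "B" is uninformative noise.
toy_lrs = {"A": (4.0, 0.25), "B": (0.5, 1.5)}
toy_rows = [
    {"A": True, "B": False},
    {"A": True, "B": True},
    {"A": False, "B": True},
    {"A": False, "B": False},
]
toy_labels = [1, 1, 0, 0]

delta_a = delta_auc_if_removed(toy_rows, toy_labels, toy_lrs, "A")
delta_b = delta_auc_if_removed(toy_rows, toy_labels, toy_lrs, "B")
```

In this toy case removing "A" collapses the ranking to chance while removing "B" changes nothing, mirroring in miniature the gap between CensoredPlanet and the other sources.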

GET /v1/classifier/corroborate: Top-N leaderboard JSON
GET /v1/classifier/corroborate/info: Model metadata + likelihoods
Unsupervised anomaly: CenDTect-style DBSCAN, a complementary lens

Generated: 2026-05-22T19:10:26.763633Z