Atlas Findings
Curated deep-dives on the major censorship events Voidly Atlas has measured. Each finding gets a permanent URL with a journalist-friendly framing, relevant incident IDs, and links to the raw upstream data.
- VE 2026-05-20
Venezuela: 63 confirmed censorship incidents in 90 days
Venezuela has the highest incident volume on the Voidly Atlas: 63 confirmed events in the last 90 days, more than any other country we track.
#shutdown #elections #latin-america #leading-indicator
- IR 2026-05-20
Iran 2026 Presidential Election: 52% peak shutdown risk
Voidly's forecast model flags a 52% peak shutdown risk for Iran in the 7-day window leading into the 2026 presidential election, citing election day as the primary driver.
#elections #middle-east #forecast #shutdown
- CN IR RU TM 2026-05-20
Circumvention tools are universally targeted
Our probe network detects a 100% block rate on getlantern.org globally and block rates of 23% or higher on Signal, Telegram, and WhatsApp, the same set of circumvention and messaging tools blocked in every restrictive regime.
#circumvention #global #media-freedom
- 2026-05-21
Classifier v3: removed the 85% leakage feature, got 0.86 LOCO F1
The v2 classifier hit 99.8% F1, but country_risk_tier (a hardcoded feature that leaked the label) carried 85% of that signal. v3 drops it. Honest leave-country-out (LOCO) F1: 0.86 (Iran AUC 0.95).
#methodology #ml #classifier #transparency #no-leakage
- 2026-05-20
How we fixed Sentinel's 15× miscalibration in one afternoon
The forecast was telling journalists "5% risk in Iran" when the actual incident rate was 65%. We refit isotonic regression on 810 live (predicted, observed) pairs from sentinel_outcomes. Brier dropped 0.59 → 0.22; Iran's forecast jumped from 0.15 to 0.74.
#methodology #ml #forecast #calibration #transparency
- 2026-05-20
How we audited our own shutdown-forecast model and published the embarrassing numbers
Voidly Sentinel publishes three accuracy splits: stratified (an inflated 0.98 AUC), time-based (0.50 AUC, no better than chance), and LOCO median (an honest 0.91). We cite the honest number, not the impressive one.
#methodology #ml #transparency #forecast
- 2026-05-21
Six new ML transparency surfaces shipped in one session
Every Sentinel forecast now ships with SHAP contributions + a conformal interval. The v3 classifier has public feature-importance and metadata endpoints. /sentinel/backtest renders the reliability diagram, /atlas/forecast lists every watched country, and /v1/sentinel/movers surfaces 7-day deltas.
#methodology #ml #transparency #shap #classifier #forecast
- 2026-05-21
Classifier v3.1: trained on 13.5× more data, evaluated on 18× more countries
v3 was the leakage fix. v3.1 is the data fix. By mining the live incidents table for per-country-day labels, the training set jumps from 314 samples (18 positive, 7 countries) to 4,237 samples (1,116 positive, 131 countries). LOCO median F1 is now an honest 0.82 across 127 countries.
#methodology #ml #classifier #training-data #honest-metrics
- 2026-05-21
Cross-country contagion features: wins on the tail, regresses on EG. Held back.
We added 3 neighbor-risk features to v3.1 and retrained as v3.2. Stratified F1 jumped 0.673 → 0.712 and the targeted weak countries (PK, TH, SG) improved 3-9 points. But EG regressed 13.5 points and Western Europe got worse too. Net LOCO neutral. Holding v3.2 back; iterating the adjacency map.
#methodology #ml #classifier #contagion #experiment #honest-failure
- 2026-05-21
Causal attribution for shutdowns: synthetic DiD applied to internet censorship
When a shutdown happens, we can now answer "what caused it?" with a defensible counterfactual. /v1/sentinel/attribute builds a synthetic control from weighted stable-democracy donors, measures the post-period gap, runs a permutation p-value, and surfaces nearby political events. Method: Arkhangelsky et al. (arXiv:1812.09970), adapted from Internet Society NetLoss (ACM JCSS 2024).
#methodology #attribution #causal-inference #sdid #novel
- 2026-05-21
Classifier v3.3: regime-similarity contagion. Better on aggregate, MENA trade-off.
v3.2 weighted neighbors by geography (UN subregion) and the results were mixed. v3.3 weights neighbors by historical anomaly_rate correlation and wins clearly on aggregate: stratified F1 0.673 → 0.729 (+5.6pp), LOCO median F1 0.818 → 0.870 (+5.2pp), EG recovered 0.548 → 0.726 (+18pp), and Western European democracies are back to F1 ~1.0. But 16 countries regress >5pp vs v3.1, mostly MENA and former Soviet states whose neighbor correlations fall below the overlap threshold and drop to 0. Promoted with honest caveats.
#methodology #ml #classifier #contagion #regime-similarity #honest-trade-off
- 2026-05-21
Classifier v3.4: regime-cluster fine-tuning didn't fix the tail. Held.
Tried per-regime-cluster fine-tuning heads (MENA, post-Soviet, East Asia, SE Asia, LATAM, Sub-Sahara) stacked on top of v3.3 to recover the 16 countries that regressed under v3.3. The stacking head learns to mostly ignore the cluster heads (base coef 9.8 vs cluster coefs in [-0.83, +0.64]). LOCO median F1 drops 0.870 to 0.833, only 1 of 16 regression countries improves by ≥3pp (UZ +7pp), and 2 countries regress further (GE -9pp, SY -5pp). Both promotion gates fail. v3.3 stays in production. Documented as a real negative result.
#methodology #ml #classifier #regime-cluster #fine-tuning #negative-result #honest-no-promote
- 2026-05-21
Forecast v2 contagion: huge aggregate wins, IR regresses 27pp. Held back.
Applied the classifier v3.3 regime-weighted-contagion playbook to the XGBoost forecast model. Stratified F1 +4.9pp, LOCO median F1 +17.8pp (a larger gain than the classifier saw), and 15 of 19 countries improve. But Iran, a flagship country, regresses 27.4pp F1 because its neighbors have no positive correlation with it. Honest no-promote.
#methodology #ml #forecast #contagion #iran #honest-no-promote
- 2026-05-21
Forecast hyperparameter grid search: defaults already near-optimal
Ran a 27-cell GridSearchCV over XGBoost (n_estimators × max_depth × learning_rate) plus a follow-up min_child_weight/gamma sweep. Holdout AUC improved by 0.007, but LOCO median AUC dropped by 0.003, and the best params lose in 7 of the 10 most-active countries. The current defaults are at the practical ceiling for this feature set. Future gains require feature engineering, not hyperparameter tuning.
#methodology #ml #forecast #hyperparameters #honest-no-improvement #distribution-shift
- 2026-05-21
Per-ASN forecasting: not viable today. The probe network needs 5× more ASN coverage first.
We prototyped granular per-ASN forecasting (one model per ISP/AS), following Saha et al. (WebSci 2025). Of the 168 ASN-tagged ASes in our evidence corpus, only 6 had ≥30 measurement days, and only 1 had enough class variance to train. The data isn't there yet. Filing this as a probe-network expansion priority instead.
#methodology #ml #forecast #per-asn #data-density #honest-not-yet
- 2026-05-21
Stealth blackout detector: 458 candidate days where BGP held but the data plane didn't
Aryapour 2025 (arXiv:2507.14183) showed Iran can run a "stealth blackout": keep BGP routes up while throttling DNS/HTTP/HTTPS, which is invisible to BGP-based IODA. We built a heuristic detector that fires when ping-slash24 critical alerts ≥ 5 AND BGP is relatively stable AND OONI blocking/interference corroborates. It found 458 candidate country-days (149 strong), all already in our incidents table but with empty mechanism fields. The detector lets us back-classify opaque "Internet disruption" incidents as stealth-blackout-flavored.
#methodology #detection #stealth-blackout #iran #ioda #aryapour
- 2026-05-21
Atlas Score v2: base-rate weighting promotes chronic blockers (CN, RU, KP).
v1 of the score rewarded change over level — Russia/China/North Korea scored as B- because nothing was actively changing. v2 weights 50% structural baseline (12-month censorship-weighted incidents + tier floor) and only 20% recent forecast. Result: CN +33pts, RU +24pts, KP +32pts. Iran moves to #1 at F grade. v2 is experimental at /v1/atlas/score-v2; v1 remains the default until grade bands are tuned.
#methodology #atlas-score #base-rate #china #russia #experimental
- 2026-05-21
Multi-horizon forecast shipped: 1-day, 7-day, 30-day separate models
Voidly's forecast is no longer single-horizon. We trained 3 separate XGBoost + isotonic models (1d, 7d, 30d), all clearing honest thresholds (AUC 0.91 / 0.88 / 0.84 LOCO). Each horizon has its own conformal interval + per-horizon top-5 SHAP features. The drivers differ by horizon: 1d is operational telemetry, 7d is political tension (GDELT), 30d is repeat-risk + seasonal. SoTA literature (TFT, Sun et al. spatio-temporal conformal) says multi-horizon beats single. We confirmed and shipped.
#methodology #ml #forecast #multi-horizon #shap #conformal
- 2026-05-20
Forecast retrain unblocked: dual-holdout gate (legacy + temporal)
The weekly forecast retrain has rejected every new model since May 3, 2026, because the frozen 2024-style holdout no longer reflects 2026 reality. We shipped a dual-holdout gate that requires a new model not to regress on recent data and not to regress catastrophically on the legacy holdout.
#methodology #ml #forecast #retrain #distribution-shift #holdout #gate
- 2026-05-21
CenDTect-style DBSCAN unsupervised anomaly: AUC 0.6506, promoted as second-opinion signal
Adapted the CenDTect approach (Aceto & Pescape 2025 — DBSCAN over OONI feature vectors) to Voidly's 80K-row evidence table. Per-country rolling 45-day window, DBSCAN(eps=75th-pct kNN, min_samples=3) on 12 standardized features. AUC vs v3.3 labeled incidents: 0.6506, just above the 0.65 promote floor. Promoted as a SECOND-OPINION signal — the supervised classifier still wins at 0.99, but DBSCAN surfaces shape-anomalous days the labels never saw. Live at /v1/anomaly/dbscan/{cc}.
#methodology #ml #anomaly #unsupervised #dbscan #cendtect #second-opinion #promoted
- 2026-05-21
Per-domain HDBSCAN drift surface: novel-blocking detection orthogonal to per-country DBSCAN
Shipped a second unsupervised anomaly axis: per-DOMAIN HDBSCAN drift over the last-28-day feature vector for every domain with >= 10 measurements. Weekly cron compares this week vs last week — new clusters = novel blocking patterns, centroid drift = existing patterns intensifying, per-domain L2 distance = how much a domain's blocking profile changed. Orthogonal to /v1/anomaly/dbscan/{cc} (per-country DBSCAN). First run: 27 domains, 2 new clusters, all top-10 drift domains corroborated by critical/warning evidence in the last 14 days. Live at /v1/anomaly/domain-drift/leaderboard and /v1/anomaly/domain-drift/{domain}.
#methodology #ml #anomaly #unsupervised #hdbscan #drift #per-domain #novel-blocking #second-opinion #promoted
- 2026-05-21
Forecast labels cleaned: IODA outages no longer count as confirmed censorship
The forecast target_7day label was treating IODA outage alerts as confirmed censorship — flooding April 2026 with 1,011 disruption labels across 167 countries (94% of all April incidents). We split the labeling so only confirmed-censorship incidents drive the forecast target. April positive rate dropped from 79% to 21% and the dual-gate now accepts new models. New model promoted to production.
#methodology #ml #forecast #labels #data-quality #ioda #fix
- 2026-05-21
Adaptive Conformal Inference: forecast calibration that updates itself
The forecast model now ships with Adaptive Conformal Inference (ACI), an online update from Gibbs and Candès (2021) that keeps 90% intervals close to nominal coverage under distribution shift. No retraining is required, just a daily cron over observed outcomes.
#ml #forecast #calibration #conformal #aci #online-learning #transparency
- 2026-05-21
Row-level measurement classifier: per-measurement censorship scoring (Niaki KDD23 inspired)
New POST /v1/measurement/classify scores a single OONI, CensoredPlanet, IODA, or Voidly measurement and returns a probability + SHAP top-5 explanation. Inspired by Niaki et al. KDD 2023. Honest framing: the model learns to reconstruct the labeling rule, so the high AUC is real but reflects label leakage from raw signals, not novel detection ability.
#ml #classifier #row-level #measurement #xgboost #shap #transparency #niaki-kdd23
- 2026-05-21
GraphSAGE over CAIDA AS-AS topology: LOOCV AUC 0.80 but n=6 is statistically thin
Built a 2-layer GraphSAGE GNN over the May 2026 CAIDA AS-relationship graph (7,060 nodes, 841K edges) to forecast per-ASN 7-day shutdown probability. Leave-one-out CV across the 6 tier-1 ASNs with enough density gives AUC = 0.80, above the 0.65 promote floor — but a permutation test on the 6 fold predictions yields p = 0.32, so we honestly cannot reject the null at any reasonable level. Shipped live at /v1/forecast/asn-gnn/{asn} with passed_promote_floor=false and honest_caveats inline. The actual bottleneck is data sparsity (only 6 ASNs have ≥30 days of evidence), not the GNN architecture.
#methodology #ml #forecast #per-asn #gnn #graphsage #caida #topology #small-n #honest-not-yet
- 2026-05-21
TabPFN-v2 lost to v3.3 GradientBoosting (stratified F1 0.719 vs 0.729, LOCO 0.419 vs 0.870) — kept v3.3
We tested TabPFN-v2 (Hollmann et al. 2023, arXiv:2207.01848) as a v3.5 classifier candidate on the same 4,237-sample / 1,116-positive / 131-country / 16-feature dataset that v3.3 GradientBoosting uses. Published TabPFN benchmarks suggested +5-9pp F1 on small (under 10K rows) tabular data. On our dataset the result was the opposite: stratified 5-fold F1 was 0.719 ± 0.031 (one point below the v3.3 baseline of 0.729), and median F1 under LOCO, sampled over the 30 largest countries, was 0.419 (less than half of the v3.3 median of 0.870). Both promotion gates (0.78 stratified, 0.85 LOCO) failed. v3.3 stays in production unchanged. Honest negative result.
#ml #classifier #negative-result #tabpfn #hollmann-23 #honest #kept-v3.3 #transformer #small-data
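The Adaptive Conformal Inference finding above describes an online update that needs no retraining. As a rough illustration, the core update rule from Gibbs and Candès (2021) can be sketched as below. This is not Voidly's code: the function names, the step size `gamma=0.01`, and the boolean outcome format are assumptions made for the example. The 10% target miss rate corresponds to the 90% intervals mentioned in the finding.

```python
from typing import Iterable, List


def aci_update(alpha_t: float, target_alpha: float, missed: bool,
               gamma: float = 0.01) -> float:
    """One ACI step: alpha_{t+1} = alpha_t + gamma * (target_alpha - err_t).

    err_t is 1 when the observed outcome fell outside the interval, else 0.
    A miss lowers alpha_t (wider future intervals); a hit raises it slightly.
    """
    err_t = 1.0 if missed else 0.0
    return alpha_t + gamma * (target_alpha - err_t)


def run_aci(misses: Iterable[bool], target_alpha: float = 0.10,
            gamma: float = 0.01) -> List[float]:
    """Replay a sequence of hit/miss outcomes and return the alpha_t path."""
    alpha_t = target_alpha  # start at the nominal miss rate
    path = []
    for missed in misses:
        alpha_t = aci_update(alpha_t, target_alpha, missed, gamma)
        path.append(alpha_t)
    return path


def quantile_level(alpha_t: float) -> float:
    """Coverage level to request from the conformal scores, clipped to [0, 1].

    alpha_t can drift outside [0, 1] after long runs of hits or misses, so the
    level is clipped before it is used as a quantile.
    """
    return min(max(1.0 - alpha_t, 0.0), 1.0)
```

In a daily job, each day's observed outcome would be compared against the previous interval, fed through `aci_update`, and the resulting `quantile_level` used to size the next interval. A larger `gamma` adapts faster but makes the interval width more volatile.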
New findings ship as significant censorship events get measured. Subscribe to the Atom feed for new findings and every confirmed incident.