Voidly Atlas Shipyard — 16 ML models, 14 pages, 3 negative results, 24 hours

01 · Overview

The 24-hour push

13

models promoted

3

models retired (honest negatives)

14

Atlas pages shipped

~30

new public endpoints

Voidly Atlas is an open intelligence layer for global internet censorship — 84,464 verified evidence records spanning 214 countries, drawn from OONI, IODA, CensoredPlanet, and a network of 40 probe nodes. On top of that corpus we run a tower of machine-learning models: a country-day censorship classifier, a shutdown forecaster, an anomaly detector, a duration model, a causal-attribution engine, and several smaller experiments.

The 2026-05-21 push was a coordinated promote cycle. Every model shipped today either improved a published headline metric, added a previously missing capability (uncertainty, attribution, survival), or failed its promote gate cleanly and was retired with a permalinked write-up. The three negatives are genuine: they did not promote, and we publish each loss with the same emphasis as the wins.

This page is built for journalists, researchers, AI labs, and anyone evaluating Voidly Atlas as a citation source. Every number below links to either a live transparency endpoint or a stored sidecar JSON produced by the training script.

02 · Models that lifted accuracy

Thirteen wins

Each entry below names the model, the headline lift, and the transparency surface where you can audit the claim live.

1 · classifier
Country-day classifier v3.3 — regime-similarity-weighted contagion
v3.2 had regressed Egypt (EG) from F1 1.0 to 0.55 by computing neighbor contagion uniformly across all countries. v3.3 reweights the contagion signal by regime similarity (Polity score, structural blocking rate), letting the classifier discount noise from dissimilar neighbors. Stratified F1 climbed from 0.674 to 0.729 (+5.5pp overall), with a 4-percentage-point median per-country lift and the largest gains in the tail (CG +35pp, OM +29pp, ZW +24pp).
LOCO median F1 0.870 and LOCO mean F1 0.711 on 4,237 samples across 131 countries, using 16 features (13 base + 3 contagion). Honest caveat: 16 MENA and former-Soviet countries (including OM, UZ, TN, LY, YE, JO, MA) regress by 5-29pp because their neighbor-pair overlap is sparse. Live at /v1/classifier/info and /feature-importance.
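A minimal sketch of how a regime-similarity-weighted contagion feature could be computed for one country-day. The Gaussian kernel, the 0.25 bandwidth, the rescaling of Polity scores, and the function names are all assumptions for illustration; the v3.3 training script may combine the two similarity signals differently.

```python
import math
from typing import Dict, List


def regime_similarity(a: Dict[str, float], b: Dict[str, float], bandwidth: float = 0.25) -> float:
    """Gaussian-kernel similarity between two regime profiles (hypothetical form).

    Polity scores run from -10 to +10, so their difference is divided by 20 to put
    it on the same 0..1 scale as the structural blocking rate.
    """
    d_polity = (a["polity"] - b["polity"]) / 20.0
    d_block = a["block_rate"] - b["block_rate"]
    return math.exp(-(d_polity ** 2 + d_block ** 2) / (2 * bandwidth ** 2))


def weighted_contagion(country: str, neighbors: List[str],
                       regimes: Dict[str, Dict[str, float]],
                       signals: Dict[str, float]) -> float:
    """Similarity-weighted mean of the neighbors' censorship signal for one day.

    A uniform contagion feature would be the plain mean of signals[n]; here
    neighbors with dissimilar regimes contribute almost nothing.
    """
    weights = {n: regime_similarity(regimes[country], regimes[n]) for n in neighbors}
    total = sum(weights.values())
    if total == 0.0:
        return 0.0  # no neighbors: no contagion signal
    return sum(weights[n] * signals[n] for n in neighbors) / total
```

For a country whose similar neighbor is blocked (signal 1.0) and whose dissimilar neighbor is clean (signal 0.0), a uniform mean gives 0.5, while this weighting returns a value close to 1.0.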
2 · forecast
Multi-horizon forecast — 1-day, 7-day, 30-day
Previously the only public horizon was 7 days. The new multi-horizon model trains three XGBoost classifiers side by side, each with its own isotonic calibrator, and ships per-horizon SHAP top-5 features, per-horizon 90% conformal intervals, and a monotonicity-consistency check (a longer horizon should never be more certain than a shorter one).
LOCO AUC 0.91 / 0.88 / 0.84 for the 1-day / 7-day / 30-day horizons on the 20 spotlight (watched) countries. Live at /atlas/multi-horizon and the per-country detail pages. API: GET /v1/forecast/{cc}/multi-horizon.
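The monotonicity-consistency check can be sketched as follows, assuming "certainty" is measured by the width of each horizon's 90% conformal interval. The function name and the dictionary layout are hypothetical, not the Atlas API's response schema.

```python
from typing import Dict, List, Tuple


def monotonicity_violations(intervals: Dict[int, Tuple[float, float]],
                            tol: float = 1e-9) -> List[Tuple[int, int]]:
    """Flag adjacent horizon pairs where the longer horizon is *more* certain.

    intervals maps horizon in days -> (lower, upper) of its 90% interval.
    Certainty is proxied by interval width: a longer horizon should never
    have a narrower interval than the next-shorter one.
    """
    horizons = sorted(intervals)
    width = {h: intervals[h][1] - intervals[h][0] for h in horizons}
    return [(short, long_) for short, long_ in zip(horizons, horizons[1:])
            if width[long_] + tol < width[short]]
```

Comparing adjacent horizons is enough: if every step from one horizon to the next is non-decreasing in width, the whole sequence is.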
3 · calibration
Adaptive Conformal Inference — online α update
The static conformal interval had been drifting (a Brier score of 0.59 triggered a manual recalibration on 2026-05-19). ACI follows Gibbs & Candès (NeurIPS 2021): after each observation, the script updates the working miscoverage level with α_{t+1} = α_t + γ · (α − err_t), where err_t = 1{y_t ∉ interval_t}, α = 0.10 is the target, and γ = 0.01. The change is small, but it ends the manual recalibration treadmill.
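Current state: α_t = 0.21 (it started at 0.10). Under the update rule above, α_t rises only while the interval covers more often than the 90% target, so the upward drift means the base intervals had been wider than necessary; empirical coverage is 91.3% over n=840 observations. Live in every /v1/forecast/{cc}/7day response as aci_alpha plus aci.* fields. The cron runs daily at 03:45 UTC.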
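The ACI update, written out as a short Python sketch. The function names are hypothetical, and the sketch omits what happens when α_t leaves (0, 1); Gibbs & Candès treat α_t ≤ 0 as the trivial interval that covers every outcome.

```python
from typing import Iterable


def aci_update(alpha_t: float, covered: bool, target_alpha: float = 0.10,
               gamma: float = 0.01) -> float:
    """One Adaptive Conformal Inference step (Gibbs & Candès, 2021).

    err_t is 1 when the realised outcome fell outside the interval built at
    level 1 - alpha_t. A miss lowers alpha_t (wider next interval); a hit
    raises it by gamma * target_alpha (slightly narrower next interval).
    """
    err_t = 0.0 if covered else 1.0
    return alpha_t + gamma * (target_alpha - err_t)


def run_aci(coverage_history: Iterable[bool], alpha_0: float = 0.10, **kwargs) -> float:
    """Replay a sequence of hit/miss outcomes and return the final alpha_t."""
    alpha_t = alpha_0
    for covered in coverage_history:
        alpha_t = aci_update(alpha_t, covered, **kwargs)
    return alpha_t
```

One property follows directly from the update: over a stretch where the miss rate equals the 10% target, the increments cancel and α_t stays put, so a sustained upward drift signals over-coverage.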
4 · anomaly
DBSCAN unsupervised anomaly — second-opinion signal
Inspired by the CenDTect approach (2022). Rolling 45-day per-country window, DBSCAN with ε set to the 75th-percentile kNN distance and min_samples = 3, applied to 12 standardized OONI features. AUC 0.6506 against labeled incidents, just above our 0.65 promote floor.
Promoted as a second-opinion signal, not a replacement. The supervised classifier still wins (AUC 0.99); DBSCAN surfaces shape-anomalous days the labels never saw. Live at /atlas/anomaly + /v1/anomaly/dbscan/{cc}.
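A dependency-free sketch of the ε heuristic and of DBSCAN itself. Two assumptions: the kNN distance uses the k-th neighbor with k = min_samples (the text does not state k), and toy 2-D points stand in for the 12 standardized OONI features. In practice a library implementation such as scikit-learn's DBSCAN would usually replace the hand-written loop.

```python
import math
from statistics import quantiles
from typing import List, Optional, Sequence

Point = Sequence[float]


def knn_eps(points: List[Point], k: int = 3, pct: int = 75) -> float:
    """ε = pct-th percentile of every point's distance to its k-th nearest neighbor."""
    kth = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        kth.append(dists[k - 1])
    # quantiles(n=100) returns the 99 percentile cut points; index pct - 1 is the pct-th.
    return quantiles(kth, n=100, method="inclusive")[pct - 1]


def dbscan(points: List[Point], eps: float, min_samples: int = 3) -> List[int]:
    """Plain DBSCAN. Returns a cluster id per point; -1 marks noise (anomalous)."""
    labels: List[Optional[int]] = [None] * len(points)

    def region(i: int) -> List[int]:
        # The eps-neighbourhood includes the point itself, as in scikit-learn.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border point
            if labels[j] is not None:
                continue  # already assigned; border points are not expanded
            labels[j] = cluster
            j_seeds = region(j)
            if len(j_seeds) >= min_samples:  # j is a core point: keep expanding
                queue.extend(j_seeds)
    return labels
```

Days labeled -1 (noise) are the candidate shape-anomalous days that the second-opinion signal would surface.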
5 · domain drift
HDBSCAN per-domain drift — novel-mechanism detection
Where DBSCAN looks at country-day shapes, HDBSCAN clusters per-domain weekly fingerprints to surface mechanism shifts — for example, a domain that flips from TCP-reset to DNS-poison blocking in one country. Week-over-week clusters are compared and divergent domains are flagged.
Live at /atlas/domain-drift. Not a headline-AUC model; this is a research surface for analysts.
6 · per-measurement
Per-measurement classifier — Niaki et al. KDD23-style
XGBoost row-level censorship classifier trained on the full 84K evidence corpus with a stratified 80/20 split. Inspired by Niaki et al. (KDD 2023, “ICLab and the long tail of censorship”). AUC 1.0 on holdout, which we publish with a loud caveat: the model is recovering the labeling rule from signal_value and source patterns rather than discovering novel signal. Top feature is asn_7d_rate (81% gain).
It still ships as a per-row interface to the same evidence the country-day v3.3 model uses, exposed at POST /v1/measurement/classify. Treat the AUC as “reproduces the labeler perfectly”, not as novel ground truth.
7 · graph nn
GraphSAGE over CAIDA AS topology — ASN-level forecast
Two-layer GraphSAGE, hidden dim 16, dropout 0.5, 60 epochs. Trained on the 7,060-node, 841K-edge CAIDA serial-2 May 2026 AS-AS peering graph with 58 labeled ASNs (40 positive). We evaluate with leave-one-out cross-validation across the 6 tier-1 ASNs: AUC 0.80, accuracy 5/6, permutation p = 0.32.
Honest caveat: the permutation test is underpowered at n=6, so the model ships with passed_promote_floor=false and includes an inline honest_caveat field in every response. Live at /v1/forecast/asn-gnn/{asn}, which accepts either bare digits (47541) or the prefixed form (AS47541).
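To make the GraphSAGE description concrete, here is one forward layer with the mean aggregator from the original GraphSAGE paper (Hamilton et al., 2017), in plain Python. The aggregator choice is an assumption (the text does not name one), and the weights are toy values; training, dropout, and the second layer are omitted.

```python
import math
from typing import Dict, List

Vector = List[float]


def sage_mean_layer(h: Dict[int, Vector], adj: Dict[int, List[int]],
                    W: List[Vector]) -> Dict[int, Vector]:
    """One GraphSAGE layer with the mean aggregator.

    For each AS v: z_v = ReLU(W @ concat(h_v, mean of h_u over neighbors u)),
    then z_v is L2-normalised. W has shape (out_dim, 2 * in_dim).
    """
    in_dim = len(next(iter(h.values())))
    out: Dict[int, Vector] = {}
    for v, h_v in h.items():
        nbrs = adj.get(v, [])
        if nbrs:
            agg = [sum(h[u][i] for u in nbrs) / len(nbrs) for i in range(in_dim)]
        else:
            agg = [0.0] * in_dim  # isolated AS: nothing to aggregate
        x = h_v + agg  # list concatenation = feature concatenation
        z = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W]
        norm = math.sqrt(sum(val * val for val in z))
        out[v] = [val / norm for val in z] if norm > 0 else z
    return out
```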
8 · fusion
Bayesian multi-source corroboration
Combines OONI, IODA, CensoredPlanet, and Voidly probe signals into a posterior P(censorship | sources) per country-day. Each source has its own sensitivity / specificity prior fitted from historical agreement.
AUC 0.916, expected calibration error 0.021 on the 30-day holdout. Live at /atlas/corroboration.
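A minimal sketch of the posterior computation, assuming the sources are conditionally independent given the true censorship state (a naive-Bayes combination). The text does not say how correlated sources are handled, and this sketch uses point estimates of sensitivity and specificity rather than full priors; all names are hypothetical.

```python
import math
from typing import Dict, Optional, Tuple


def corroboration_posterior(prior: float,
                            reports: Dict[str, Optional[bool]],
                            accuracy: Dict[str, Tuple[float, float]]) -> float:
    """Posterior P(censorship | source reports) under conditional independence.

    reports: source -> True (flags censorship), False (looks clean), None (no data).
    accuracy: source -> (sensitivity, specificity), each strictly between 0 and 1.
    Works in log-odds so that many sources do not underflow.
    """
    log_odds = math.log(prior / (1.0 - prior))
    for source, flagged in reports.items():
        if flagged is None:
            continue  # missing source: contributes no evidence
        sens, spec = accuracy[source]
        if flagged:
            log_odds += math.log(sens / (1.0 - spec))  # positive likelihood ratio
        else:
            log_odds += math.log((1.0 - sens) / spec)  # negative likelihood ratio
    return 1.0 / (1.0 + math.exp(-log_odds))
```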
9 · causal
Synthetic Difference-in-Differences attribution
Builds a synthetic counterfactual from stable-democracy donor countries, measures the post-period gap, runs a permutation p-value, and surfaces nearby political events from the GDELT + Wikipedia event feeds. Adapted from Arkhangelsky et al. (arXiv:1812.09970) with NetLoss-style scoping (ISOC, ACM IMC 2024).
Live at /sentinel/attribute?country=X&date=Y. Implementation in scripts/sdid_attribution.py.
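The permutation step can be illustrated with a deliberately simplified estimator: plain difference-in-differences against an equally weighted donor pool, followed by a placebo-in-space permutation test. This is not SDID; Arkhangelsky et al. additionally learn unit weights and time weights, which this sketch omits. Function names are hypothetical.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def did_gap(treated: Sequence[float], donors: List[Sequence[float]], t0: int) -> float:
    """Plain DiD gap: the treated unit's pre->post change minus the donor average's.

    t0 is the index of the first post-event day. Unlike SDID, donors are
    weighted equally and every pre-period day counts the same.
    """
    donor_avg = [mean(day) for day in zip(*donors)]

    def change(series: Sequence[float]) -> float:
        return mean(series[t0:]) - mean(series[:t0])

    return change(treated) - change(donor_avg)


def placebo_p_value(treated: Sequence[float], donors: List[Sequence[float]],
                    t0: int) -> Tuple[float, float]:
    """Re-run the estimate pretending each donor was treated; report the share
    of units (treated included) whose gap is at least as large in absolute
    value. Needs at least two donors."""
    gap = did_gap(treated, donors, t0)
    as_extreme = 1  # the treated unit itself
    for i, placebo in enumerate(donors):
        others = donors[:i] + donors[i + 1:]
        if abs(did_gap(placebo, others, t0)) >= abs(gap):
            as_extreme += 1
    return gap, as_extreme / (len(donors) + 1)
```

Because the treated unit is always counted, the smallest attainable p-value is 1 / (number of donors + 1); with three donors it cannot fall below 0.25.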
10 · survival
Random Survival Forest — shutdown duration
Closed a long-standing gap: until today the Atlas could tell you whether a shutdown was likely, but not how long it would last. The RSF is trained on 343 confirmed historical shutdowns and exposes a per-country expected-duration curve. Test c-index is 0.55, barely above random (0.5); the train c-index of 0.73 is optimistically biased.
Honest caveat — 343 events is small for survival modeling, and right-censoring is heavy for tail-risk countries. Live at /atlas/duration.
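For readers calibrating the 0.55 figure: Harrell's concordance index is the share of comparable pairs that the model orders correctly, so random ordering scores 0.5. A plain-Python sketch, assuming "risk" means a score where higher predicts a shorter shutdown; pairs whose shorter time is right-censored are not comparable and are skipped.

```python
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[bool],
                    risk: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored durations.

    times: observed durations; events: True if the shutdown's end was observed,
    False if right-censored; risk: higher = expected to end sooner.
    A pair (i, j) is comparable when times[i] < times[j] and i's end was observed.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied predictions count half
    return concordant / comparable if comparable else float("nan")
```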
11 · trajectory
Seq2seq 30-day trajectory forecast
Where multi-horizon ships three isolated heads, the trajectory model is a single sequence-to-sequence encoder-decoder that emits a smooth 30-day P(shutdown) curve with a 90% conformal band per day. Useful for journalists who want the shape of risk, not just three isolated horizons.
Median LOCO AUC 0.74 across spotlight countries. Live at /atlas/forecast-trajectory/{cc}.
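One common way to get a 90% band for every day of a trajectory is split conformal prediction applied separately to each horizon day, using absolute residuals on a held-out calibration set. The text does not specify the nonconformity score, so the absolute-residual choice, the clipping to [0, 1], and the function names are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def conformal_quantile(scores: Sequence[float], alpha: float = 0.10) -> float:
    """Finite-sample split-conformal quantile: the ceil((n+1)(1-alpha))-th smallest score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this coverage level
    return sorted(scores)[k - 1]


def per_day_bands(cal_pred: List[List[float]], cal_true: List[List[float]],
                  new_pred: List[float], alpha: float = 0.10) -> List[Tuple[float, float]]:
    """cal_pred[i][h] / cal_true[i][h]: calibration trajectory i at horizon day h.

    Each day gets its own quantile, so the band can widen further out.
    """
    bands = []
    for h, p in enumerate(new_pred):
        scores = [abs(pred[h] - true[h]) for pred, true in zip(cal_pred, cal_true)]
        q = conformal_quantile(scores, alpha)
        bands.append((max(0.0, p - q), min(1.0, p + q)))  # probabilities stay in [0, 1]
    return bands
```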
12 · het. treatment effects
Causal forest heterogeneous treatment effects
Athey & Wager-style causal forest (2019) estimating the per-country effect of an election on shutdown risk. Global average treatment effect: +9.6 percentage points. Vietnam pops at +32pp; most stable democracies sit near zero.
Live at /atlas/hte.
13 · cohorts
Dynamic Time Warping cohort clustering
DTW distance between per-country daily-signal curves with Ward hierarchical clustering surfaces shape-similar regimes with phase offsets — something Pearson correlation cannot do. Silhouette score 0.47 at K=3.
Live at /atlas/cohorts.
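A textbook dynamic-programming sketch of the DTW distance, showing why it captures phase-offset curves: two series with the same shape shifted by a day can have a DTW distance of zero even though their point-by-point difference is not zero. This is the generic O(n·m) recurrence without a warping-window constraint, not necessarily the variant the cohort script uses.

```python
import math
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Dynamic Time Warping distance with absolute-difference local cost."""
    n, m = len(a), len(b)
    # D[i][j] = cheapest alignment cost of a[:i] with b[:j]
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # step from: a advances alone, b advances alone, or both advance
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]
```

The pairwise DTW matrix would then feed hierarchical clustering. Ward linkage formally assumes Euclidean distances, so applying it to DTW distances is a heuristic rather than an exact variance-minimizing merge.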

Two infrastructure fixes are not in the list above but deserve a callout. First, the dual-holdout retrain gate (legacy + temporal) now blocks any promote that shows a temporal regression or a catastrophic legacy regression (an F1 drop of 0.10 or more). This unblocks the weekly retrain, which had been stalled since the May recalibration drift.

Second, the IODA disruption label fix. Raw IODA outages had been flowing into target_7day as confirmed censorship, pushing April's monthly positive rate to 79% (1,011 of 1,074 incidents were disruption noise). The filter in scripts/build-forecast-features.py (WHERE incident_type != 'disruption') brought the rate back to 21%, and a regression test now fails any retrain that lets the 12-month positive rate exceed 40%.
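A Python sketch of the two label safeguards: the disruption filter and the 40% positive-rate gate. The helper names are hypothetical; the filter mirrors SQL semantics, where a NULL incident_type makes `!=` evaluate to NULL and the row is therefore dropped as well.

```python
from typing import Dict, List, Optional

MAX_POSITIVE_RATE = 0.40  # ceiling on the 12-month positive rate, per the text


def drop_disruptions(incidents: List[Dict[str, Optional[str]]]) -> List[Dict[str, Optional[str]]]:
    """Python analogue of WHERE incident_type != 'disruption'.

    In SQL, NULL != 'disruption' evaluates to NULL, so rows with a missing
    incident_type are excluded too; the `is not None` test reproduces that.
    """
    return [row for row in incidents
            if row.get("incident_type") is not None
            and row["incident_type"] != "disruption"]


def positive_rate(target_7day: List[int]) -> float:
    """Share of country-day labels that are positive (0.0 for an empty window)."""
    return sum(target_7day) / len(target_7day) if target_7day else 0.0


def assert_label_sanity(target_7day: List[int], max_rate: float = MAX_POSITIVE_RATE) -> None:
    """Fail the retrain if the trailing 12-month positive rate is implausibly high."""
    rate = positive_rate(target_7day)
    if rate > max_rate:
        raise AssertionError(
            f"12-month positive rate {rate:.1%} exceeds {max_rate:.0%}; "
            "check for disruption rows leaking into target_7day")
```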

03 · Honest negatives

Three experiments that did not promote

We publish our failures with the same SHA-pinned permalinks as our wins. That is the entire point of a public model changelog.

N1 · classifier v3.4
Regime-cluster fine-tuning — NOT promoted
Hypothesis: train per-regime sub-models (Western liberal, MENA, post-Soviet, East Asian autocracy), then blend them by soft-assignment. Result: −3.6pp LOCO F1 versus the v3.3 baseline. The blend introduced more variance than the per-cluster fits removed.
Sidecar JSON archived at /opt/voidly-ai/ml-deploy/classifier_v3.4_REJECTED.json for reproducibility. Indexed in the model changelog with status “rejected”.
N2 · classifier v3.5
TabPFN-v2 prior-data fitted network — NOT promoted
Hypothesis: Hollmann et al. (2023) TabPFN-v2 is a strong zero-training tabular classifier; let it replace the GradientBoosting tower. Result: −1pp stratified F1. Acceptable on aggregate but indistinguishable from baseline on the spotlight countries we actually care about, and inference is ~30× slower than v3.3.
Build script kept in repo at scripts/build-classifier-v3.5-tabpfn.py with a banner comment marking it rejected. Useful as a future fallback if v3.3 ever fails.
N3 · ssl pretrain
Self-supervised masked-autoencoder tabular pretrain — NOT promoted
Hypothesis: pretrain a tabular MAE on the full unlabeled evidence corpus, then fine-tune on the labeled subset. The pretrained representation should help where labels are sparse. Result: −15.6pp F1 versus v3.3 — the pretrain pulled the model toward the marginal evidence distribution (heavily dominated by noisy disruption rows) and away from the rare-event positive class.
Honest negative. Worth re-running if we ever balance the pretrain corpus by label class.

If you cannot find the loss, you should not trust the win.

04 · Atlas frontend pages

Fourteen new surfaces

Every model above is exposed through at least one Atlas frontend page. All pages are server-rendered, hourly-ISR, and link back to their underlying transparency endpoint.

/atlas/multi-horizon

Multi-horizon forecast

1d / 7d / 30d hero cards + per-horizon SHAP. Top-20 spotlight countries with detail pages.

/atlas/score-v2

Atlas Score v2

A-F country grades with the 50% base-rate fix. CN +33pp, RU +24pp, KP +32pp versus v1.

/atlas/changelog

Model changelog

Full model history timeline, auto-stitched from every sidecar JSON written by training scripts.

/atlas/anomaly

Anomaly leaderboard

DBSCAN unsupervised second-opinion signal across 119 countries, sortable by score.

/atlas/domain-drift

Domain drift

HDBSCAN per-domain weekly drift — surfaces novel blocking mechanisms.

/atlas/case-studies/lebanon-bgp-outage-2026-04-24

Case studies — Lebanon BGP outage

First journalist-grade case study: forecast crossed threshold 6.5 days before the confirmed BGP outage.

/atlas/journalist-toolkit

Journalist toolkit

One-stop press kit: live numbers, embed widgets, RSS feeds, citation templates, model transparency.

/atlas/api-explorer

API explorer

Interactive playground over 30 hand-picked endpoints with URL-driven server-rendered try-it flow.

/atlas/correlation-matrix

Correlation matrix

Country × country incident co-movement, 50×50 SVG heatmap with top-20 ranked pairs.

/atlas/blocked-platforms-tracker

Blocked-platforms tracker

12 platforms × 50 countries live accessibility matrix, sortable by platform or by country.

/atlas/duration

Shutdown duration

Random Survival Forest expected-duration UI per country (test c-index 0.55 near-random / train 0.73, n=343).

/atlas/corroboration

Multi-source corroboration

Bayesian posterior across OONI, IODA, CensoredPlanet, Voidly probes (AUC 0.916, ECE 0.021).

/atlas/forecast-trajectory/ir

Forecast trajectory

Full 30-day P(shutdown) SVG line + 90% conformal band per day. h=1..30, spotlight countries.

/atlas/hte

Heterogeneous treatment effects

Causal forest ATE per country for elections (global +9.6pp; VN +32pp) — in-sample/associational, not OOS-validated.

/atlas/cohorts

Cohorts

DTW + Ward hierarchical clusters with sparkline centroids. K=3, silhouette 0.47.

/atlas/forecast-platform

Per-platform forecast

Per-platform shutdown forecast — 9 platforms × 30 countries heatmap.

05 · Cited methodology

The papers behind the push

We adapted four published methods into the Atlas tower today and evaluated a fifth (TabPFN-v2), which was rejected. Each is cited inline in the relevant training script so the chain of attribution is preserved.

Gibbs & Candès (NeurIPS 2021)
Adaptive Conformal Inference. Used in the daily ACI cron to keep the forecast's 90% interval calibrated as the underlying distribution drifts.
Arkhangelsky et al. (arXiv:1812.09970)
Synthetic Difference-in-Differences. Used as the counterfactual estimator in scripts/sdid_attribution.py with NetLoss-style scoping (ISOC, ACM IMC 2024).
Niaki et al. (KDD 2023)
Per-measurement censorship classifier. Adapted to the Voidly 84K evidence corpus, with the published honest caveat that the AUC reflects label-rule reconstruction.
Athey & Wager (2019)
Causal forest for heterogeneous treatment effects. Used to estimate per-country election-on-shutdown ATEs.
Hollmann et al. (2023)
TabPFN-v2 prior-data fitted network. Evaluated and rejected for the v3.5 attempt — kept in repo as a documented fallback.

06 · Where to go next

Verify, cite, integrate

Every claim on this page links to a live transparency endpoint. The fastest paths to verification and use:

/atlas: the live hub, daily refresh, links to every Atlas surface.
/atlas/findings: curated researcher-bylined deep-dives, one model per page.
/atlas/journalist-toolkit: press kit, citation templates, embed widgets, contact.
/atlas/changelog: the full model history, including every rejected build.
/api-docs: REST API reference, OpenAPI schema, MCP server install.
/press: embargo policy, logos, contact details for newsrooms.

07 · How to cite this shipping log

Cite

Voidly Research. (2026). Voidly Atlas Shipyard: 16 ML models, 14 pages, 3 negative results, 24 hours. Voidly Research shipping log. https://voidly.ai/atlas/shipyard-2026-05-21

License: CC BY 4.0. Reuse encouraged; please link back to this page so readers can audit our chain to the upstream sources.

Related Atlas surfaces

/atlas — live hub
/atlas/state-of-censorship-2026 — annual edition (the freeze-frame view)
/atlas/recent-changes — what shifted in the last 24h / 7d
/methodology — full pipeline + 3 honest accuracy splits
