Atlas Findings
Curated deep-dives on the major censorship events Voidly Atlas has measured. Each finding gets a permanent URL with a journalist-friendly framing, relevant incident IDs, and links to the raw upstream data.
109 findings
- RU 2026-06-26
Russia's foreign-press blockade: 264 blocked news domains — more of them Ukrainian than Russian
Voidly's confirmed-national blocklist for Russia (domains measured blocked across >=3 independent Russian networks, the OONI natblock floor) holds 608 domains, of which 264 are News Media — the largest…
#data #russia #news #press-freedom #foreign-press #exile-media
- 2026-06-26
Two networks, one domain, different answers: why we publish single-source blocks as a floor, not a fact
A data-trust audit of the observatory itself. Voidly aggregates three independent networks that measure different things: OONI (active web-connectivity, 5,552 domains / 93 countries, full accessible+blocked verdicts),…
#data #data-trust #methodology #transparency #cross-source #concordance
- IR RU 2026-06-25
Where VPNs die: Iran and Russia block 10–15× more circumvention tools than anyone else
Rank every country by how many distinct anti-circumvention tools (VPNs, proxies, Tor services) it blocks NATIONALLY (confirmed across >=3 independent networks) and the distribution is a cliff, not a gradient: Iran 75,…
#data #vpn #circumvention #iran #russia #anon
- IR SA RU TR ID 2026-06-25
Religious censorship targets the other: states block minority faiths, rival sects, and apostasy
Take the domains each country blocks NATIONALLY in the Citizen Lab religion category (confirmed across >=3 independent networks) and a sharp pattern appears: states don't block 'religion', they block the religious OTHER…
#data #religion #human-rights #iran #saudi-arabia #russia
- 2026-06-25
Why our category counts mirror the test list — a methodology honesty note
Voidly's category breakdowns ('Iran blocks 110 news sites', 'Russia leads on human-rights blocking') are real but answer a narrower question than they appear to: they describe what's blocked AMONG THE SITES TESTED, and…
#data #methodology #transparency #honest-negative #test-list #citizen-lab
- KZ RU VE TZ MM 2026-06-25
Two ways to censor a network: VPN-blockers vs news-blockers
Volume hides intent: two networks can block the same AMOUNT and be doing opposite things. Voidly's new per-ISP category view (/v1/measurement/isp-categories) shows the SHAPE of a network's blocking, not just its size.…
#data #isp #asn #selective-censorship #vpn #circumvention
- IR PK EG KH MM CN 2026-06-25
How, not just what: censors reserve TCP-reset for messaging, DNS poisoning for news
How a censor blocks is as telling as what it blocks. Voidly's new per-content-category technique view (/v1/measurement/category-techniques) crosses the blocking METHOD with the content type. DNS poisoning is the…
#data #technique #method #dns #tcp-reset #dpi
- IR RU CN 2026-06-25
Above the BGP layer: how Iran's 2026 shutdown was enforced, content type by content type — an independent corroboration of arXiv:2603.28753
Iran did not block everything the same way. On Voidly's OONI-grounded measurements, where a blocking method can be resolved, Iran blocks news media overwhelmingly by DNS poisoning (85.3% of 116 method-bearing news…
#data #iran #filternet #allowlist #technique #method
- 2026-06-24
Why a single threshold can't clean censorship false-positives — a measurement-noise audit
Voidly's national-censorship map flags a domain as nationally blocked when it is confirmed-blocked across >=3 independent networks (ASNs). It has a known weak spot, and this is the honest write-up of two attempts to…
#methodology #data-quality #false-positives #honest-negative #measurement-noise #multi-asn
- 2026-06-24
How fresh is the data behind a censorship verdict? A per-country quality audit
An observatory that asks you to trust its censorship verdicts owes an honest answer to a prior question: how good is the data behind any one verdict? It is not uniform. Voidly has evidence for 215 countries but recency…
#methodology #data-quality #transparency #freshness #coverage #accountability
- 2026-06-24
The most-blocked sites in the world are gambling and porn, not news
Ask anyone to name the most-censored websites and they say Facebook, Twitter, news. The data says otherwise. Ranked by how many countries NATIONALLY block them (confirmed across >=3 independent networks), the…
#data #gambling #methodology #national-blocking #restriction-map #transparency
- 2026-06-24
Why there's no 'newly-censored this week' feed (yet) — an honesty note
The most-requested thing journalists want from a censorship observatory is a global 'what got newly blocked this week' feed. We built one, tested it, and took it down before it ever shipped — because the only version…
#data #methodology #transparency #honest-negative #false-positives #accountability
- CN SA TR RU IR 2026-06-24
Every censor has a fingerprint: how China, Saudi Arabia, Turkey, Russia & Iran block differently
Censorship is usually reported as yes/no, but HOW a country blocks is as telling as what — and the method leaves a fingerprint. Aggregating every blocking measurement by technique, the major censors look strikingly…
#data #techniques #methodology #great-firewall #dpi #dns
- 2026-06-24
The confirmed-block map is a measurement map: why Africa shows zero
Aggregate Voidly's confirmed national blocks by world region and you get a striking table: Asia 1,545, Europe 891, the Americas 29, Africa and Oceania ZERO. The tempting read — 'Africa barely censors' — is wrong. The…
#data #methodology #transparency #measurement-gap #africa #regional
- CN IR RU CU BY SY 2026-06-24
AI is blocked two ways: companies geo-fence the chatbots, states censor the model hub
'AI is blocked in Iran, Russia, China' is one headline hiding two different stories with two different villains. Tag WHY each AI service is unreachable and a clean division of labor appears: the AI COMPANIES geo-fence…
#ai #chatgpt #claude #gemini #huggingface #sanctions
- RU CN IR TR MM PK 2026-06-24
Who builds the censorship: fingerprinting the DPI gear behind the blocks
Internet censorship runs on boxes — deep-packet-inspection (DPI) appliances on the wire — and those boxes have makers with fingerprints. Voidly maintains a 14-signature DPI library (a decade of Citizen Lab + academic…
#dpi #supply-chain #fortinet #bluecoat #netsweeper #surveillance-tech
- IR RU ID TH SA TR 2026-06-24
Two censorships: the world blocks gambling, the deepest censors block speech
A companion finding shows the world's most-blocked websites are gambling and porn, not news — true, but that's the BROAD picture (many countries each block a few casino/adult domains as routine licensing law). Ask…
#data #intent #speech #gambling #morality #national-blocking
- RU IR ID TR SA TH 2026-06-24
Which governments censor LGBTQ+ content — and they block the local advocates, not just Grindr
Ranking countries by how many LGBTQ+-tagged domains they block NATIONALLY (confirmed across >=3 independent networks), Voidly's /v1/measurement/category-leaders?category=LGBT finds eight countries doing it at the…
#data #lgbtq #human-rights #russia #iran #indonesia
- IR RU 2026-06-24
Censorship twins: Iran and Russia block the same human-rights groups and the same VPNs
Compare the national blocklists of every pair of countries Voidly measures and one pair stands far above the rest. Iran and Russia — the two heaviest speech-censors in the corpus — share 135 nationally-blocked domains…
#data #iran #russia #human-rights #vpn #circumvention
- 2026-06-23
Regional contagion does not improve the 7-day forecast — within-country dynamics already capture it
Censorship spreads regionally — a neighbour's shutdown often precedes your own — so regional "contagion" features are an obvious candidate to sharpen a 7-day forecast. We tested it honestly and it did not pan out.…
#methodology #ml #forecast #honest-negative #contagion #rolling-origin
- 2026-05-29
Shutdown-risk predicts WHICH country, not WHICH day — a forward-validated honest scoping (v9)
Voidly's 7-day shutdown predictor (/v1/shutdown-risk) reported two AUCs: cross-country full-panel ~0.88-0.90 and within-country median ~0.73. This week we shipped v9 (a logit-blend combiner that replaced v5's…
#ml #shutdown-risk #forecasting #temporal-cv #honest-scoping #leave-one-country-out
- 2026-05-29
The censorship classifier generalizes across countries but degrades across time — a forward-temporal audit
The production censorship classifier v3.3 (GradientBoosting, 16 country-day features) reports stratified 5-fold F1 0.729 / AUC 0.899 and leave-one-country-out (LOCO) median F1 0.870. We reproduced the stratified number…
#ml #classifier #temporal-cv #honest-scoping #generalization #leave-one-country-out
- 2026-05-22
Empirical-Bayes partial pooling didn't lift the classifier tail. Held.
Classifier v3.3 has a known split: LOCO median F1 0.870 but mean only 0.711, dragged down by ~16 MENA / former-Soviet countries (OM, UZ, TN, LY, YE, JO, MA and similar) that score LOCO F1 between 0.00 and 0.36. This…
#methodology #ml #classifier #partial-pooling #empirical-bayes #james-stein
- BZ OM MA UZ CN 2026-05-22
Closing the active-learning loop — and the honest near-zero F1 lift
Voidly Atlas had an active-learning queue that ranked the unlabeled country-days the v3.3 classifier was least sure about — but the loop was OPEN. Human-consensus labels dead-ended at active_learning_labels.json;…
#methodology #ml #active-learning #honest-negative #loco #retraining
- EG UZ PK RU NG IR 2026-05-22
Cutting the Sentinel alert false-alarm rate from 79% to 35%
The alert lead-time retrospective showed that 79.2% of Sentinel forecast-threshold alerts over 90 days were false alarms — four of five alerts that fired were not followed by a confirmed censorship incident. This…
#methodology #sentinel #alerts #false-alarm-rate #precision #honest-negative
- UZ CN JO EG PK BY 2026-05-22
A logistic stacker lifts the fused anomaly ensemble from 0.66 to 0.75 AUC
Voidly Atlas fuses four unsupervised anomaly detectors — a DBSCAN per-country shape detector, an STL seasonal-residual detector, a multi-country burst detector, and an HDBSCAN per-domain drift detector — into one…
#methodology #ml #anomaly-detection #ensemble #stacking #temporal-cv
- CN RU IR ID IN AE 2026-05-22
The AS-topology GNN does beat chance — once you give it a real label and a real test
Voidly Atlas runs a 2-layer GraphSAGE GNN over the CAIDA AS-AS peering graph (7,060 nodes, 841K edges) to score per-ASN shutdown risk. It shipped 2026-05-21 with passed_promote_floor=false and an honest caveat:…
#methodology #ml #gnn #graphsage #as-topology #honest-positive
- IR CN RU BY MM VN 2026-05-22
The 7-day shutdown forecast does not beat persistence — an honest re-evaluation
Voidly's production 7-day shutdown forecast (forecast v1, XGBoost + isotonic) reports ROC AUC 0.954. A weakness audit found that number is not real: it comes from a shuffled train_test_split in…
#methodology #ml #forecast #honest-negative #data-leakage #temporal-cv
- IR CN RU BY EG AE 2026-05-22
Shutdown duration is not predictable — an honest audit of the survival model
Voidly Atlas ships a shutdown-duration model — a Random Survival Forest (RSF) that answers "once a shutdown starts, how long will it last?" — served at POST /v1/forecast/duration. The model card reported a concordance…
#methodology #ml #survival-analysis #duration #honest-negative #data-leakage
- 2026-05-22
The multi-source Bayesian corroboration classifier rides circular features — an honest audit
Voidly's multi-source Bayesian corroboration classifier (corroboration_v1) was reported at ROC AUC 0.92. It fuses four sensor networks — OONI, IODA, CensoredPlanet and the Voidly probe network — into one naive-Bayes…
#methodology #ml #corroboration #bayesian #honest-negative #data-leakage
- 2026-05-22
More labels can't fix the tail — a 155-positive data experiment that failed the gate honestly
Classifier v3.3 has a split personality: leave-one-country-out (LOCO) median F1 0.870 but mean only 0.711, dragged down by ~16 MENA/former-Soviet countries scoring LOCO F1 between 0.00 and 0.36. A prior finding…
#methodology #ml #classifier #honest-negative #leave-one-country-out #data-labeling
- 2026-05-22
Shutdown onset is not predictable 7 days out — a definitive honest negative
Voidly's production 7-day forecast measures "is this country currently censored," not "will a new shutdown start" — its sliding-window target is 98.9% autocorrelated, so it scores AUC ~0.95 by reproducing a…
#methodology #ml #forecast #shutdown-onset #honest-negative #forward-temporal
- 2026-05-21
Concept-drift detector: catching distribution shift before it poisons the models
Voidly Atlas runs two production models — the v3.3 censorship classifier and the v1 7-day forecast — both retrained weekly behind a dual-holdout promotion gate. But there was no principled detector for the question that…
#atlas #concept-drift #distribution-shift #psi #ks-test #data-quality
- US CA GB FR IN SG 2026-05-21
Probe-node integrity detector: catching a compromised or misconfigured probe before it poisons incidents
The Voidly probe network is 40 nodes — 15 internal (Voidly-run) and 25 community (cp-* IDs, run by volunteers on hardware Voidly does not control). The community half is the trust weak point of the whole pipeline.…
#atlas #probe-network #integrity #consensus #trust #data-quality
- IR CN RU AE KZ 2026-05-21
Country censorship-behavior similarity graph: which countries block like Iran?
Censorship analysts, journalists and the forecast model all keep asking the same shape of question: "which countries behave like Iran?" — for transfer learning (borrow a label-rich neighbor's signal for a label-poor…
#atlas #similarity #embedding #cosine #umap #transfer-learning
- EG UZ PK RU NG IR 2026-05-21
Alert lead-time retrospective: did Sentinel actually warn early? (the accountability number)
Voidly Sentinel fires a forecast_threshold alert when a country's 7-day censorship-risk forecast crosses the alert threshold, and the pitch is "early warning." This finding is the honest audit of that pitch. For every…
#sentinel #forecast #early-warning #accountability #lead-time #false-alarm-rate
- 2026-05-21
Censorship-vs-natural-outage attribution meta-classifier
Many recorded shutdowns are ambiguous — a country goes dark for eight hours, was it state-ordered censorship or a fiber cut, BGP misconfiguration, DDoS, weather, planned maintenance? Confirmed-censorship attribution…
#ml #attribution #meta-classifier #censorship #outage #transparency
- 2026-05-21
Real-time per-day SHAP attribution for the 7-day forecast
The /v1/forecast/{cc}/7day endpoint already exposed an aggregate top_features list (3 SHAP contributions for the whole forecast). Journalists kept asking the same follow-up: "okay, but WHY is day 5 higher than day 0?"…
#ml #forecast #shap #per-day #attribution #transparency
- 2026-05-21
Multilingual semantic search: 50+ languages over the incident corpus
Voidly Atlas semantic search at /v1/atlas/search ran on all-MiniLM-L6-v2, an English-only sentence-transformer. Journalists querying in Arabic, Persian, Russian, Chinese, Hindi, etc. got noise. Added a parallel…
#ml #search #multilingual #embeddings #sentence-transformers #i18n
- 2026-05-21
Fused anomaly ensemble: composite score over DBSCAN + STL + burst + HDBSCAN
Voidly Atlas runs four independent unsupervised anomaly detectors (DBSCAN per-country shape, STL seasonal residual, multi-country burst coincidence, HDBSCAN per-domain drift) that each surface a different axis of…
#ml #anomaly #ensemble #fusion #unsupervised #dbscan
- 2026-05-21
Temporal Fusion Transformer (Lim et al. 2021) vs 3-XGBoost multi-horizon stack
Built a Temporal Fusion Transformer (TFT, Lim et al. arXiv:1912.09363) over the same 38-feature country-day panel that powers our XGBoost forecast. Single attention-based pass predicts a 30-day p10/p50/p90 trajectory,…
#ml #forecast #tft #transformer #attention #multi-horizon
- 2026-05-21
Atlas Digest: a daily "what changed in 24h" round-up for journalists + AI labs
Single-call pre-rendered daily summary at /v1/atlas/digest (JSON) and /v1/atlas/digest.html (email-friendly). Eight sections: new incidents in last 24h, forecast movers (>10pp), DBSCAN anomalies, multi-country blocked…
#transparency #digest #sidecar #newsletter #journalists #ai-agents
- 2026-05-21
Per-category censorship classifiers (NEWS / ANON / GRP / PORN / COMT / MMED / SRCH)
7 specialized XGBoost classifiers exposing WHAT a country is targeting, not just whether. NEWS, ANON (Tor/VPN/Lantern), GRP, PORN, COMT, MMED, SRCH all promoted via the alt-AUC path (stratified AUC 0.913-0.977, LOCO…
#methodology #ml #classifier #per-category #news #anon
- 2026-05-21
Per-blocking-method specialized classifiers (4 methods, 2 promoted)
v3.3 is a single classifier for "is this country-day censored?" But censorship has different mechanisms. We trained 4 specialized XGBoost classifiers (DNS / TCP / HTTP / TLS), each with the same 16-feature v3.3 input.…
#methodology #ml #classifier #per-method #dns #tcp
- 2026-05-21
Predictive shutdown contagion chain: pairwise XGBoost classifier (0.67 primary AUC)
New predictive piece in the Atlas stack: given country A had a confirmed censorship event today, score every other watched country by P(follow within N days). Pairwise XGBoost on (trigger, follower, horizon) tuples;…
#methodology #ml #forecast #contagion #predictive #temporal-point-process
- 2026-05-21
Zero-shot cross-country transfer: meta-features beat the v3.3 prior on the tail
v3.3 regresses on 16 MENA + former-Soviet countries with <5 labels (OM, UZ, TN, LY, YE, JO, MA…) because no parametric model can learn from so few samples. A meta-feature regressor (regime + geography + historical…
#ml #classifier #zero-shot #transfer-learning #meta-features #tail-countries
- 2026-05-21
Classifier v3.3 adversarial robustness: 88-93% under realistic evasion, weakest to noise dilution
Honest detection numbers measure performance on data that didn't try to evade us. We ran 200 known-positive incidents through 6 perturbation strategies a censorship regime could plausibly use (halved anomaly rate,…
#methodology #ml #classifier #adversarial #robustness #evasion
- 2026-05-21
Impact-aware active-learning ranker: uncertainty × volume × drift
The active-learning queue used to rank candidates by |p − 0.5| alone (uncertainty sampling, Lewis 1994). It treated a Mali day with 1 measurement the same as an Iran day with 762. The new default ranking is a 3-factor…
#methodology #ml #active-learning #eer #settles-2009 #heuristic
- 2026-05-21
Hourly within-day shutdown forecast (K=6/12/24)
New XGBoost forecast predicts P(shutdown in next K hours) at hourly granularity, where the daily 7-day forecast collapses 24h of variance into a single bucket. Trained on a 75K-row country-hour panel; per-country median…
#methodology #ml #forecast #hourly #xgboost #shipped
- 2026-05-21
Per-country F1-optimal thresholds — +4pp median F1 without retraining v3.3
v3.3 uses a single 0.5 decision threshold across 131 countries. Computing per-country F1-optimal thresholds via precision-recall sweep lifts median F1 +4pp (mean +5.4pp) for 73 countries with sufficient labels. 41…
#ml #classifier #v3.3 #threshold-calibration #per-country #shipped
- 2026-05-21
v3.8 cross-model meta-ensemble — fused 10 base classifiers, +8.4pp stratified F1 (PASS) but LOCO flat (FAIL), 7th honest negative
Calibrated Bayesian fusion: LogisticRegression + Isotonic over 10 base classifier outputs (v3.3 GBM, DBSCAN anomaly, Bayes corroboration, per-measurement XGB, per-method http+tls, per-category NEWS/ANON/GRP/COMT, STL…
#ml #classifier #meta-ensemble #stacking #logistic-regression #isotonic-calibration
- 2026-05-21
v3.7 stacking ensemble over 4 base learners — failed F1 gate (+1.1pp vs needed +2.0pp), shipped as transparency endpoint
Stacked v3.3 GradientBoosting (OOF), DBSCAN unsupervised anomaly v1, Bayesian corroboration v1, and per-measurement classifier v1 into a meta-learner. Logistic regression won head-to-head vs MLP (16,8): stratified…
#ml #classifier #stacking #ensemble #meta-learner #logistic-regression
- 2026-05-21
Quantile regression forecast (p5/p50/p95) — failed promote gate (zero-inflated target), shipped as negative result
Trained three LightGBM quantile regressors (alpha=0.05/0.50/0.95) on target_sum_7day for a journalist-grade p5..p95 band. LOCO coverage: p5=81% (nominal 5%), p50=91% (nominal 50%), p95=98% (nominal 95%). Only the upper…
#ml #forecast #quantile-regression #lightgbm #negative-result #zero-inflation
- 2026-05-21
Tabular MAE self-supervised pretrain lost to v3.3 (stratified F1 0.573 vs 0.729, LOCO 0.645 vs 0.870) — kept v3.3
We tested SSL pretraining (tabular masked-autoencoder, He/Bahri-style) on a 9,722 country-day unlabeled superset, then fine-tuned on the same 4,237 labeled rows v3.3 uses. Stratified 5-fold F1 0.573 (v3.3 baseline…
#ml #classifier #negative-result #ssl #tabular-mae #pytorch
- 2026-05-21
Shutdown duration forecast (Random Survival Forest) — c-index 0.728, n=343
Voidly Atlas now forecasts shutdown DURATION as well as probability. Random Survival Forest over 343 confirmed censorship incidents, test-set c-index 0.728, censoring rate 78%. Live at POST /v1/forecast/duration and…
#ml #forecast #survival-analysis #random-survival-forest #duration #shipped
- 2026-05-21
TabPFN-v2 lost to v3.3 GradientBoosting (stratified F1 0.719 vs 0.729, LOCO 0.419 vs 0.870) — kept v3.3
We tested TabPFN-v2 (Hollmann et al. 2023, arXiv:2207.01848) as a v3.5 classifier candidate on the same 4,237-sample / 1,116-positive / 131-country / 16-feature dataset that v3.3 GradientBoosting uses. Published TabPFN…
#ml #classifier #negative-result #tabpfn #hollmann-23 #honest
- 2026-05-21
GraphSAGE over CAIDA AS-AS topology: LOOCV AUC 0.80 but n=6 is statistically thin
Built a 2-layer GraphSAGE GNN over the May 2026 CAIDA AS-relationship graph (7,060 nodes, 841K edges) to forecast per-ASN 7-day shutdown probability. Leave-one-out CV across the 6 tier-1 ASNs with enough density gives…
#methodology #ml #forecast #per-asn #gnn #graphsage
- 2026-05-21
Row-level measurement classifier: per-measurement censorship scoring (Niaki KDD23 inspired)
New POST /v1/measurement/classify scores a single OONI, CensoredPlanet, IODA, or Voidly measurement and returns a probability + SHAP top-5 explanation. Inspired by Niaki et al. KDD 2023. Honest framing: the model learns…
#ml #classifier #row-level #measurement #xgboost #shap
- 2026-05-21
Adaptive Conformal Inference: forecast calibration that updates itself
The forecast model now ships with Adaptive Conformal Inference (ACI) — an online update from Gibbs and Candes 2021 that keeps 90 percent intervals close to nominal under distribution shift. No retraining required, just…
#ml #forecast #calibration #conformal #aci #online-learning
- 2026-05-21
Forecast labels cleaned: IODA outages no longer count as confirmed censorship
The forecast target_7day label was treating IODA outage alerts as confirmed censorship — flooding April 2026 with 1,011 disruption labels across 167 countries (94% of all April incidents). We split the labeling so only…
#methodology #ml #forecast #labels #data-quality #ioda
- 2026-05-21
Per-domain HDBSCAN drift surface: novel-blocking detection orthogonal to per-country DBSCAN
Shipped a second unsupervised anomaly axis: per-DOMAIN HDBSCAN drift over the last-28-day feature vector for every domain with >= 10 measurements. Weekly cron compares this week vs last week — new clusters = novel…
#methodology #ml #anomaly #unsupervised #hdbscan #drift
- 2026-05-21
CenDTect-style DBSCAN unsupervised anomaly: AUC 0.6506, promoted as second-opinion signal
Adapted the CenDTect approach (Aceto & Pescape 2025 — DBSCAN over OONI feature vectors) to Voidly's 80K-row evidence table. Per-country rolling 45-day window, DBSCAN(eps=75th-pct kNN, min_samples=3) on 12 standardized…
#methodology #ml #anomaly #unsupervised #dbscan #cendtect
- 2026-05-21
Multi-horizon forecast shipped: 1-day, 7-day, 30-day separate models
Voidly's forecast is no longer single-horizon. We trained 3 separate XGBoost + isotonic models (1d, 7d, 30d), all clearing honest thresholds (AUC 0.91 / 0.88 / 0.84 LOCO). Each horizon has its own conformal interval +…
#methodology #ml #forecast #multi-horizon #shap #conformal
- 2026-05-21
Atlas Score v2: base-rate weighting promotes chronic blockers (CN, RU, KP).
v1 of the score rewarded change over level — Russia/China/North Korea scored as B- because nothing was actively changing. v2 weights 50% structural baseline (12-month censorship-weighted incidents + tier floor) and only…
#methodology #atlas-score #base-rate #china #russia #experimental
- 2026-05-21
Stealth blackout detector: 458 candidate days where BGP held but the data plane didn't
Aryapour 2025 (arXiv:2507.14183) showed Iran can run a "stealth blackout": keeping BGP routes UP while throttling DNS/HTTP/HTTPS, which is invisible to BGP-based IODA. We built a heuristic detector: ping-slash24 critical alerts ≥…
#methodology #detection #stealth-blackout #iran #ioda #aryapour
- 2026-05-21
Per-ASN forecasting: not viable today. The probe network needs 5× more ASN coverage first.
We prototyped per-ASN granular forecasting (one model per ISP/AS) per Saha et al. WebSci 2025. Of 168 ASN-tagged ASs in our evidence corpus, only 6 had ≥30 measurement days — and only 1 had enough class variance to…
#methodology #ml #forecast #per-asn #data-density #honest-not-yet
- 2026-05-21
Forecast hyperparameter grid search: defaults already near-optimal
Ran a 27-cell GridSearchCV over XGBoost (n_estimators × max_depth × learning_rate) plus a follow-up min_child_weight/gamma sweep. Holdout AUC improved by 0.007, but LOCO median AUC DROPPED by 0.003. Best params lose in 7 of…
#methodology #ml #forecast #hyperparameters #honest-no-improvement #distribution-shift
- 2026-05-21
Forecast v2 contagion: huge aggregate wins, IR regresses 27pp. Held back.
Applied the classifier v3.3 regime-weighted-contagion playbook to the XGBoost forecast model. Stratified F1 +4.9pp, LOCO median F1 +17.8pp (! — bigger than classifier got), 15 of 19 countries improve. But Iran — a…
#methodology #ml #forecast #contagion #iran #honest-no-promote
- 2026-05-21
Classifier v3.4: regime-cluster fine-tuning didn't fix the tail. Held.
Tried per-regime-cluster fine-tuning heads (MENA, post-Soviet, East Asia, SE Asia, LATAM, Sub-Sahara) stacked on top of v3.3 to recover the 16 countries that regressed under v3.3. The stacking head learns to mostly…
#methodology #ml #classifier #regime-cluster #fine-tuning #negative-result
- 2026-05-21
Classifier v3.3: regime-similarity contagion. Better on aggregate, MENA trade-off.
v3.2 weighted neighbors by geography (UN subregion) and the results were mixed. v3.3 weights neighbors by historical anomaly_rate correlation — and wins clearly on aggregate. Stratified F1 0.673 → 0.729 (+8%), LOCO…
#methodology #ml #classifier #contagion #regime-similarity #honest-trade-off
- 2026-05-21
Causal attribution for shutdowns: synthetic DiD applied to internet censorship
When a shutdown happens, we can now answer "what caused it?" with a defensible counterfactual. /v1/sentinel/attribute builds a synthetic control from weighted stable-democracy donors, measures the post-period gap, runs…
#methodology #attribution #causal-inference #sdid #novel
- 2026-05-21
Cross-country contagion features: wins on the tail, regresses on EG. Held back.
We added 3 neighbor-risk features to v3.1 and retrained as v3.2. Stratified F1 jumped 0.673 → 0.712 and the targeted weak countries (PK, TH, SG) improved 3-9 points. But EG regressed 13.5 points and Western Europe got…
#methodology #ml #classifier #contagion #experiment #honest-failure
- 2026-05-21
Classifier v3.1: trained on 13.5× more data, evaluated on 18× more countries
v3 was the leakage fix. v3.1 is the data fix. By mining the live incidents table for per-country-day labels, the training set jumps from 314 / 18 positive / 7 countries to 4,237 / 1,116 positive / 131 countries. LOCO…
#methodology #ml #classifier #training-data #honest-metrics
- 2026-05-21
Six new ML transparency surfaces shipped in one session
Every Sentinel forecast now ships with SHAP contributions + a conformal interval. The v3 classifier has public feature-importance and metadata endpoints. /sentinel/backtest renders the reliability diagram,…
#methodology #ml #transparency #shap #classifier #forecast
- 2026-05-21
Classifier v3: removed the 85% leakage feature, got 0.86 LOCO F1
The v2 classifier hit 99.8% F1, but country_risk_tier (a hardcoded feature that leaked the label) carried 85% of that signal. v3 drops it. Honest leave-one-country-out (LOCO) F1: 0.86 (Iran AUC 0.95).
#methodology #ml #classifier #transparency #no-leakage
- 2026-05-21
Live 30-day production track record across all forecast models
New endpoint /v1/atlas/prediction-track-record joins daily-logged forecasts against observed incidents. forecast_7day (v1) ships with empirical precision 0.69 / recall 0.39 over the last 720 predictions; the other 11…
#transparency #ml-honesty #forecast #calibration
- 2026-05-21
Per-individual-domain 7-day shutdown forecast (top 42 domains × 50 countries)
Per-individual-domain shutdown forecast at /v1/forecast/domain/{domain}/{cc}. Shared XGBoost across all (domain, country) pairs with domain one-hot — LOCO median AUC 0.999 across 28 evaluable domains, temporal-holdout…
#forecast #per-domain #ml #transparency
- 2026-05-21
Forecast 7-day isotonic calibration refit (-56pp calibration drift)
The /v1/atlas/prediction-track-record endpoint surfaced a +56.45pp under-prediction drift on forecast_7day (mean predicted 4.9% vs empirical positive rate 61.4%). Two upstream bugs: a stale isotonic mapping fit on…
#forecast #calibration #ml-honesty #transparency
- 2026-05-21
Auto-incident watchdog: DBSCAN + Bayesian corroboration draft generator
New watchdog at scripts/auto-incident-watchdog.py cross-runs the DBSCAN unsupervised anomaly model with the Bayesian corroboration model every 6 hours. When DBSCAN flips a country-day AND the Bayesian posterior is at…
#watchdog #dbscan #corroboration #ml-honesty #editorial-queue
- 2026-05-21
Per-country forecast calibration drift monitor (auto-alert at +/-15pp)
New monitor at scripts/build-per-country-calibration-drift.py walks the top-50 most-active forecast countries daily at 05:00 UTC and computes mean predicted probability vs empirical positive rate (censorship/mixed only,…
#monitoring #calibration #ml-honesty #transparency #cenalerts
- 2026-05-21
ML serving reliability dashboard: 30+ endpoint health in one curl
With 30+ ML endpoints (/v1/forecast/*, /v1/classifier/*, /v1/anomaly/*, /v1/measurement/*, /v1/sentinel/*) in production, a journalist or partner asking "is the model live and working?" used to need 30+ curls. The new…
#monitoring #reliability #ml-honesty #transparency #observability
- 2026-05-21
STL seasonal anomaly detector (complement to DBSCAN, orthogonal signal)
New per-country anomaly detector using STL (Seasonal-Trend decomposition via Loess; Cleveland 1990) that learns each country's own weekly rhythm and flags days that break it. ORTHOGONAL to DBSCAN at…
#anomaly-detection #stl #seasonal #orthogonal-signal #ml-honesty #transparency
- 2026-05-21
Cross-protocol classifier: per-port blocking probability (8 protocol groups)
Eight small XGBoost classifiers, one per protocol group (HTTP-80, HTTP-headers, web_connectivity, TLS-WhatsApp, TLS-Signal, TLS-Telegram, TLS-FB-Messenger, Tor). Given a measurement to a (host, port) on a country/day,…
#classifier #protocol #ml #per-port #ml-honesty
- 2026-05-21
Competitive benchmark: Voidly vs Cloudflare Radar / Access Now / NetBlocks (20 landmark events)
Hand-curated lead/lag comparison across 20 landmark shutdown events (2019-2026): Mahsa Amini, Bangladesh quota protests, Brazil X/Twitter, Venezuela election, Kenya finance bill, Sudan coup, Myanmar coup, Uganda…
#transparency #ml-honesty #benchmark #journalism #sources
- 2026-05-21
Multi-country anomaly burst detector: candidate coordinated censorship campaigns
Single-country anomaly detectors (DBSCAN, STL) catch local events. This burst detector catches CROSS-COUNTRY synchronized events — K>=3 countries flipping anomalous on the same day. Pipeline: 90d lookback, mirror the…
#anomaly-detection #burst #coordination #cross-country #ml-honesty #transparency
- CN RU IR MM PK TR 2026-05-21
DPI fingerprint library: heuristic vendor attribution for 19,506 evidence rows across 10 device families
Voidly Atlas previously told you HOW a country blocks (DNS / TCP / TLS / blockpage) but not WHICH VENDOR. The new DPI Fingerprint Library v1 closes that gap with a curated, public, citation-backed library of 14…
#dpi#vendor-attribution#fingerprints#investigative#transparency#ml-honesty - PKUZNISDERIR2026-05-21
Per-day model uncertainty surfacer: which Voidly forecasts to question today
Voidly Atlas already surfaces model confidence universally (conformal intervals on the 7-day forecast, AUC/F1 on the classifier, online ACI alpha for drift). What was missing: a single SCORE for THIS day's prediction in…
#uncertainty#transparency#ml-honesty#journalist-facing#calibration#cross-model - IRCNRUPKAZEG2026-05-21
Block-evasion success-rate index: which circumvention tools actually reach their bootstrap endpoint, per country
Activists and journalists routinely ask which circumvention tool actually works in a given country. The honest historical answer has been "try a few and see what survives." The Voidly evidence table records every…
#evasion#tor#vpn#circumvention#evidence-based#transparency - IRCNRUTRPKEG2026-05-21
Auto-fact-check service for journalist claims: natural-language → verdict + evidence permalinks in milliseconds
Most censorship-research platforms force a journalist to manually query a country page, then a service page, then cross-reference probe rows by hand. The Voidly Atlas auto-fact-check service inverts that flow: a…
#fact-check#journalism#claim-verification#evidence-based#transparency#ml-honesty - RUIRPKBDEGTR2026-05-21
Government statement scraper: pairing ministry press releases with Voidly shutdown incidents
Voidly Atlas previously saw shutdowns only from the network side (OONI/IODA/Voidly probes). This v1 ships a curated government-statement scraper over 7 ministries (Russia Roskomnadzor, Iran MICT, Pakistan PTA,…
#scraper#government#press-releases#cross-source#investigative#transparency - 2026-05-21
Daily domain delta: which domains gained or lost blocking countries overnight
Most censorship dashboards answer "is X blocked in Y right now?" The Voidly Atlas daily domain delta answers the question journalists actually need at 8 AM: which domains gained or lost blocking countries between…
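One way to picture a domain delta is as a set difference between two snapshots. The snapshot layout (domain mapped to the set of countries blocking it), the name `domain_delta`, and the sample data are assumptions for illustration only.

```python
def domain_delta(before, after):
    """Compare two snapshots of {domain: set of blocking countries}.

    Returns {domain: {"gained": [...], "lost": [...]}} for every domain
    whose set of blocking countries changed between the snapshots.
    """
    delta = {}
    for domain in before.keys() | after.keys():
        old = before.get(domain, set())
        new = after.get(domain, set())
        gained = new - old   # countries that started blocking
        lost = old - new     # countries that stopped blocking
        if gained or lost:
            delta[domain] = {"gained": sorted(gained), "lost": sorted(lost)}
    return delta

# Invented snapshots using reserved example domains.
before = {"example.org": {"IR", "RU"}, "example.com": {"CN"}}
after = {"example.org": {"IR", "RU", "PK"}, "example.com": set(), "example.net": {"EG"}}
changes = domain_delta(before, after)
```

Domains with no change are omitted, so the result lists only what moved between the two snapshots.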
#daily#domain-tracking#delta#journalist#leading-indicator#transparency - EGJOMAUZTZIN2026-05-21
Live next-24h contagion watchlist: which countries are most likely to block in the next 24-48h
The contagion-chain model shipped earlier this week is descriptive ("given country A blocked, P(B follows in 7d) = X"). This new endpoint is PROACTIVE: it consumes the triggers that ACTUALLY fired in the last 48h and…
#contagion#forecast#live#proactive#watchlist#next-24h - IRRUBYTRPKEG2026-05-21
OONI test-type meta-classifier: per-country diagnostic ranking (which test type matters where)
Voidly Atlas runs eight OONI test types every six hours: web_connectivity, signal, whatsapp, telegram, facebook_messenger, tor, http_invalid_request_line, http_header_field_manipulation. Until today all eight were…
#ooni#test-types#diagnostic#per-country#feature-importance#transparency - IRPKCNRUTRVE2026-05-21
Circumvention recommendation engine: ranked try-first / fallback / avoid per country, evidence-based
Activists, journalists, and refugees routinely need to know which VPN / Tor / Lantern variant actually works in their country. The historical answer has been "ask in a forum, try a few, see what survives." This endpoint…
#circumvention#tor#vpn#recommendation#evidence-based#ml-honesty - IRNGMMBDKEVE2026-05-21
Pre-shutdown network signal detector: do BGP, TLS-reset and new-ASN precursors lead a blocking event?
A user-visible shutdown is the end of a process — by the time domains stop loading, the routing and filtering infrastructure has often already moved. This finding ships a per-country composite "pre-shutdown signal…
#bgp#tls#asn#pre-shutdown#early-warning#sentinel - MMNGSAAZEGRU2026-05-21
Pre-protest GDELT correlator: do news-mention spikes predict a shutdown 48h later?
Many internet shutdowns are reactive — they happen after a protest spike makes the news. We pulled daily GDELT counts of PROTEST + RIOT mentions for the 29 censorship-heavy countries already tracked by our event-ingest…
#gdelt#protest#early-warning#sentinel#correlation#transparency - 2026-05-21
Synthetic baseline benchmark: every Atlas ML model vs predict_yesterday, base-rate, and four other trivial baselines
It is easy to claim "F1 0.87" for a censorship-forecast model. It is much harder to answer "is the model adding value over predict_yesterday?" honestly. This finding ships the synthetic-baseline benchmark suite: every…
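To make the predict_yesterday comparison concrete, the sketch below scores that baseline with F1 on a toy label series. The helper names and the series are hypothetical; the real benchmark suite is not shown in this listing.

```python
def f1_score(y_true, y_pred):
    """Binary F1 for 0/1 labels; returns 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def predict_yesterday(labels):
    """Predict each day's label as the previous day's observed label.

    The first day has no predecessor, so predictions align with labels[1:].
    """
    return labels[:-1]

# Invented daily labels (1 = censorship event observed).
labels = [0, 1, 1, 1, 0, 0, 1, 1]
baseline_pred = predict_yesterday(labels)
baseline_f1 = f1_score(labels[1:], baseline_pred)
```

A model's F1 is only informative relative to this number: on a series with long runs of identical labels, the baseline alone can score well.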
#baseline#benchmark#accountability#ml-honesty#transparency#predict-yesterday - NGZWMXSIIQVE2026-05-21
Cohort migration tracker: which countries are shifting DTW censorship cohorts over time
Voidly Atlas already clusters the 50 highest-signal countries into DTW cohorts — C1 stable democracies, C2 bursty, C3 persistent authoritarian — by daily-signal SHAPE similarity under Dynamic Time Warping. But that…
#cohort#dtw#clustering#regime-change#time-series#transparency - EGPKVEINUZBR2026-05-21
Voidly Score: one continuous 0-100 daily number for "how censored is this country today"
Journalists asking Voidly Atlas "how censored is Iran today" kept hitting seven separate numbers — a supervised classifier probability, a 7-day forecast, a DBSCAN anomaly score, a source-agreement rate, an incident…
#voidly-score#composite#index#headline-metric#ml-honesty#transparency - UZIRDZAZ2026-05-21
Probe scheduling optimizer: a Thompson-sampling priority list for where to point the probe network next
The Voidly probe network has a hard capacity ceiling — roughly 40 nodes, 62 domains, a 5-minute cadence — and today every node runs the same fixed domain list everywhere. That spends probe attention uniformly, which is…
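A Thompson-sampling priority list can be sketched with a Beta posterior per target. The reward definition (a probe counts as a "hit" when it returned an informative result), the target names, and the counts below are all assumptions for illustration, not the optimizer's actual design.

```python
import random

def thompson_priority(stats, rng):
    """Rank targets by one posterior draw each.

    `stats` maps target -> (hits, misses). Each target gets a draw from
    Beta(hits + 1, misses + 1), i.e. a uniform prior updated with the
    observed counts; targets are returned sorted by draw, highest first.
    """
    draws = {
        target: rng.betavariate(hits + 1, misses + 1)
        for target, (hits, misses) in stats.items()
    }
    return sorted(draws, key=draws.get, reverse=True)

# Invented counts for hypothetical (country, domain) targets.
stats = {
    "target-a": (1000, 0),   # almost always informative
    "target-b": (5, 5),      # uncertain
    "target-c": (0, 1000),   # almost never informative
}
rng = random.Random(0)
ranking = thompson_priority(stats, rng)
```

Because each ranking comes from random draws, uncertain targets such as `target-b` occasionally rise to the top, which is how the method balances exploring new targets against exploiting known ones.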
#probe-network#thompson-sampling#bandit#scheduling#information-gain#recommendation - ERKPTMGBCA2026-05-21
Data-driven country risk tiers: re-deriving a hand-set model input from objective signals
Every country in Voidly Atlas carries a risk tier — an integer 1-5 in country_geography.risk_tier, tier 1 = highest censorship risk, tier 5 = lowest. That tier is not a measurement: an analyst hand-set it. And it is not…
#risk-tier#reclassification#proposal#clustering#kmeans#data-quality - SGBDIDINBHEG2026-05-21
Per-mobile-carrier blocking detector: splitting censorship severity by mobile carrier vs fixed broadband
Internet access is not one thing. In much of the Global South the cheap, dominant path online is a mobile carrier — a SIM and a cellular data plan — while fixed broadband (FTTH, cable, DSL) reaches a smaller, often…
#mobile-carrier#asn#isp#dpi#blocking-method#data-quality - SATHUZPKAZIQ2026-05-21
Lead/lag cross-correlation: which countries’ censorship precedes others’
Where /atlas/correlation-matrix shows simultaneous co-movement and /atlas/cohorts shows shape similarity, this surface adds the time-shifted axis: for every pair of the 50 most-censored countries, the daily…
#methodology#lead-lag#cross-correlation#contagion#fdr#atlas - 2026-05-20
Forecast retrain unblocked: dual-holdout gate (legacy + temporal)
The weekly forecast retrain has rejected every new model since May 3, 2026, because the frozen 2024-style holdout no longer reflects 2026 reality. We shipped a dual-holdout gate that requires the new model to not…
#methodology#ml#forecast#retrain#distribution-shift#holdout - 2026-05-20
How we audited our own shutdown-forecast model and published the embarrassing numbers
Voidly Sentinel publishes three accuracy splits: stratified (an inflated 0.98 AUC), time-based (a chance-level 0.50), and LOCO median (an honest 0.91). We cite the honest number, not the impressive one.
#methodology#ml#transparency#forecast - 2026-05-20
How we fixed Sentinel's 15× miscalibration in one afternoon
The forecast was telling journalists "5% risk in Iran" when the actual incident rate was 65%. We refit isotonic regression on 810 live (predicted, observed) pairs from sentinel_outcomes. Brier dropped 0.59 → 0.22;…
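The recalibration step can be illustrated with a small pure-Python isotonic fit (pool-adjacent-violators) and a Brier score. This is a sketch under stated assumptions: the function names and the six sample pairs are invented, and it does not read from sentinel_outcomes. In practice a library such as scikit-learn's `IsotonicRegression` would be a common choice.

```python
def brier(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

def fit_isotonic(preds, outcomes):
    """Fit a non-decreasing step function with pool-adjacent-violators.

    Returns a list of (upper_prediction, calibrated_value), sorted by
    upper_prediction.
    """
    grouped = {}  # pool identical predictions first: pred -> (sum_y, count)
    for p, y in zip(preds, outcomes):
        s, n = grouped.get(p, (0.0, 0))
        grouped[p] = (s + y, n + 1)

    blocks = []  # each block is [sum_y, count, upper_prediction]
    for p in sorted(grouped):
        s, n = grouped[p]
        blocks.append([s, n, p])
        # Merge backwards while the previous block's mean exceeds this one's.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, n, upper = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
            blocks[-1][2] = upper
    return [(upper, s / n) for s, n, upper in blocks]

def apply_isotonic(model, p):
    """Look up the calibrated value of the first block covering p."""
    for upper, value in model:
        if p <= upper:
            return value
    return model[-1][1]

# Invented (predicted, observed) pairs from an under-confident forecast.
preds = [0.05, 0.05, 0.10, 0.10, 0.20, 0.30]
outcomes = [1, 1, 0, 1, 1, 1]
model = fit_isotonic(preds, outcomes)
calibrated = [apply_isotonic(model, p) for p in preds]
before, after = brier(preds, outcomes), brier(calibrated, outcomes)
```

On this toy data the Brier score falls after refitting, but the drop is measured on the same pairs used for the fit, so it overstates what to expect on new data.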
#methodology#ml#forecast#calibration#transparency - CNIRRUTM2026-05-20
Anti-circumvention tools are universally targeted
Our probe network detects a 100% block rate on getlantern.org globally and block rates of 23% or more on Signal, Telegram, and WhatsApp, the same anti-circumvention toolkit blocked in every restrictive regime.
#circumvention#global#media-freedom - IR2026-05-20
Iran 2026 Presidential Election: 52% peak shutdown risk
Voidly's forecast model flags a 52% peak shutdown risk for Iran in the 7-day window leading into the 2026 presidential election, citing election day as the primary driver.
#elections#middle-east#forecast#shutdown - VE2026-05-20
Venezuela: 63 confirmed censorship incidents in 90 days
Venezuela leads every country we track on the Voidly Atlas in incident volume, with 63 confirmed events in the last 90 days.
#shutdown#elections#latin-america#leading-indicator
New findings ship as significant censorship events get measured. Subscribe to the Atom feed for new findings and every confirmed incident.