Will Twitter be blocked in Iran next week?
One XGBoost model per platform, scoring the 7-day shutdown probability for each (country, platform) pair. Built for journalists who need a per-target prediction rather than a single country-wide risk score.
Trained 2026-05-21 · 9 platforms
7-day shutdown probability grid
Each cell is the probability that the given platform sees at least one critical-level block in the given country in the next 7 days. Click a cell for per-call SHAP-style attribution.
Per-platform model details
| Platform | Status | LOCO AUC | Temporal AUC | Train rows | Positives | Top feature |
|---|---|---|---|---|---|---|
| Discord | skipped | — | — | 0 | 0 | insufficient data |
| Facebook | live | 1.000 | 0.970 | 1804 | 1444 | crit_rate_7d |
| Instagram | live | 1.000 | 0.982 | 1955 | 1501 | crit_rate_7d |
| LinkedIn | skipped | — | — | 0 | 0 | insufficient data |
| Reddit | live | 1.000 | 0.992 | 1375 | 1195 | crit_rate_7d |
| Signal | live | 1.000 | 0.990 | 1969 | 1490 | crit_rate_7d |
| Telegram | live | 1.000 | 0.991 | 1952 | 1436 | crit_rate_7d |
| TikTok | live | 1.000 | 0.983 | 1741 | 1567 | block_rate_7d |
| Twitter/X | live | 1.000 | 0.990 | 3292 | 1688 | crit_rate_7d |
| WhatsApp | live | 1.000 | 0.985 | 1735 | 1330 | crit_rate_7d |
| Wikipedia | skipped | — | — | 0 | 0 | insufficient data |
| YouTube | live | 1.000 | 0.986 | 1810 | 1338 | crit_rate_7d |
Honest caveats
- Per-platform models are trained only on countries that have observed at least one platform-specific block. Countries with zero history score the base rate.
- Leave-one-country-out (LOCO) CV uses the top 8 countries per platform by positive count; thin-data platforms have wider AUC variance.
- Targets are derived from elevated/critical/warning signal_levels on platform-matching domains, not from direct block confirmations.
- Per-platform forecasts inherit strong autocorrelation: once a country actively blocks a platform, the next-week probability stays high. The model is essentially predicting 'will the current block state persist?', and on active pairs the state does persist about 85% of the time.
- Countries with zero historical evidence on a platform score the feature-vector floor (≈base rate); treat these as low-information.
Methodology
We define 12 platforms by their canonical domains: Twitter/X (twitter.com, x.com), WhatsApp (whatsapp.com, web.whatsapp.com), Telegram (telegram.org, t.me), YouTube, Signal, Facebook, Instagram, TikTok, Reddit, Discord, Wikipedia, LinkedIn. For each platform we filter the evidence table to rows whose domain column matches one of those host patterns.
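The domain filter can be sketched as follows. This is a minimal illustration, not the production matcher: only the three domain lists spelled out above are filled in, the names `PLATFORM_DOMAINS`, `match_platform`, and `filter_evidence` are hypothetical, and we assume a "host pattern" means an exact match or a subdomain of a canonical domain.

```python
from typing import Optional

# Only the domain lists stated in the text; the other platforms would need
# their own entries, which the text does not spell out.
PLATFORM_DOMAINS = {
    "Twitter/X": ("twitter.com", "x.com"),
    "WhatsApp": ("whatsapp.com", "web.whatsapp.com"),
    "Telegram": ("telegram.org", "t.me"),
}


def match_platform(domain: str) -> Optional[str]:
    """Return the platform whose canonical domain equals `domain` or is a parent of it."""
    host = domain.strip().lower().rstrip(".")
    for platform, roots in PLATFORM_DOMAINS.items():
        for root in roots:
            # Assumption: match the exact host or any subdomain, never a bare suffix,
            # so "netflix.com" does not match "x.com".
            if host == root or host.endswith("." + root):
                return platform
    return None


def filter_evidence(rows, platform):
    """Keep evidence rows (dicts with a 'domain' key) that belong to `platform`."""
    return [r for r in rows if match_platform(r["domain"]) == platform]
```

Matching on a dot-prefixed suffix rather than a plain substring keeps short domains such as x.com and t.me from capturing unrelated hosts.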
Features (17 per row) are computed at the (country, platform, date) level: trailing 7/14/30-day anomaly rate, trailing 7/14/30-day critical rate, observation count, country-wide trailing anomaly rate, plus country geography (continent, risk tier) and calendar features (day-of-week, weekend).
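A rough sketch of the trailing-window features for one (country, platform) pair is shown below. Several details are assumptions rather than the pipeline's actual definitions: observations arrive as `(date, signal_level)` tuples, an "anomaly" is any level other than a hypothetical `"normal"`, windows include the as-of date, and an empty window yields a rate of 0.0. Apart from `crit_rate_7d`, which appears in the table above, the feature names are illustrative.

```python
from datetime import date, timedelta


def trailing_features(observations, as_of, windows=(7, 14, 30)):
    """Compute trailing anomaly/critical rates plus calendar features for one pair.

    observations: iterable of (date, signal_level) tuples.
    """
    feats = {}
    for w in windows:
        start = as_of - timedelta(days=w - 1)  # window covers w days ending at as_of
        levels = [lvl for d, lvl in observations if start <= d <= as_of]
        n = len(levels)
        n_anomaly = sum(lvl != "normal" for lvl in levels)  # assumed anomaly definition
        n_critical = sum(lvl == "critical" for lvl in levels)
        feats[f"anomaly_rate_{w}d"] = n_anomaly / n if n else 0.0
        feats[f"crit_rate_{w}d"] = n_critical / n if n else 0.0
        feats[f"obs_count_{w}d"] = n
    feats["day_of_week"] = as_of.weekday()  # Monday = 0
    feats["is_weekend"] = int(as_of.weekday() >= 5)
    return feats
```

The country-wide anomaly rate and the geography features (continent, risk tier) would be joined on separately and are left out of this sketch.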
The target is binary: did this (country, platform) experience at least one signal_level='critical' observation in the next 7 days? We restrict scoring to country-platform pairs with at least 5 observations in the trailing 30 days to avoid trivially predicting zero for never-observed pairs.
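The label and the eligibility filter can be illustrated with two small functions. This is a sketch under assumptions: the function names are hypothetical, observations are `(date, signal_level)` tuples, the 7-day horizon starts the day after the as-of date (so the label never overlaps the feature window), and the 30-day lookback includes the as-of date.

```python
from datetime import date, timedelta


def label_next_7d(observations, as_of, horizon=7):
    """Return 1 if any 'critical' observation falls in (as_of, as_of + horizon]."""
    end = as_of + timedelta(days=horizon)
    return int(any(as_of < d <= end and lvl == "critical" for d, lvl in observations))


def is_eligible(observations, as_of, min_obs=5, lookback=30):
    """Score a pair only if it has at least `min_obs` observations in the lookback window."""
    start = as_of - timedelta(days=lookback - 1)
    return sum(start <= d <= as_of for d, _ in observations) >= min_obs
```

Excluding the as-of date from the horizon is a deliberate choice in this sketch: a critical observation on the as-of date already feeds the trailing critical rate, so counting it in the label as well would leak the answer into the features.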
Each platform gets its own XGBoost classifier (200 trees, max_depth 4, learning_rate 0.05). Evaluation is LOCO across the top-8 countries by positive count, plus a temporal holdout of the last 21 days. Three platforms (Discord, Wikipedia, LinkedIn) have zero critical-block observations in the current evidence window and are skipped with trained=false.
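The evaluation splits can be sketched as below. The hyperparameters are the ones stated above (we assume "200 trees" corresponds to `n_estimators=200`); everything else is illustrative: rows are assumed to be dicts with `country`, `date`, and `label` keys, and the function names are hypothetical. The `xgboost` import is kept inside `make_model` so the split logic runs without the library installed.

```python
from collections import Counter
from datetime import date, timedelta

XGB_PARAMS = {"n_estimators": 200, "max_depth": 4, "learning_rate": 0.05}


def make_model():
    """Build one per-platform classifier (requires the xgboost package)."""
    from xgboost import XGBClassifier
    return XGBClassifier(**XGB_PARAMS)


def loco_folds(rows, top_k=8):
    """Yield (held_out_country, train_idx, test_idx), one fold per top-k country."""
    positives = Counter(r["country"] for r in rows if r["label"] == 1)
    for country, _ in positives.most_common(top_k):
        test_idx = [i for i, r in enumerate(rows) if r["country"] == country]
        train_idx = [i for i, r in enumerate(rows) if r["country"] != country]
        yield country, train_idx, test_idx


def temporal_split(rows, holdout_days=21):
    """Hold out rows dated within the last `holdout_days` days of the data."""
    cutoff = max(r["date"] for r in rows) - timedelta(days=holdout_days)
    train_idx = [i for i, r in enumerate(rows) if r["date"] <= cutoff]
    test_idx = [i for i, r in enumerate(rows) if r["date"] > cutoff]
    return train_idx, test_idx
```

A held-out country whose rows are all positive or all negative has no defined AUC, so a fold like that would have to be skipped or reported separately.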
Why the high AUC is honest but not surprising. Per-platform censorship is strongly persistent: when a country has a critical block on Twitter today, it almost certainly has one next week. The model exploits this persistence aggressively, which is why LOCO AUC reaches 1.000 and temporal AUC sits between 0.97 and 0.99 on every live platform. The prediction is still useful, because it correctly separates persistently blocked pairs from never-blocked ones.
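One way to see how much of the AUC comes from persistence alone is to score a naive baseline that uses the trailing critical rate as the prediction, with no model at all. The sketch below is not part of the pipeline described here; it computes AUC directly as the probability that a randomly chosen positive outranks a randomly chosen negative. The numbers in the usage lines are made up for illustration and are not results.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a random positive > score of a random negative); ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined when only one class is present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration: next-week labels and each row's crit_rate_7d.
toy_labels = [1, 1, 1, 0, 0, 1]
toy_crit_rate_7d = [0.9, 0.8, 1.0, 0.0, 0.1, 0.0]

# Persistence baseline: use the current 7-day critical rate as the forecast.
baseline_auc = roc_auc(toy_labels, toy_crit_rate_7d)
```

If a baseline like this lands close to the model's AUC on real data, most of the model's skill is persistence; the gap between the two is what the remaining features contribute.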