Atlas · topic modeling

Incident topics

8 themes auto-discovered across 1,195 deduplicated incident descriptions (from 2,636 source rows; about 55% were removed as boilerplate duplicates). NPMI coherence is 0.726, well above the 0.30 promote floor.

Honest caveat: NMF is exploratory and topic labels are auto-generated heuristics over the top words, not editorial. Pick a label, scan the sample incidents, decide for yourself if the grouping holds. BERTopic with semantic embeddings would likely cluster differently — see methodology below.

Raw JSON · /topics/info · model: tfidf_nmf_v1 · trained May 21, 2026

Topic cards

t0 · Connectivity disruption (IODA)

765 docs · 64.0%

Top words

connectivity · disruption connectivity · drop alerts · connectivity disruption · connectivity drop · drop · recorded · alerts recorded · alerts · disruption

Sample incidents

VE-2026-0215 (VE) Internet connectivity disruption in Venezuela
FI-2026-0005 (FI) Internet connectivity disruption in Finland
IQ-2026-0119 (IQ) Internet connectivity disruption in Iraq
BJ-2026-0015 (BJ) Internet connectivity disruption in Benin

Top countries

VE 20 IR 18 ID 18 TT 17 IN 14 NG 14 MX 13 SI 12

t1 · Connectivity disruption (IODA)

157 docs · 13.1%

Top words

network disruption · network · drop alerts · alerts · connectivity drop · recorded · disruption · drop · alerts recorded · connectivity

Sample incidents

CK-2026-0001 (CK) Network disruption detected in CK
FM-2026-0001 (FM) Network disruption detected in FM
VC-2026-0001 (VC) Network disruption detected in VC
TW-2026-0001 (TW) Network disruption detected in TW

Top countries

SN 2 HK 2 PH 2 KG 2 QA 2 CM 2 BF 2 TG 2

t2 · Social media platform blocks

51 docs · 4.3%

Top words

anomalous · blocking probes · probes anomalous · dns · dns blocking · probes · blocking · com · anomalous tiktok · tiktok com

Sample incidents

MA-2026-0094 (MA) Internet censorship detected in Morocco
KH-2026-0014 (KH) Internet censorship detected in Cambodia
JO-2026-0101 (JO) Internet censorship detected in Jordan
UZ-2026-0074 (UZ) Internet censorship detected in Uzbekistan

Top countries

UZ 6 JO 5 NI 4 SY 4 AE 3 NG 2 QA 2 DZ 2

t3 · Sustained activity / repeated alerts

53 docs · 4.4%

Top words

confirms activity · confirms · activity · recorded confirms · activity confirms · disruption connectivity · ga · np · tm · ma

Sample incidents

JO-2026-0083 (JO) Internet connectivity disruption in Jordan
AZ-2026-0132 (AZ) Internet connectivity disruption in Azerbaijan
EG-2026-0167 (EG) Internet connectivity disruption in Egypt
PK-2026-0157 (PK) Internet connectivity disruption in Pakistan

Top countries

AZ 3 PK 3 TZ 3 EG 2 UZ 2 IQ 2 SY 2 IR 2

t4 · HTTP/TLS probe timeouts

48 docs · 4.0%

Top words

blocked · probes blocked · probes · blocking timeout · timeout probes · timeout · blocking · https blocking · https · http

Sample incidents

TZ-2026-0048 (TZ) Internet censorship detected in Tanzania
TZ-2026-0046 (TZ) Internet censorship detected in Tanzania
TZ-2026-0036 (TZ) Internet censorship detected in Tanzania
TZ-2026-0034 (TZ) Internet censorship detected in Tanzania

Top countries

TZ 9 PK 6 AZ 4 BD 4 RU 4 BY 4 MM 4 RW 2

t5 · Connectivity disruption (IODA)

15 docs · 1.3%

Top words

united · drop united · united connectivity · disruption united · united alerts · connectivity · connectivity disruption · connectivity drop · alerts · disruption

Sample incidents

GB-2026-0039 (GB) Internet connectivity disruption in United Kingdom
US-2026-0057 (US) Internet connectivity disruption in United States
GB-2026-0037 (GB) Internet connectivity disruption in United Kingdom
GB-2026-0033 (GB) Internet connectivity disruption in United Kingdom

Top countries

GB 8 US 5 GA 1 SS 1

t6 · Social media platform blocks

40 docs · 3.3%

Top words

com · blocking · dns · dns blocking · blocking domains · censoredplanet · domains · instagram · com censoredplanet · facebook

Sample incidents

BY-2026-0004 (BY) DNS blocking in BY: bbc.com, facebook.com, google.com, instagram.com, medium.com +13 more
EG-2026-0005 (EG) DNS blocking in EG: bbc.com, facebook.com, google.com, instagram.com, medium.com +13 more
SA-2026-0004 (SA) DNS blocking in SA: bbc.com, facebook.com, google.com, instagram.com, medium.com +13 more
TR-2026-0006 (TR) DNS blocking in TR: bbc.com, facebook.com, google.com, instagram.com, medium.com +13 more

Top countries

PK 5 CN 4 MM 3 EG 3 IR 3 TR 3 AE 2 KW 2

t7 · OONI anomaly burst

47 docs · 3.9%

Top words

ooni · anomaly rate · rate · anomaly · shutdown · sustained · network averaging · averaging · elevated network · averaging anomaly

Sample incidents

RU-2026-0216 (RU) Sustained censorship in RU (2026-05)
RU-2026-0010 (RU) Sustained censorship in RU (2026-01)
RU-2023-0006 (RU) Sustained censorship in RU (2023-12)
RU-2023-0004 (RU) Sustained censorship in RU (2023-01)

Top countries

RU 9 IR 8 KZ 3 PK 3 IN 3 GB 2 VE 2 SD 2

Unlabeled (low signal)

19 docs · 1.6%
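
The per-card shares and the dedupe rate quoted at the top of the page can be re-derived from the card counts alone. The sketch below is a quick arithmetic check, assuming each percentage is the card's document count divided by the total number of unique docs; the variable names are illustrative.

```python
# Document counts per topic card, in page order (t0..t7), plus the unlabeled bucket.
card_docs = [765, 157, 51, 53, 48, 15, 40, 47, 19]
source_rows = 2636  # rows before dedupe, as quoted in the summary

unique_docs = sum(card_docs)
shares = [round(100 * n / unique_docs, 1) for n in card_docs]
removed_pct = round(100 * (1 - unique_docs / source_rows), 1)

print(unique_docs)   # 1195
print(shares)        # [64.0, 13.1, 4.3, 4.4, 4.0, 1.3, 3.3, 3.9, 1.6]
print(removed_pct)   # 54.7
```

The counts sum to the 1,195 unique docs, and the removed share of 54.7% is what the summary rounds to 55%.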

Country × topic heatmap

Top 40 countries by incident volume (min 4 incidents in the corpus). Cell shade = share of that country's incidents in that topic (deeper green = larger share). Helps a journalist see which themes dominate a specific country's history.

Country	t0 · Connectivity disruption (IODA)	t1 · Connectivity disruption (IODA)	t2 · Social media platform blocks	t3 · Sustained activity / repeated alerts	t4 · HTTP/TLS probe timeouts	t5 · Connectivity disruption (IODA)	t6 · Social media platform blocks	t7 · OONI anomaly burst	n
IR	18		1	2			3	8	32
VE	20			1	1		1	2	25
RU	6			1	4		2	9	23
PK	3			3	6		5	3	21
EG	9		1	2	2		3	1	19
ID	18			1					19
IN	14			1				3	19
NG	14		2	1	1				18
TT	17	1							18
MM	6	1			4		3	2	17
TZ	3		1	3	9			1	17
IQ	10		1	2			2		16
KZ	7		1	1	2		1	3	15
MX	13	1							14
SY	8		4	2					14
UZ	3		6	2	2		1		14
AZ	6			3	4				13
BD	7			1	4		1		13
ET	9	1	1					2	13
SI	12	1							13
TR	7			2			3	1	13
CM	10	2							12
CN	5	1					4	1	11
CU	6		2	1			1	1	11
FR	10	1							11
GB				1		8		2	11
PA	10	1							11
BA	9	1							10
BY	2		1	1	4		1	1	10
MZ	9	1							10
NI	5	1	4						10
SA	4		2	2			1		10
UA	9			1					10
VN	4		1	2			1		10
ZW	9	1							10
AO	8	1							9
BR	7	1			1				9
CO	8	1							9
DZ	5	1	2					1	9
HN	8	1							9
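
To make the cell shading concrete, here is a minimal sketch of how a per-country share could be computed from the counts in the table. It assumes the share is the cell count divided by the row total n and that blank cells mean zero; `topic_shares` and `dominant` are illustrative names, not part of the page's code.

```python
# Counts copied from three rows of the table; blank cells are treated as 0.
counts = {
    "IR": [18, 0, 1, 2, 0, 0, 3, 8],
    "TZ": [3, 0, 1, 3, 9, 0, 0, 1],
    "GB": [0, 0, 0, 1, 0, 8, 0, 2],
}

def topic_shares(row):
    """Return each topic's share of a country's incidents (shares sum to 1)."""
    total = sum(row)
    return [c / total if total else 0.0 for c in row]

shares = {cc: topic_shares(row) for cc, row in counts.items()}

# Index of the topic with the largest share for each country.
dominant = {cc: max(range(len(s)), key=s.__getitem__) for cc, s in shares.items()}

print(round(shares["IR"][0], 4))  # 0.5625
print(dominant)                   # {'IR': 0, 'TZ': 4, 'GB': 5}
```

Row-normalizing is what lets a low-volume country such as GB show a deep cell for t5 even though its absolute count is small.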

Methodology

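Corpus build: pull every incident (title + description) from the live DB, normalize whitespace, lowercase, and dedupe by SHA-256 of the normalized text. Hashing only collapses documents that are identical after normalization; it does not do fuzzy near-duplicate matching. 2,636 rows → 1,195 unique docs.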
Vectorize: tf-idf, unigrams + bigrams, min_df=2, max_df=0.95, vocab capped at 5,000, English stopwords + domain stopwords (boilerplate like "cenalert", "detected", country names) ⇒ 623 terms.
NMF sweep: fit sklearn NMF for k = 8, 10, ..., 24 (step 2) with init="nndsvda" and beta_loss="frobenius"; score each fit by NPMI coherence on the top-12 words per topic, then pick the k that maximizes coherence.
Assignment: each doc → argmax topic (hard assignment). Documents with W weight below 1e-6 fall into an "unlabeled" bucket.
Labels: auto-generated by checking the top-N words against a hand-curated keyword map (election, VPN, DNS, etc.). If no keyword matches, the label falls back to "Topic N: word1 / word2". These are heuristics: they exist to make scanning faster, not to be authoritative.
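
The non-model steps above (dedupe, hard assignment, label fallback) can be sketched in plain Python. This is an illustration under stated assumptions, not the production code: it assumes "W weight below 1e-6" refers to a document's largest topic weight, and `KEYWORD_LABELS` is a hypothetical stand-in for the hand-curated keyword map, which is not shown on this page.

```python
import hashlib
import re

def normalize(text):
    """Collapse runs of whitespace and lowercase the text."""
    return re.sub(r"\s+", " ", text).strip().lower()

def dedupe(texts):
    """Keep the first document for each SHA-256 digest of the normalized text."""
    seen, unique = set(), []
    for t in texts:
        digest = hashlib.sha256(normalize(t).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(t)
    return unique

UNLABELED = -1
MIN_WEIGHT = 1e-6  # threshold quoted in the Assignment step

def assign(w_row):
    """Hard assignment: index of the largest topic weight, or UNLABELED."""
    best = max(range(len(w_row)), key=w_row.__getitem__)
    return best if w_row[best] >= MIN_WEIGHT else UNLABELED

# Hypothetical keyword map (assumption); the real map is hand-curated.
KEYWORD_LABELS = {"dns": "DNS blocking", "vpn": "VPN restrictions"}

def label(topic_id, top_words):
    """Return the first keyword-map hit, else the 'Topic N: w1 / w2' fallback."""
    for word in top_words:
        if word in KEYWORD_LABELS:
            return KEYWORD_LABELS[word]
    return f"Topic {topic_id}: {top_words[0]} / {top_words[1]}"

rows = [
    "Internet connectivity disruption in Iraq",
    "internet   connectivity disruption in  iraq",  # same text after normalization
    "Internet connectivity disruption in Benin",    # differs by one word: kept
]
docs = dedupe(rows)
print(len(docs))                                # 2
print(assign([0.0, 0.42, 0.07]))                # 1
print(assign([0.0, 0.0, 0.0]))                  # -1
print(label(3, ["confirms", "activity"]))       # Topic 3: confirms / activity
```

Note that the Iraq and Benin rows both survive: a hash only matches text that is identical after normalization, which is why templated incidents still dominate the largest topic.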

k sweep

Coherence by k (NPMI on training corpus, top-12 words per topic):

k	coherence	recon err
8	0.7257	19.81
10	0.7071	19.35
12	0.6992	18.95
14	0.7149	18.61
16	0.7002	18.33
18	0.7014	18.02
20	0.6864	17.72
22	0.6943	17.43
24	0.6793	17.15
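
For readers who want to see what a pairwise NPMI score looks like, here is a minimal sketch using document-level co-occurrence on a toy corpus. The exact estimator used for the numbers above (tokenization, smoothing, handling of edge cases) is not stated on this page, so the conventions here are assumptions: pairs that never co-occur score -1, and pairs that co-occur in every document score 1. The last lines pick the best k from the sweep table.

```python
import math
from itertools import combinations

def npmi_coherence(top_words, docs):
    """Mean pairwise NPMI of top_words, from document-level co-occurrence."""
    doc_sets = [set(d.lower().split()) for d in docs]
    n = len(doc_sets)

    def p(*words):
        # Fraction of documents that contain all of the given words.
        return sum(all(w in s for w in words) for s in doc_sets) / n

    scores = []
    for a, b in combinations(top_words, 2):
        p_ab = p(a, b)
        if p_ab == 0:
            scores.append(-1.0)  # never co-occur: conventional minimum
        elif p_ab == 1:
            scores.append(1.0)   # always co-occur: limit value (assumption)
        else:
            pmi = math.log(p_ab / (p(a) * p(b)))
            scores.append(pmi / -math.log(p_ab))
    return sum(scores) / len(scores)

toy_docs = [
    "dns blocking probes",
    "dns blocking",
    "connectivity drop",
    "connectivity drop alerts",
]
tight = npmi_coherence(["dns", "blocking"], toy_docs)      # words always appear together
mixed = npmi_coherence(["dns", "connectivity"], toy_docs)  # words never appear together

# Coherence values copied from the sweep table.
sweep = {8: 0.7257, 10: 0.7071, 12: 0.6992, 14: 0.7149, 16: 0.7002,
         18: 0.7014, 20: 0.6864, 22: 0.6943, 24: 0.6793}
best_k = max(sweep, key=sweep.get)
print(best_k)  # 8
```

A model-level score would average this per-topic value over all topics. Reconstruction error keeps falling as k grows, so it is not used for selection; only coherence is.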

Honest caveats

NMF topics are exploratory. Labels are auto-generated heuristics over the top words, not editorial.
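Many IODA disruption incidents share boilerplate ("CenAlert detected interference"). Dedupe by SHA-256 of the normalized text removes only documents that are identical after normalization, so templated incidents that differ by a single detail (such as the country name) survive and a residual disruption topic is expected.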
Topic assignment is hard (single argmax). An incident with mixed signals (e.g. election + DNS) only counts toward one topic.
Coherence is approximated via pairwise NPMI on the training corpus, not full c_v from Röder et al. Reasonable proxy but not directly comparable to BERTopic numbers in the literature.
Sentence-transformers + BERTopic could not be installed in the production venv-ml, so the model fell back to tf-idf + NMF per the project directive. BERTopic with all-MiniLM-L6-v2 would likely surface more semantic clusters; tf-idf only captures lexical surface patterns.
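
To illustrate the hard-assignment caveat, the sketch below contrasts argmax with a soft view of the same document. This is not what the page computes: `soft_shares`, `topics_above`, and the 0.25 cutoff are hypothetical, and the weights are made up for the example.

```python
def soft_shares(w_row):
    """Normalize one document's topic weights so they sum to 1."""
    total = sum(w_row)
    return [w / total for w in w_row] if total > 0 else [0.0] * len(w_row)

def topics_above(w_row, min_share=0.25):
    """Topics holding at least min_share of the document's total weight."""
    return [i for i, s in enumerate(soft_shares(w_row)) if s >= min_share]

# Made-up weights for a document with mixed signals across t2 and t7.
w = [0.0, 0.0, 0.30, 0.0, 0.0, 0.0, 0.0, 0.20]

hard = max(range(len(w)), key=w.__getitem__)
print(hard)             # 2
print(topics_above(w))  # [2, 7]
```

Under hard assignment this document counts only toward t2, even though t7 carries 40% of its weight.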

Related

/atlas: main hub
/atlas/cohorts: DTW + Ward country cohorts (similar shapes, time-shift aware)
/atlas/anomaly: per-country DBSCAN second-opinion anomalies
/atlas/changelog: full ML model history
/censorship-index: full incident list
