Why our category counts mirror the test list — a methodology honesty note
Voidly's category breakdowns ('Iran blocks 110 news sites', 'Russia leads on human-rights blocking') are real, but they answer a narrower question than they appear to: they describe what's blocked AMONG THE SITES TESTED, and the tested sites are a deliberately curated, politically weighted list.

The tell is in what's accessible. If measurement sampled the whole web, the confirmed-ACCESSIBLE set would look like the web (shopping, banking, health). It doesn't: in Iran the accessible domains are themselves dominated by news (94) and human rights (57), with commerce, health, and government barely present (1-2 each). The accessible set and the blocked set are the same KIND of site because both are drawn from the same list. Iran isn't conspicuously leaving banking open; banking simply isn't measured.

Why: Voidly aggregates OONI, which tests the Citizen Lab test lists. Those lists are URLs SELECTED for their likelihood of being censored (news, human rights, political, LGBTQ+, religion, circumvention, gambling, adult). So the corpus's category mix is set partly by the list authors, not only by the censor; 'NEWS is the most-blocked category' partly reflects that news is among the most heavily TESTED.

This does NOT make any finding wrong: a domain confirmed blocked across >=3 networks is genuinely blocked. It changes the DENOMINATOR: read 'Iran blocks 110 news domains' as 'of the news domains on the test list, Iran blocks 110'. That is a lens, not a census. Cross-country comparisons stay fair (same global list); absolute 'fraction of the web' claims are what the data can't support. This is the same honesty as the confirmed-block-map-is-a-measurement-map finding.

Live: /v1/measurement/category-leaders, /v1/measurement/intent-profile.
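
The denominator point can be made concrete with a small Python sketch. It is an illustration only, not Voidly's actual pipeline: the function names are hypothetical, it assumes the tested set for a category is just confirmed-blocked plus confirmed-accessible domains (inconclusive measurements are ignored), and the value 2 for commerce, health, and government is the upper end of the '1-2 each' range quoted above.

```python
from typing import Dict


def blocked_share_of_tested(blocked: int, accessible: int) -> float:
    """Share of TESTED domains in a category that are confirmed blocked.

    Assumption: tested = blocked + accessible. The denominator is the
    test list, not the web.
    """
    tested = blocked + accessible
    if tested == 0:
        # An unmeasured category (e.g. banking) has no share at all.
        raise ValueError("category has no tested domains, so no share can be stated")
    return blocked / tested


def category_mix(counts: Dict[str, int]) -> Dict[str, float]:
    """Fraction of a domain set that falls in each category."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("empty domain set")
    return {category: n / total for category, n in counts.items()}


# Iran's confirmed-ACCESSIBLE set, using the counts quoted in this note.
# The three values of 2 are illustrative (source says '1-2 each').
iran_accessible = {
    "news": 94,
    "human_rights": 57,
    "commerce": 2,
    "health": 2,
    "government": 2,
}

accessible_mix = category_mix(iran_accessible)

# 'Of the news domains on the test list, Iran blocks 110.'
news_share = blocked_share_of_tested(blocked=110, accessible=94)

print(f"news share of accessible set: {accessible_mix['news']:.2f}")
print(f"news blocked, as a share of tested news domains: {news_share:.2f}")
```

Under these assumptions the accessible set is itself mostly news and human rights, which is the signature of a curated list. Asking for the share of an unmeasured category raises an error rather than returning zero, which mirrors the distinction between 'not blocked' and 'not measured'.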