What shipped
Until today, we classified censorship at the country-day level: 84K evidence rows aggregated into per-country features and fed into the v3.3 GradientBoosting classifier. That answers “is country X being censored today?” well, but it cannot answer “is THIS specific measurement evidence of blocking?”
New endpoint
POST /v1/measurement/classify takes one measurement (source, country, ASN, domain, signal_value, signal_baseline, observed_at, optional rolling rates) and returns {probability, label, top_features, model_version} with a SHAP-style top-5 contribution list. A companion GET /v1/measurement/info exposes model version, feature names, and held-out metrics.
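A minimal client-side sketch of a call to the classify endpoint, using only the Python standard library. The JSON key names mirror the field list above, but their exact spelling and types in the real request schema are assumptions, and the sample values are purely illustrative. The response keys read in the usage comment are the ones named above.

```python
import json
import urllib.request

# Full URL as announced in the "Promotion + endpoint" section of this post.
CLASSIFY_URL = "https://api.voidly.ai/v1/measurement/classify"


def build_classify_request(measurement: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for one measurement."""
    body = json.dumps(measurement).encode("utf-8")
    return urllib.request.Request(
        CLASSIFY_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify(measurement: dict, timeout: float = 10.0) -> dict:
    """Send the request and return the decoded JSON response (needs network)."""
    request = build_classify_request(measurement)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Hypothetical measurement: key names and values are assumptions.
example = {
    "source": "ooni",
    "country": "XX",
    "asn": 12345,
    "domain": "example.org",
    "signal_value": 0.8,
    "signal_baseline": 0.03,
    "observed_at": "2025-01-01T12:00:00Z",
}

request = build_classify_request(example)

# Usage (performs a live HTTP call, so it is left commented out):
# result = classify(example)
# print(result["probability"], result["label"], result["model_version"])
# for contribution in result["top_features"]:
#     print(contribution)
```

Separating request construction from sending keeps the payload easy to inspect or log before anything goes over the network.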
Why this is interesting
Journalists and researchers querying a single OONI report previously had to either trust the upstream signal_type field as-is or compute country-level statistics. A per-row probability with an explanation lets you cite a single measurement with confidence and surface the features that drove the call. The design is inspired by Niaki et al., KDD 2023 (Massively Parallel Censorship Probing of DNS and TLS Globally), and by the OONI Probe per-measurement anomaly heuristics.
How it was trained
Inputs: all 85,468 evidence rows in voidly_data.db.
Label rule per task spec: positive iff signal_type IN (block, isp-outage) OR signal_type LIKE 'dns-blocking%' OR signal_type LIKE 'http-blocking%' AND confidence >= 0.7.
Balance: 16,798 positive (19.65 percent), 68,670 negative.
Features (24 total, label-leaking columns dropped): signal_value, signal_baseline, signal_deviation, signal_ratio, risk_tier, time-of-day, day-of-week, weekend flag, month, has_artifact_hash, has_source_url, has_domain, has_asn, asn (integer), 7-day rolling positive rate per country, per ASN, and per country+domain, and hash-encoded source / kind / country / continent / un_subregion / region / domain_category.
Model: XGBoost(max_depth=4, n_estimators=100) on a stratified 80/20 split (seed 42).
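The label rule and the class balance can be sketched in a few lines. This is an illustration, not the code in the training script. One assumption matters: the rule as written does not parenthesize its OR/AND terms, and the sketch reads the confidence threshold as gating every signal_type branch. Under strict SQL precedence, AND would bind only to the last LIKE clause. The signal_type value `dns-blocking-nxdomain` is a hypothetical example of a string matching the `dns-blocking%` pattern.

```python
def is_positive(signal_type: str, confidence: float) -> bool:
    """Label one evidence row as positive (True) or negative (False).

    ASSUMPTION: the rule is read as (A OR B OR C) AND confidence >= 0.7.
    """
    matches_type = (
        signal_type in ("block", "isp-outage")
        or signal_type.startswith("dns-blocking")   # LIKE 'dns-blocking%'
        or signal_type.startswith("http-blocking")  # LIKE 'http-blocking%'
    )
    return matches_type and confidence >= 0.7


# Class balance, using the counts reported above.
positives, negatives = 16_798, 68_670
total = positives + negatives
positive_share = positives / total
test_rows = round(0.2 * total)  # size of the 20 percent held-out fold

print(total)                            # 85468
print(round(100 * positive_share, 2))   # 19.65
print(test_rows)                        # 17094
print(is_positive("dns-blocking-nxdomain", 0.9))  # True
print(is_positive("block", 0.5))                  # False
```

With roughly one positive in five rows, a stratified split keeps the same ratio in both folds, which is why the held-out fold's positive count tracks the overall share.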
The result, and the honest caveat
Held-out 20 percent ROC AUC: 1.000 (n=17,094). F1: 1.000. Precision: 1.000. Recall: 1.000. Confusion matrix at threshold 0.5: [[13734, 0], [0, 3360]]. Per-source AUC where computable: OONI 1.000, CensoredPlanet 1.000, Voidly 1.000. IODA and Voidly-Community are one-class slices in the test fold (IODA: 0 positives by construction because we excluded outage from the label rule; Voidly-Community: 100 percent positives because every community probe entry confirms a block), so AUC is undefined for those.
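The reported metrics follow directly from the confusion matrix, and the "undefined" per-source AUC follows from the definition of ROC AUC. The sketch below assumes the matrix is laid out as [[TN, FP], [FN, TP]] (the scikit-learn convention). It uses a plain pairwise AUC that returns None for a one-class slice. The function names are illustrative, not taken from the training script.

```python
from typing import Optional, Sequence


def metrics_from_confusion(matrix: Sequence[Sequence[int]]) -> dict:
    """Precision, recall and F1 from [[TN, FP], [FN, TP]] (assumed layout)."""
    (tn, fp), (fn, tp) = matrix
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "n": tn + fp + fn + tp}


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> Optional[float]:
    """Probability that a random positive outscores a random negative.

    Returns None when only one class is present, because there are no
    positive/negative pairs to compare. O(P * N), fine for a sketch.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


reported = metrics_from_confusion([[13734, 0], [0, 3360]])
print(reported)  # precision, recall and f1 are all 1.0; n is 17094

# A slice with zero positives (like IODA here) has no defined AUC.
print(roc_auc([0, 0, 0], [0.1, 0.2, 0.3]))  # None
```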
Honest framing: AUC=1.000 is real, but it is not a breakthrough. The label rule depends deterministically on signal_type, and each source uses distinctive signal_value / signal_baseline patterns (CensoredPlanet positives nearly always have value=1.0; OONI positives have baseline=0.03 with values in the 0.5-1.0 range; IODA has zero positives because we explicitly excluded outage-only rows). XGBoost can essentially memorize the labeling function from the raw signal columns plus source. The top feature is the 7-day rolling per-ASN positive rate (81 percent of total gain), which also overlaps with same-period labels for ASNs that only appear in a tight window. We drop signal_type, blocking_method, and signal_level from the feature matrix to avoid the worst leakage, but residual coupling between raw signal values and provider judgement is unavoidable.
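To make the rolling-rate overlap concrete, here is a sketch of a 7-day positive rate that uses only rows observed strictly before the row being scored. It is not the feature code from the build script, and it does not remove the coupling described above. It only shows what a strictly-prior window looks like. The row layout (key, observed_at, label) and the choice of 0.0 for rows with no history are assumptions.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

WINDOW = timedelta(days=7)


def prior_positive_rate(rows):
    """Return one rate per row, in input order.

    rows: list of (key, observed_at, label) with label in {0, 1}.
    Each rate uses only same-key rows observed strictly earlier and
    within the previous 7 days; rows with no such history get 0.0.
    """
    order = sorted(range(len(rows)), key=lambda i: rows[i][1])
    history = defaultdict(deque)   # key -> deque of (observed_at, label)
    positives = defaultdict(int)   # key -> positives currently in window
    rates = [0.0] * len(rows)
    pending = []                   # rows sharing the current timestamp
    current_ts = None

    for i in order:
        key, ts, label = rows[i]
        if ts != current_ts:
            # Only now do rows from the previous timestamp become history.
            for k, t, y in pending:
                history[k].append((t, y))
                positives[k] += y
            pending = []
            current_ts = ts
        window = history[key]
        while window and window[0][0] < ts - WINDOW:
            _, old_label = window.popleft()
            positives[key] -= old_label
        rates[i] = positives[key] / len(window) if window else 0.0
        pending.append((key, ts, label))
    return rates


t0 = datetime(2025, 1, 1)
rows = [
    ("AS1", t0, 1),
    ("AS1", t0 + timedelta(days=1), 0),
    ("AS1", t0 + timedelta(days=2), 1),
    ("AS1", t0 + timedelta(days=10), 0),
    ("AS2", t0, 1),
]
print(prior_positive_rate(rows))  # [0.0, 1.0, 0.5, 0.0, 0.0]
```

A row's own label never enters its own rate here, and an ASN seen for the first time contributes no signal, which is the case the caveat about tight windows is concerned with.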
What this is useful for: returning a calibrated probability and a SHAP-style explanation per measurement in one call, with the same labeling rule the rest of the pipeline uses. What it is NOT useful for: claiming we can detect censorship that the upstream providers missed. The v3.3 country-day classifier remains the model that actually generalizes across countries; this one is a per-row interface layer on the same evidence.
Promotion + endpoint
Promotion gate: AUC >= 0.85 AND every per-source AUC >= 0.75. Both met (the only sources excluded from per-source AUC are one-class slices, not failures). Endpoint live at POST https://api.voidly.ai/v1/measurement/classify and GET https://api.voidly.ai/v1/measurement/info. Build scripts: scripts/build-measurement-features.py, scripts/train-measurement-classifier.py, scripts/patch-measurement-classify-endpoint.py (idempotent, ast-validated).
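The promotion gate described above can be read as a small predicate. This is a sketch with hypothetical names, not the check in the training script. One-class slices are represented as None and skipped, matching the statement that they are excluded rather than counted as failures.

```python
from typing import Mapping, Optional

OVERALL_MIN_AUC = 0.85
PER_SOURCE_MIN_AUC = 0.75


def passes_promotion_gate(
    overall_auc: float,
    per_source_auc: Mapping[str, Optional[float]],
) -> bool:
    """True when the overall AUC and every computable per-source AUC pass."""
    computable = [auc for auc in per_source_auc.values() if auc is not None]
    return overall_auc >= OVERALL_MIN_AUC and all(
        auc >= PER_SOURCE_MIN_AUC for auc in computable
    )


# Values reported in this post; None marks a one-class slice.
reported_per_source = {
    "OONI": 1.0,
    "CensoredPlanet": 1.0,
    "Voidly": 1.0,
    "IODA": None,
    "Voidly-Community": None,
}
print(passes_promotion_gate(1.0, reported_per_source))  # True
```

Note that a gate written this way passes trivially for any source whose AUC cannot be computed, so the list of skipped sources is worth reporting alongside the result.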