voidly

CenDTect-style DBSCAN unsupervised anomaly: AUC 0.6506, promoted as second-opinion signal

Adapted the CenDTect approach (Aceto & Pescape 2025 — DBSCAN over OONI feature vectors) to Voidly's 80K-row evidence table. Per-country rolling 45-day window, DBSCAN(eps=75th-pct kNN, min_samples=3) on 12 standardized features. AUC vs v3.3 labeled incidents: 0.6506, just above the 0.65 promote floor. Promoted as a SECOND-OPINION signal — the supervised classifier still wins at 0.99, but DBSCAN surfaces shape-anomalous days the labels never saw. Live at /v1/anomaly/dbscan/{cc}.

#methodology #ml #anomaly #unsupervised #dbscan #cendtect #second-opinion #promoted

Voidly's supervised v3.3 classifier sits at F1 0.729 / AUC ≈ 0.99 on labeled incidents — by far our strongest signal. But labels are themselves curated, and the unsupervised view answers a different question: which (country, day) feature vectors look weird, regardless of whether anyone wrote them up as an incident?

CenDTect (Aceto & Pescape, 2025) proposed clustering OONI measurements with DBSCAN and treating noise points (cluster label -1) as candidate censorship. We adapted that to a per-country rolling window over the full Voidly evidence table.

The build

  • Feature matrix: 12 columns per (country, day) — block_rate, log measurement count, ASN diversity, source diversity, plus the 8-bucket signal-type composition (DNS-block, TCP-reset, blockpage, TLS-reset, outage, interference, generic block, ok).
  • Window: rolling per-country, last 45 days. Standardized via StandardScaler within the window.
  • DBSCAN: eps = 75th percentile of k-NN distances (k=3), min_samples = 3. Continuous score = distance from the test day's feature vector to the nearest core point.
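
The three steps above can be sketched in dependency-free Python. This is a minimal illustration, not the production code: it assumes the test day is scored against core points fitted on the preceding window, and that a day counts as noise when it lies farther than eps from every core point. The function names and the two-feature toy window are hypothetical (the real matrix has 12 columns).

```python
import math
import statistics

def standardize(rows):
    """Z-score each column with the window's own mean and std (StandardScaler-style)."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    # Population std; a constant column gets scale 1.0 to avoid dividing by zero.
    stds = [statistics.pstdev(c) or 1.0 for c in cols]
    scaled = [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]
    return scaled, means, stds

def knn_eps(points, k=3, q=0.75):
    """eps = q-th quantile of each point's distance to its k-th nearest neighbour."""
    kth = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, o) for j, o in enumerate(points) if j != i)
        kth.append(dists[k - 1])
    kth.sort()
    pos = q * (len(kth) - 1)          # linear-interpolated quantile
    lo, hi = math.floor(pos), math.ceil(pos)
    return kth[lo] + (kth[hi] - kth[lo]) * (pos - lo)

def score_day(window, test_row, k=3, q=0.75, min_samples=3):
    """Return (distance to nearest core point, is_noise) for one test day."""
    scaled, means, stds = standardize(window)
    eps = knn_eps(scaled, k, q)
    # Core point: at least min_samples window points (itself included) within eps.
    cores = [p for p in scaled
             if sum(math.dist(p, o) <= eps for o in scaled) >= min_samples]
    x = [(v - m) / s for v, m, s in zip(test_row, means, stds)]
    score = min((math.dist(x, c) for c in cores), default=math.inf)
    return score, score > eps

# Toy window: six ordinary days with two features each (illustrative values only).
window = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [0.05, 0.05], [0.02, 0.08]]
far_score, far_noise = score_day(window, [5.0, 5.0])      # far from every core point
near_score, near_noise = score_day(window, [0.05, 0.05])  # sits inside the cluster
```

Only the noise test is implemented here, because cluster membership is not needed for the score; a full implementation would more likely call a library DBSCAN and reuse its core samples.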

Results

  • AUC vs v3.3 labels: 0.6506 (n=3,922 scored, 1,023 positive)
  • AUC-PR: 0.3639 — well above the 0.26 baseline (positive rate)
  • Binary flag AUC: 0.6372 (using only the is_noise flag instead of the continuous score)
  • Promote floor: 0.65, passed by 0.0006 AUC (0.06 percentage points)
  • Improvement over the v1 IsolationForest baseline (AUC 0.489) is +16 percentage points
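
For readers who want to check the arithmetic, the sketch below recomputes the derived figures from the reported numbers and shows, on toy data, why a binary flag usually scores a lower AUC than the continuous score it was thresholded from. The `roc_auc` helper and the toy labels and scores are illustrative assumptions, not the evaluation code behind the numbers above.

```python
def roc_auc(labels, scores):
    """Chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data (hypothetical): the continuous score ranks most positives above negatives.
labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.2, 0.1]
auc_continuous = roc_auc(labels, scores)

# Thresholding to a flag creates ties, which throws away ranking information.
flags = [1.0 if s >= 0.5 else 0.0 for s in scores]
auc_flag = roc_auc(labels, flags)

# Derived figures from the reported results.
positive_rate = 1023 / 3922           # AUC-PR baseline, about 0.26
margin_pp = (0.6506 - 0.65) * 100     # margin over the promote floor, in percentage points
gain_pp = (0.6506 - 0.489) * 100      # gain over the IsolationForest baseline
```

On the toy data the continuous score gets 5/6 and the flag gets 3.5/6, the same direction as the 0.6506 versus 0.6372 gap reported above.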

Honest caveats

  • Barely over the floor. 0.6506 against the 0.65 promote floor is a margin of 0.0006 AUC (0.06 percentage points). Drift in the underlying evidence distribution could push it back below.
  • Hyperparameters tuned on the same labels we evaluate on. Window length, eps quantile, and min_samples were picked by grid search over the v3.3 labeled set. There is no held-out validation split — that's the next step before any production use.
  • Recall at 90th-pct threshold is only 16.6% (precision 43.3%). The model is useful for prioritizing investigations, not for replacing the supervised classifier.
  • Pooled (global) DBSCAN was worse (AUC ~0.58 across all tested window sizes). The per-country approach beats it cleanly — each country's baseline is genuinely different and global pooling washes out the signal.
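
The recall and precision caveat can be made concrete with a small sketch. It assumes the 90th-percentile threshold flags roughly the top 10% of scored rows; the helper name, the nearest-rank cut, and the toy data are hypothetical. The last lines check that the reported recall and precision are mutually consistent under that assumption.

```python
def precision_recall_at_quantile(labels, scores, q=0.90):
    """Flag every row scoring at or above the q-th quantile, then measure the flags."""
    ranked = sorted(scores)
    cut = ranked[min(len(ranked) - 1, int(q * len(ranked)))]  # nearest-rank threshold
    flagged = [y for y, s in zip(labels, scores) if s >= cut]
    true_pos = sum(flagged)
    return true_pos / len(flagged), true_pos / sum(labels)

# Toy data (illustrative): ten days, three labeled incidents.
toy_scores = [i / 10 for i in range(1, 11)]
toy_labels = [0, 0, 1, 0, 0, 1, 0, 0, 0, 1]
precision, recall = precision_recall_at_quantile(toy_labels, toy_scores, q=0.8)

# Consistency check on the reported figures, assuming 10% of 3,922 rows are flagged.
implied_true_pos = 0.166 * 1023            # recall times positives
implied_flagged = 0.10 * 3922              # rows above the 90th percentile
implied_precision = implied_true_pos / implied_flagged
```

Under that assumption the implied precision comes out near 0.433, matching the 43.3% quoted above.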

Why ship it

The supervised classifier is trained against the same labels its AUC is measured against, so it can overfit the human-curated “what counts as an incident” definition. DBSCAN doesn't see labels at all. When the two disagree, the disagreement is itself the signal: a (country, day) the classifier shrugs at but DBSCAN flags is exactly the kind of case worth a human look.
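
One way to act on that disagreement is a simple triage rule. The sketch below is hypothetical: the `DayScore` fields, the 0.5 classifier threshold, and the sample rows are assumptions for illustration, not Voidly's schema or data.

```python
from dataclasses import dataclass

@dataclass
class DayScore:
    cc: str             # country code
    day: str            # ISO date
    clf_prob: float     # supervised classifier probability (hypothetical field)
    dbscan_noise: bool  # True when DBSCAN marks the day as noise

def needs_review(row, clf_threshold=0.5):
    """Queue days that DBSCAN flags but the supervised classifier does not."""
    return row.dbscan_noise and row.clf_prob < clf_threshold

# Illustrative rows only.
rows = [
    DayScore("AA", "2025-01-01", clf_prob=0.97, dbscan_noise=True),   # both agree
    DayScore("BB", "2025-01-01", clf_prob=0.08, dbscan_noise=True),   # disagreement
    DayScore("CC", "2025-01-01", clf_prob=0.05, dbscan_noise=False),  # both quiet
]
review_queue = [r.cc for r in rows if needs_review(r)]
```

Only the middle row lands in the queue, which keeps the human workload limited to cases the labels may have missed.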

Live at

GET /v1/anomaly/dbscan/{cc} — score a country's most-recent day (with feature vector + interpretation)
GET /v1/anomaly/dbscan/leaderboard?limit=20 — most-anomalous countries right now
GET /v1/anomaly/dbscan/info — full sidecar metrics for transparency

Example: GET /v1/anomaly/dbscan/IR currently returns anomaly_score ≈ 4.89, is_anomaly=true, with 100% block_rate across 12 critical measurements concentrated on a single ASN — a textbook shape-anomalous day.
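
A minimal client sketch for these endpoints follows. The base URL is a placeholder, since the host is not given here, and the sample body contains only the two fields named in the example above; any other response fields are unknown and left out. No network call is made.

```python
import json
from urllib.parse import quote, urlencode

BASE_URL = "https://api.example.invalid"  # placeholder host, not the real one

def country_url(cc):
    """URL for a country's most recent scored day."""
    return f"{BASE_URL}/v1/anomaly/dbscan/{quote(cc.upper())}"

def leaderboard_url(limit=20):
    """URL for the most anomalous countries right now."""
    return f"{BASE_URL}/v1/anomaly/dbscan/leaderboard?{urlencode({'limit': limit})}"

def summarize(body):
    """Turn a JSON response body into a one-line summary."""
    data = json.loads(body)
    status = "anomalous" if data["is_anomaly"] else "normal"
    return f"score={data['anomaly_score']:.2f} ({status})"

# Sample body built from the fields quoted in the example; the real payload has more.
sample_body = '{"anomaly_score": 4.89, "is_anomaly": true}'
summary = summarize(sample_body)
```

To query the live service, the URL from `country_url("IR")` could be passed to `urllib.request.urlopen` once the real host is substituted.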

Raw data