voidly

Per-ASN forecasting: not viable today. The probe network needs 5× more ASN coverage first.

We prototyped per-ASN granular forecasting (one model per ISP/AS), following Saha et al. WebSci 2025. Of the 168 ASNs in our ASN-tagged evidence corpus, only 6 had ≥100 rows and ≥30 measurement days, and only 1 of those had enough class variance to train. The data isn't there yet, so we're filing this as a probe-network expansion priority instead.

#methodology #ml #forecast #per-asn #data-density #honest-not-yet

Saha et al. WebSci 2025 modeled 150 Russian ASNs individually, finding TSPU-driven RTT acceleration synchronized with policy events. That's the kind of granular forecasting Voidly wants.

We have ASN-tagged evidence (16,712 rows across 168 ASNs in 42 countries), and we built a prototype training pipeline and endpoint. Honest result: not viable today.

Three blockers

  1. Source monoculture: 100% of our ASN-tagged evidence comes from CensoredPlanet (CP). CP probes domains already known to be blocked, so the positive class dominates and there's no "normal day" signal to contrast against.
  2. Sparse, irregular sampling: the median ASN has ~17 measurement days spread over 106 calendar days. Sliding 7-day windows leave 10-20 training rows per ASN, below any reasonable threshold for binary classification.
  3. No ASN-resolved outage labels: the country-level forecast uses incidents as labels, but we have no outage incidents resolved to a single ASN.
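Blockers 1 and 2 interact, and a small sketch makes the arithmetic concrete. It assumes one plausible dataset layout that the post does not spell out: each ASN's history is a map from measurement day to "blocking observed?", a row is emitted for a measured day only if something was measured in the 7 days before it, and the two features (days observed, blocked share) are illustrative. The names build_windows and has_both_classes are hypothetical, not the prototype's code.

```python
from datetime import date, timedelta


def build_windows(daily_blocked, window_days=7):
    """Turn one ASN's sparse daily observations into (features, label) rows.

    daily_blocked maps each measurement day to True if blocking was observed.
    A row is emitted for measured day t only when at least one of the
    window_days days before t was also measured; its features summarize that
    look-back window and its label is the observation on day t.
    (Hypothetical layout; the prototype's real features are not published.)
    """
    rows = []
    for day in sorted(daily_blocked):
        history = [
            daily_blocked[day - timedelta(days=k)]
            for k in range(1, window_days + 1)
            if day - timedelta(days=k) in daily_blocked
        ]
        if not history:
            continue  # empty look-back window: no features, so no row
        # Features: how many days were observed, and what share showed blocking.
        features = (len(history), sum(history) / len(history))
        rows.append((features, daily_blocked[day]))
    return rows


def has_both_classes(rows):
    """A binary classifier needs at least one row of each label."""
    return len({label for _, label in rows}) == 2


# 17 measurement days, one every 6 days, all "blocked" (the CP monoculture case).
start = date(2025, 1, 1)
sparse = {start + timedelta(days=6 * i): True for i in range(17)}
rows = build_windows(sparse)
print(len(rows), has_both_classes(rows))  # 16 False
```

Even in this favourable case, 17 measurement days can yield at most 16 rows, and because every label is "blocked" the set is single-class, so there is nothing to train on.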

Density audit

Threshold                              ASNs qualifying
≥100 rows                                   52
≥100 rows AND ≥20 measurement days          24
≥100 rows AND ≥30 measurement days           6
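The audit reduces to two counts per ASN: evidence rows and distinct measurement days. A minimal sketch, assuming each evidence row can be reduced to an (asn, measurement_day) pair and that a "measurement day" means a distinct date with at least one row; density_audit is a hypothetical name. The example uses documentation-range ASNs (64500-64502) with synthetic counts, not our corpus.

```python
from collections import defaultdict


def density_audit(rows, min_rows=100, day_thresholds=(20, 30)):
    """Count ASNs passing each density threshold.

    rows: iterable of (asn, measurement_day) pairs, one per evidence row.
    Returns (number of ASNs with >= min_rows rows,
             {t: number of those ASNs that also have >= t distinct days}).
    """
    row_count = defaultdict(int)
    measured_days = defaultdict(set)
    for asn, day in rows:
        row_count[asn] += 1
        measured_days[asn].add(day)
    dense = [asn for asn, n in row_count.items() if n >= min_rows]
    by_days = {
        t: sum(1 for asn in dense if len(measured_days[asn]) >= t)
        for t in day_thresholds
    }
    return len(dense), by_days


# Synthetic example: days are plain integers here for brevity.
example = (
    [(64500, i % 35) for i in range(150)]    # 150 rows over 35 days
    + [(64501, i % 25) for i in range(120)]  # 120 rows over 25 days
    + [(64502, i % 40) for i in range(50)]   # only 50 rows
)
print(density_audit(example))  # (2, {20: 2, 30: 1})
```

Filtering on rows first and days second matches the nesting of the table: the day thresholds only ever shrink the ≥100-row set.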

The 6 "tier-1" ASNs: SA AS8895, CN AS146812, ID AS135473, IQ AS215597 (EarthLink), RU AS47541 (ER-Telecom), RU AS43727.

Training result

Of the 6 tier-1 ASNs, only 1 (RU AS47541) trained; the other 5 had single-class folds (every example in a fold carried the same label). The trained model scored AUC=1.0 on n_test=6, which is statistically meaningless.
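Why AUC=1.0 on 6 test windows says so little: the positive/negative split of that test set isn't reported, so the sketch below checks every possible split. If a scorer ranked the 6 windows uniformly at random (no ties), each of the C(6, n_pos) placements of the positives is equally likely and exactly one puts all positives on top, so a perfect AUC happens by chance with probability 1 / C(6, n_pos): at least 1 in 20 (3 vs 3) and as high as 1 in 6 (1 vs 5). The helper names are illustrative; auc is the standard pairwise (Mann-Whitney) formulation, and it also shows why single-class folds can't be scored at all.

```python
from itertools import product
from math import comb


def auc(scores, labels):
    """Rank-based AUC: share of (positive, negative) pairs scored in the right order.

    Ties count as half a correct pair. Undefined when the test set has one class.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("AUC is undefined for a single-class test set")
    correct = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg)
    )
    return correct / (len(pos) * len(neg))


def chance_of_perfect_auc(n_test, n_pos):
    """P(AUC = 1.0) for a scorer that ranks the test set uniformly at random, no ties.

    Each of the comb(n_test, n_pos) ways to place the positives among the ranks
    is equally likely, and exactly one puts every positive above every negative.
    """
    return 1 / comb(n_test, n_pos)


# n_test=6 as in the post; the positive/negative split was not reported.
for n_pos in range(1, 6):
    p = chance_of_perfect_auc(6, n_pos)
    print(f"{n_pos} positives: P(AUC=1.0 by chance) = {p:.3f}")
```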

0 ASNs forecast reliably today.

What we'd need to make this work

  1. Probe network ASN coverage: Voidly's own probes must record the probe's AS number; right now only CensoredPlanet supplies it. Target: ≥10 distinct ASNs per priority country with daily coverage, and ~5× the current row count (80K+ ASN-tagged rows).
  2. ASN-resolved incident labels: extend create-voidly-incidents.py to emit an ASN field when an evidence cluster is single-ASN. Without ASN ground truth there's no honest outage target.
  3. Negative-class enrichment: pull OONI measurements (already country-tagged) and back-join the probe ASN at measurement time to give the classifier "normal" days.
  4. Re-evaluate after ~6 months of clean data. Saha et al. used years of MetricsLab pings, not 3 months of CP bursts.
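For item 2, the single-ASN rule could look like the sketch below. It does not reflect the internals of create-voidly-incidents.py, which aren't shown here; the EvidenceRow schema, cluster_asn, and with_asn_field are hypothetical. It takes the conservative reading that a cluster only counts as single-ASN when every row carries the same non-null ASN, since an untagged row could have come from a different network.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EvidenceRow:
    # Hypothetical minimal schema; the real evidence table is not shown here.
    country: str
    asn: Optional[int]  # None when the source recorded no probe ASN
    day: str            # ISO date, e.g. "2025-03-14"


def cluster_asn(cluster):
    """Return the cluster's ASN only if every row carries the same, non-null ASN."""
    asns = {row.asn for row in cluster}
    if len(asns) == 1 and None not in asns:
        return next(iter(asns))
    return None  # mixed, missing, or empty: no honest ASN-level label


def with_asn_field(incident, cluster):
    """Copy an incident dict, adding "asn" only when the cluster is single-ASN."""
    labelled = dict(incident)
    asn = cluster_asn(cluster)
    if asn is not None:
        labelled["asn"] = asn
    return labelled
```

Leaving the field absent, rather than guessing the majority ASN, keeps mixed clusters out of the per-ASN label set entirely.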

What we left in place

The prototype runs as a SEPARATE Flask app on port 5012 (NOT patched into the production api_v3, so exploratory work can't leak into production). Its endpoints return experimental: true along with a clear disclaimer that per-ASN forecasts are unreliable.
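The response envelope can be kept independent of the web framework. A minimal sketch: experimental_response and the DISCLAIMER wording are hypothetical, and the only fields taken from the post are experimental: true and the presence of a disclaimer. Returning forecast: None when no model was trained matches the finding that no ASN forecasts reliably today.

```python
from typing import Optional

# Hypothetical wording; the post only says the disclaimer is clear.
DISCLAIMER = (
    "Per-ASN forecasting is experimental and not reliable: "
    "ASN-tagged data is too sparse to train or validate per-ASN models."
)


def experimental_response(asn: int, forecast: Optional[float]) -> dict:
    """Wrap a per-ASN result so no client can mistake it for a production forecast."""
    return {
        "asn": asn,
        "forecast": forecast,  # None when no model could be trained for this ASN
        "experimental": True,
        "disclaimer": DISCLAIMER,
    }
```

In a Flask app, a route would return jsonify(experimental_response(asn, None)), and app.run(port=5012) would serve it on its own port. Keeping the envelope in one function means every endpoint carries the flag and the disclaimer without relying on each route to remember them.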

Files: scripts/build-per-asn-forecast-dataset.py, scripts/train-per-asn-forecast.py, scripts/patch-per-asn-endpoint.py.

Filed as roadmap

"Expand probe network ASN coverage" is now a probe-network priority. Revisit per-ASN forecasting in Q4 2026 once ASN-tagged rows hit 80K+ from voidly-owned probes (not just CP imports).

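The revisit condition combines two targets from the list above: 80K+ ASN-tagged rows from Voidly-owned probes (roughly 5 × 16,712 = 83,560) and ≥10 ASNs per priority country with daily coverage. A sketch of that gate, assuming rows reduce to (source, country, asn, day) tuples, that "voidly" is the source label for owned probes, and that "daily coverage" means a measurement on every day of a chosen window. All three are assumptions, and ready_to_revisit is a hypothetical name.

```python
from collections import defaultdict

MIN_OWNED_ROWS = 80_000  # "80K+" target from the roadmap
MIN_ASNS_PER_COUNTRY = 10


def ready_to_revisit(rows, priority_countries, window_days,
                     min_rows=MIN_OWNED_ROWS, min_asns=MIN_ASNS_PER_COUNTRY):
    """rows: iterable of (source, country, asn, day) tuples.

    Only rows with source == "voidly" (assumed label for Voidly-owned probes)
    and a non-null ASN count. An ASN has "daily coverage" if it was measured
    on every day in window_days (an assumed reading of the target).
    """
    owned_rows = 0
    days_seen = defaultdict(set)  # (country, asn) -> days with an owned measurement
    for source, country, asn, day in rows:
        if source != "voidly" or asn is None:
            continue  # CensoredPlanet imports and untagged rows don't count
        owned_rows += 1
        days_seen[(country, asn)].add(day)

    required = set(window_days)
    covered = defaultdict(int)  # country -> number of ASNs with daily coverage
    for (country, _asn), days in days_seen.items():
        if required <= days:
            covered[country] += 1

    return owned_rows >= min_rows and all(
        covered[c] >= min_asns for c in priority_countries
    )
```

Requiring both conditions matters: a large row count concentrated in a few ASNs would still leave most per-ASN models with the sparse-sampling problem described above.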
Raw data