voidly

Multi-horizon forecast shipped: 1-day, 7-day, 30-day separate models

Voidly's forecast is no longer single-horizon. We trained three separate XGBoost + isotonic models (1d, 7d, 30d), and all three cleared their promote thresholds (LOCO AUC 0.91 / 0.88 / 0.84). Each horizon has its own conformal interval and its own top-5 SHAP features. The drivers differ by horizon: 1d is operational telemetry, 7d is political tension (GDELT), 30d is repeat-risk plus seasonality. The SoTA literature (TFT; Sun et al., spatio-temporal conformal) says multi-horizon beats single-horizon. We confirmed and shipped.

#methodology #ml #forecast #multi-horizon #shap #conformal

Every other censorship-prediction system we know of forecasts a single horizon. Per Agent 2 of our research stack, the SoTA literature (Temporal Fusion Transformer, International Journal of Forecasting, 2021; Sun et al., spatio-temporal conformal, 2024) says that modeling multiple horizons jointly gives better calibration and per-horizon interpretability.

We shipped it today.

LOCO metrics — all three horizons honest

Horizon | Positive rate | LOCO AUC | LOCO F1 | LOCO Brier
1d      | 3.56%         | 0.9074   | 0.448   | 0.028
7d      | 10.35%        | 0.8828   | 0.660   | 0.055
30d     | 22.40%        | 0.8449   | 0.701   | 0.108

All three cleared their promote thresholds (1d ≥ 0.85, 7d baseline, 30d ≥ 0.75). The cross-horizon consistency check also passes: LOCO AUCs follow the expected ordering 1d ≥ 7d ≥ 30d, because label noise grows with the length of the horizon.
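A minimal sketch of the two gates described above, using the LOCO AUCs from the table. The function names are hypothetical, and the 7d entry is listed only as "baseline" with no number, so it is left out of the floor check rather than guessed.

```python
# LOCO AUCs copied from the metrics table.
loco_auc = {"1d": 0.9074, "7d": 0.8828, "30d": 0.8449}

# Absolute floors stated in the post. No numeric 7d floor is given,
# so 7d is deliberately absent here.
promote_floor = {"1d": 0.85, "30d": 0.75}


def clears_floor(auc, floors):
    """Return {horizon: bool} for every horizon that has an absolute floor."""
    return {h: auc[h] >= floor for h, floor in floors.items()}


def ordering_is_consistent(auc, horizons=("1d", "7d", "30d")):
    """True when AUC is non-increasing as the horizon gets longer."""
    values = [auc[h] for h in horizons]
    return all(a >= b for a, b in zip(values, values[1:]))


print(clears_floor(loco_auc, promote_floor))
print(ordering_is_consistent(loco_auc))
```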

Drivers DIFFER per horizon

The most interesting finding: the model learns different top features for different horizons, matching the SoTA literature.

  • 1d: block_rate_roll30_mean (0.49 importance) dominates. At 1-day lookahead, the recent block-rate trend carries most of the signal. Operational telemetry.
  • 7d: gdelt_unrest_30d (0.22) leads, followed by blocked_count_roll14_mean (0.12). Protest article volume becomes predictive. Political tension.
  • 30d: gdelt_unrest_30d (0.31), recent_shutdown (0.10), and seasonal features (week_of_year, month). Repeat-risk and cyclical patterns.
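One common way to get a per-horizon top-5 list is to rank features by mean absolute SHAP value and normalize the result to sum to 1. The sketch below assumes that definition, which may differ from how the importances above were computed. The SHAP rows are made-up toy numbers, and top_k_features is a hypothetical helper.

```python
def top_k_features(shap_rows, feature_names, k=5):
    """Rank features by mean |SHAP|, normalised so all features sum to 1."""
    n_rows = len(shap_rows)
    mean_abs = [
        sum(abs(row[j]) for row in shap_rows) / n_rows
        for j in range(len(feature_names))
    ]
    total = sum(mean_abs) or 1.0  # guard against an all-zero matrix
    ranked = sorted(
        zip(feature_names, mean_abs), key=lambda pair: pair[1], reverse=True
    )
    return [(name, value / total) for name, value in ranked[:k]]


# Toy SHAP matrix: one row per prediction, one column per feature.
names = ["block_rate_roll30_mean", "gdelt_unrest_30d", "recent_shutdown"]
toy_shap = [
    [0.4, -0.1, 0.0],
    [-0.6, 0.3, 0.1],
]

for name, share in top_k_features(toy_shap, names):
    print(f"{name}: {share:.2f}")
```

Running the same helper once per horizon, on that horizon's own SHAP matrix, gives the three separate rankings.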

Per-country LOCO breakdown

Country        | 1d AUC | 7d AUC  | 30d AUC
Iran (IR)      | 0.922  | 0.650 ⚠ | 0.796
Venezuela (VE) | 0.977  | 0.975   | 0.924
Russia (RU)    | 0.813  | 0.833   | 0.807
China (CN)     | 0.879  | 0.947   | 0.814
Egypt (EG)     | 0.970  | 0.958   | 0.845

Venezuela and Egypt are exceptionally clean across all horizons. Iran struggles at 7d (AUC 0.65): the 7-day window is the worst fit for IR's discrete-event shutdown pattern (elections, protests), and the 1d and 30d horizons pin down IR's signal better. Russia shows the opposite shape: best at 7d (0.833) and weaker at both 1d (0.813, false-positive prone) and 30d (0.807).
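For readers who want to reproduce a per-country breakdown like this, here is a minimal sketch. It assumes LOCO means leave-one-country-out, as the per-country table suggests. The fit and predict callables are placeholders for the real model, and the rows are toy data.

```python
def roc_auc(labels, scores):
    """AUC as the chance a positive outscores a negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None  # undefined when the held-out country has one class only
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def loco_folds(rows):
    """Yield (held_out_country, train_rows, test_rows), one fold per country."""
    for cc in sorted({r["cc"] for r in rows}):
        train = [r for r in rows if r["cc"] != cc]
        test = [r for r in rows if r["cc"] == cc]
        yield cc, train, test


def per_country_auc(rows, fit, predict):
    """Train without each country in turn, then score only that country."""
    result = {}
    for cc, train, test in loco_folds(rows):
        model = fit(train)
        scores = [predict(model, r) for r in test]
        result[cc] = roc_auc([r["y"] for r in test], scores)
    return result


# Toy rows; the "model" just returns feature x as its score.
toy_rows = [
    {"cc": "IR", "x": 0.9, "y": 1},
    {"cc": "IR", "x": 0.2, "y": 0},
    {"cc": "IR", "x": 0.4, "y": 0},
    {"cc": "VE", "x": 0.3, "y": 1},
    {"cc": "VE", "x": 0.6, "y": 0},
]
print(per_country_auc(toy_rows, fit=lambda train: None, predict=lambda m, r: r["x"]))
```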

API

GET /v1/forecast/{cc}/multi-horizon — returns 1d/7d/30d probabilities + 90% conformal intervals + per-horizon top-5 SHAP + monotonicity consistency check.

GET /v1/forecast/multi-horizon/info — model metadata and LOCO metrics per horizon.

The legacy /v1/forecast/{cc}/7day endpoint is UNCHANGED. Backwards-compatible deployment.
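A hedged client sketch for the multi-horizon endpoint, using only the Python standard library. The base URL is a placeholder, not the real host. The response schema is not documented in this post, so the sketch returns the parsed JSON as-is. The monotonicity helper assumes the check means P(1d) ≤ P(7d) ≤ P(30d), which follows if the three label windows are nested.

```python
import json
from urllib.request import urlopen

BASE_URL = "https://api.example.invalid"  # placeholder host, replace it


def multi_horizon_url(base_url, cc):
    """Build the multi-horizon forecast URL for a country code."""
    return f"{base_url.rstrip('/')}/v1/forecast/{cc}/multi-horizon"


def fetch_multi_horizon(base_url, cc, timeout=10):
    """GET the forecast and return the decoded JSON body unchanged."""
    with urlopen(multi_horizon_url(base_url, cc), timeout=timeout) as resp:
        return json.load(resp)


def probabilities_are_monotone(p_1d, p_7d, p_30d):
    """An event within 1 day is also within 7 and 30 days, so risk cannot drop."""
    return p_1d <= p_7d <= p_30d


print(multi_horizon_url(BASE_URL, "IR"))
```

Call fetch_multi_horizon(BASE_URL, "IR") once BASE_URL points at a real deployment. Existing callers of /v1/forecast/{cc}/7day need no change.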

Honest caveats

  • The live runtime build_features in forecast_api.py uses crude proxies (no rolling stats, lag1 ≈ incident_count/7). Per-country variation at request-time is dominated by risk_tier rather than the rolling block_rate signal that drives LOCO AUC. This was true for the legacy 7d model too — out of scope for multi-horizon ship, worth a follow-up.
  • 30d conformal halfwidth clamps at 0.5 because LOCO residual q90 hits 0.5+ for several countries; widening further made the upper bound meaningless. Worth refitting with a quantile regressor if calibration becomes a metric.
  • IR's 7d AUC of 0.65 is the weakest single number across all our 3 horizons × 5 spotlight countries. Honest disclosure: don't cite Iran's 7-day forecast without also citing the 1d/30d.
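To make the clamping caveat concrete, here is a minimal split-conformal sketch: take the finite-sample 90% quantile of the absolute LOCO residuals as the halfwidth, cap it at 0.5, and clip the interval to [0, 1]. This is the textbook recipe and not necessarily the shipped code. The function names and toy residuals are hypothetical.

```python
import math


def conformal_halfwidth(abs_residuals, coverage=0.90, cap=0.5):
    """Finite-sample conformal quantile of |y - p|, clamped at `cap`."""
    n = len(abs_residuals)
    rank = math.ceil((n + 1) * coverage)  # 1-indexed order statistic
    if rank > n:
        return cap  # too few residuals to certify this coverage level
    return min(sorted(abs_residuals)[rank - 1], cap)


def interval(p, halfwidth):
    """Symmetric interval around probability p, clipped to [0, 1]."""
    return max(0.0, p - halfwidth), min(1.0, p + halfwidth)


# Toy residuals: small errors give a tight band, large errors hit the cap.
tight = [0.01 * i for i in range(1, 20)]
noisy = [0.05, 0.1, 0.2, 0.3, 0.6, 0.7, 0.8, 0.9, 0.95]

print(conformal_halfwidth(tight))
print(conformal_halfwidth(noisy))
print(interval(0.22, conformal_halfwidth(noisy)))
```

Once the halfwidth hits the 0.5 cap, the nominal 90% coverage is no longer guaranteed. That is the tradeoff the 30d caveat describes.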

Reproducibility

scripts/build-forecast-multi-horizon-labels.py derives target_1day / target_7day / target_30day from the incidents table with monotonicity verification.
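A sketch of what the label derivation could look like. It assumes each target is 1 when at least one incident starts strictly after the as-of date and within the horizon. The actual window definition in the script is not shown in this post.

```python
from datetime import date, timedelta

HORIZONS = {"target_1day": 1, "target_7day": 7, "target_30day": 30}


def build_labels(as_of, incident_dates):
    """1 if any incident falls in (as_of, as_of + horizon], else 0."""
    labels = {}
    for name, days in HORIZONS.items():
        end = as_of + timedelta(days=days)
        labels[name] = int(any(as_of < d <= end for d in incident_dates))
    return labels


def verify_monotone(labels):
    """Nested windows imply target_1day <= target_7day <= target_30day."""
    return labels["target_1day"] <= labels["target_7day"] <= labels["target_30day"]


# Toy example: one incident four days after the as-of date.
row = build_labels(date(2025, 1, 1), [date(2025, 1, 5)])
print(row, verify_monotone(row))
```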

scripts/train-forecast-multi-horizon.py trains three independent XGBoost + isotonic models. LOCO predictions are persisted for conformal interval estimation.
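The isotonic step maps raw model scores to calibrated probabilities with a non-decreasing function. The pure-Python sketch below uses pool-adjacent-violators to show the idea. It is not the training script: a real pipeline would more likely use a library implementation, and this version returns a step function rather than interpolating between points.

```python
def fit_isotonic(scores, labels):
    """Pool-adjacent-violators: non-decreasing step map from score to rate."""
    # Aggregate tied scores first so each score maps to a single value.
    grouped = {}
    for s, y in zip(scores, labels):
        total, count = grouped.get(s, (0.0, 0))
        grouped[s] = (total + y, count + 1)

    blocks = []  # each block: [sum_of_labels, count, highest_score_in_block]
    for s in sorted(grouped):
        total, count = grouped[s]
        blocks.append([total, count, s])
        # Merge backwards while the previous block's mean exceeds this one.
        while (
            len(blocks) > 1
            and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            t, c, top = blocks.pop()
            blocks[-1][0] += t
            blocks[-1][1] += c
            blocks[-1][2] = top
    return [(top, total / count) for total, count, top in blocks]


def calibrate(steps, score):
    """Return the value of the first block whose upper edge covers the score."""
    for top, value in steps:
        if score <= top:
            return value
    return steps[-1][1]  # scores above every block take the last value


steps = fit_isotonic([0.1, 0.2, 0.3, 0.4], [0, 1, 0, 1])
print(steps)
print(calibrate(steps, 0.25))
```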

scripts/patch-multi-horizon-endpoint.py is an idempotent Flask route patcher with ast-validate-and-rollback safety.
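The pattern of an idempotent patcher with AST validation can be sketched as follows. This is an illustration of the idea and not the script itself. The marker comment, the backup suffix, and the return strings are hypothetical. It also validates before writing, so a bad patch is rejected up front instead of being written and then rolled back.

```python
import ast
from pathlib import Path

MARKER = "# multi-horizon-route"  # hypothetical sentinel comment


def patch_file(path, snippet):
    """Append `snippet` once; leave the file alone if the result won't parse."""
    target = Path(path)
    original = target.read_text()
    if MARKER in original:
        return "already-patched"  # idempotent: a second run is a no-op

    patched = original.rstrip("\n") + "\n\n" + MARKER + "\n" + snippet + "\n"
    try:
        ast.parse(patched)  # syntax check only, nothing is executed
    except SyntaxError:
        return "rejected"  # nothing was written, original is untouched

    # Keep a copy of the original next to the file before overwriting it.
    target.with_name(target.name + ".bak").write_text(original)
    target.write_text(patched)
    return "patched"
```

Note that ast.parse only catches syntax errors. A route that parses but fails at import time would still need a separate smoke test.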

Raw data