Every other censorship-prediction system we know of forecasts a single horizon. According to Agent 2 of our research stack, the SoTA literature (Temporal Fusion Transformer, International Journal of Forecasting, 2021; Sun et al., spatio-temporal conformal, 2024) reports that modeling multiple horizons jointly gives better calibration and per-horizon interpretability.
We shipped it today.
LOCO metrics — all three horizons honest
| Horizon | Positive rate | LOCO AUC | LOCO F1 | LOCO Brier |
|---|---|---|---|---|
| 1d | 3.56% | 0.9074 | 0.448 | 0.028 |
| 7d | 10.35% | 0.8828 | 0.660 | 0.055 |
| 30d | 22.40% | 0.8449 | 0.701 | 0.108 |
All three horizons cleared their promote thresholds (1d ≥ 0.85, 7d baseline, 30d ≥ 0.75). The cross-horizon consistency check also passes: LOCO AUCs follow the expected ordering 1d ≥ 7d ≥ 30d, since label noise grows with the horizon.
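The promote gate and the ordering check can be sketched in a few lines of Python. This is a minimal illustration built only from the numbers reported above; the function names are hypothetical, and because the 7d gate is described only as "baseline", no numeric threshold is assumed for it.

```python
# LOCO AUCs as reported in the metrics table.
LOCO_AUC = {"1d": 0.9074, "7d": 0.8828, "30d": 0.8449}

# Absolute promote thresholds stated in the text. The 7d gate is only
# described as "baseline", so it is deliberately left out here.
PROMOTE_MIN_AUC = {"1d": 0.85, "30d": 0.75}

HORIZON_ORDER = ("1d", "7d", "30d")


def clears_thresholds(auc, minimums):
    """Return a pass/fail flag for each horizon that has an absolute floor."""
    return {h: auc[h] >= floor for h, floor in minimums.items()}


def ordering_is_consistent(auc, order=HORIZON_ORDER):
    """True when AUC never increases as the horizon gets longer."""
    values = [auc[h] for h in order]
    return all(a >= b for a, b in zip(values, values[1:]))


print(clears_thresholds(LOCO_AUC, PROMOTE_MIN_AUC))
print(ordering_is_consistent(LOCO_AUC))
```

A horizon that violated the ordering (for example a 7d AUC above the 1d AUC) would make the second check return False, which is a cheap signal that something upstream needs a second look.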
Drivers DIFFER per horizon
The most interesting finding: the model learns different top features for different horizons, matching the SoTA literature.
- 1d: block_rate_roll30_mean (0.49 importance) dominates. Very recent block-rate trend is the only signal that matters at 1-day lookahead. Operational telemetry.
- 7d: gdelt_unrest_30d (0.22) leads. Protest article volume becomes predictive. Then blocked_count_roll14_mean (0.12). Political tension.
- 30d: gdelt_unrest_30d (0.31) + recent_shutdown (0.10) + seasonal (week_of_year, month). Repeat-risk + cyclical patterns.
Per-country LOCO breakdown
| Country | 1d AUC | 7d AUC | 30d AUC |
|---|---|---|---|
| Iran (IR) | 0.922 | 0.650 ⚠ | 0.796 |
| Venezuela (VE) | 0.977 | 0.975 | 0.924 |
| Russia (RU) | 0.813 | 0.833 | 0.807 |
| China (CN) | 0.879 | 0.947 | 0.814 |
| Egypt (EG) | 0.970 | 0.958 | 0.845 |
Venezuela and Egypt are exceptionally clean across all horizons. Iran struggles at 7d (AUC 0.650): the 7-day window is the worst zone for IR's discrete-event shutdown pattern (elections, protests), and the 1d and 30d horizons pin down IR's signal better. Russia has the opposite shape: best at 7d (0.833), with 1d (0.813, false-positive prone) and 30d (0.807) both slightly weaker.
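A per-country breakdown like the one above can be computed from held-out predictions with a rank-based AUC. The sketch below is illustrative only: it assumes LOCO means leave-one-country-out (so every score for a country comes from a model that never saw that country), the helper names are hypothetical, and the rows are toy values, not our data.

```python
from collections import defaultdict


def roc_auc(labels, scores):
    """Rank-based AUC (Mann-Whitney U); ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None  # AUC is undefined unless both classes are present
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def per_country_auc(rows):
    """rows: iterable of (country_code, label, held_out_score) tuples."""
    grouped = defaultdict(lambda: ([], []))
    for cc, label, score in rows:
        grouped[cc][0].append(label)
        grouped[cc][1].append(score)
    return {cc: roc_auc(ys, ss) for cc, (ys, ss) in grouped.items()}


# Toy rows for illustration only.
toy_rows = [
    ("IR", 1, 0.90), ("IR", 0, 0.20), ("IR", 0, 0.95), ("IR", 1, 0.60),
    ("VE", 1, 0.80), ("VE", 0, 0.10),
]
print(per_country_auc(toy_rows))
```

The pairwise comparison is quadratic in the number of rows per country, which is fine for a spot check but would be replaced by a sort-based implementation on large evaluation sets.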
API
- GET /v1/forecast/{cc}/multi-horizon: returns 1d/7d/30d probabilities + 90% conformal intervals + per-horizon top-5 SHAP + monotonicity consistency check.
- GET /v1/forecast/multi-horizon/info: model metadata and LOCO metrics per horizon.

The legacy /v1/forecast/{cc}/7day endpoint is UNCHANGED. Backwards-compatible deployment.
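A client call to the multi-horizon endpoint might look like the following sketch. Only the path comes from this post: the host is a placeholder, the response schema is not assumed (the JSON body is returned as-is), and the monotonicity helper reflects one plausible reading of the check, namely that a shutdown within 1 day implies one within 7 and 30 days, so the probabilities should not decrease with the horizon.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "https://api.example.org"  # hypothetical host, not from the post


def multi_horizon_url(cc, base_url=BASE_URL):
    """Build the multi-horizon forecast URL for a country code such as 'IR'."""
    return f"{base_url}/v1/forecast/{quote(cc, safe='')}/multi-horizon"


def fetch_multi_horizon(cc, base_url=BASE_URL, timeout=10):
    """GET the forecast and decode the JSON body without assuming its schema."""
    with urlopen(multi_horizon_url(cc, base_url), timeout=timeout) as resp:
        return json.load(resp)


def is_monotone(probs):
    """Assumed check: P(1d) <= P(7d) <= P(30d).

    `probs` is a plain dict keyed by horizon; mapping the real response
    onto this shape is left to the caller.
    """
    return probs["1d"] <= probs["7d"] <= probs["30d"]


print(multi_horizon_url("IR"))
print(is_monotone({"1d": 0.04, "7d": 0.11, "30d": 0.25}))
```

Keeping the URL builder separate from the network call makes the path logic testable without a live server.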
Honest caveats
- The live runtime build_features in forecast_api.py uses crude proxies (no rolling stats, lag1 ≈ incident_count/7). Per-country variation at request-time is dominated by risk_tier rather than the rolling block_rate signal that drives LOCO AUC. This was true for the legacy 7d model too. It is out of scope for the multi-horizon ship but worth a follow-up.
- 30d conformal halfwidth clamps at 0.5 because LOCO residual q90 hits 0.5+ for several countries; widening further made the upper bound meaningless. Worth refitting with a quantile regressor if calibration becomes a metric.
- IR's 7d AUC of 0.65 is the weakest single number across all our 3 horizons × 5 spotlight countries. Honest disclosure: don't cite Iran's 7-day forecast without also citing the 1d/30d.
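The clamped conformal halfwidth from the 30d caveat can be sketched as follows. This is an assumption-laden illustration rather than the shipped code: the post says only that the halfwidth comes from the q90 of LOCO residuals and is clamped at 0.5, so the finite-sample rank rule used here (ceil((n + 1) × coverage)) and the function names are hypothetical, and the residuals are toy values.

```python
import math


def conformal_halfwidth(abs_residuals, coverage=0.90, max_halfwidth=0.5):
    """Quantile of absolute held-out residuals, clamped to max_halfwidth.

    Uses the split-conformal rank ceil((n + 1) * coverage), capped at n.
    """
    n = len(abs_residuals)
    if n == 0:
        raise ValueError("need at least one residual")
    ranked = sorted(abs_residuals)
    k = min(n, math.ceil((n + 1) * coverage))  # 1-based rank
    return min(ranked[k - 1], max_halfwidth)


def interval(p, halfwidth):
    """Symmetric interval around probability p, clipped to [0, 1]."""
    return max(0.0, p - halfwidth), min(1.0, p + halfwidth)


# Toy residuals: the raw quantile is 0.8, so the clamp kicks in at 0.5.
wide = [0.05, 0.1, 0.2, 0.3, 0.4, 0.55, 0.6, 0.6, 0.7, 0.8]
hw = conformal_halfwidth(wide)
print(hw, interval(0.25, hw))
```

With a halfwidth of 0.5 the interval already spans most of the unit range for mid-range probabilities, which is why widening past the clamp would carry no information.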
Reproducibility
- scripts/build-forecast-multi-horizon-labels.py: derives target_1day / target_7day / target_30day from the incidents table with monotonicity verification.
- scripts/train-forecast-multi-horizon.py: trains three independent XGBoost + isotonic models. LOCO predictions persisted for conformal interval estimation.
- scripts/patch-multi-horizon-endpoint.py: idempotent Flask route patcher with ast-validate-and-rollback safety.
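As a rough illustration of the label step, the sketch below derives the three targets for one country and one as-of date, then verifies monotonicity. It is not the script itself: the window definition (an incident strictly after the as-of date and within N days) and the function names are assumptions, and only the target names come from the post.

```python
from datetime import date, timedelta

# Target names from the post; window lengths in days.
HORIZONS = {"target_1day": 1, "target_7day": 7, "target_30day": 30}


def derive_labels(as_of, incident_dates):
    """Label is 1 if any incident falls in (as_of, as_of + N days].

    The window definition is an assumption made for this sketch.
    """
    labels = {}
    for name, days in HORIZONS.items():
        end = as_of + timedelta(days=days)
        labels[name] = int(any(as_of < d <= end for d in incident_dates))
    return labels


def verify_monotone(labels):
    """Nested windows imply target_1day <= target_7day <= target_30day."""
    return labels["target_1day"] <= labels["target_7day"] <= labels["target_30day"]


labels = derive_labels(date(2024, 1, 1), [date(2024, 1, 5)])
print(labels, verify_monotone(labels))
```

Because each window contains the shorter ones, a row that fails the check points to a bug in the windowing or a join problem in the incidents data rather than to a property of the data itself.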