2026-05-21

Classifier v3.3 adversarial robustness: 88-93% under realistic evasion, weakest to noise dilution

Even honest detection numbers only measure performance on data that wasn't trying to evade us. We ran 200 known-positive incidents through 6 perturbation strategies a censorship regime could plausibly use (halved anomaly rate, doubled noise, halved probe rate, smoothed spike, isolated regime, and all five combined). Baseline detection is 93%. The most effective single attack is noise_x2, at 88% detection / 90.3% retention. The combined attack lands at 95%, above baseline: the classifier's decision-tree structure means stacking tactics doesn't monotonically help the attacker. We're publishing this as a trust-building exercise; these are the numbers a regime would have to design against.
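A minimal sketch of how per-strategy detection and retention could be scored from per-incident flags. It assumes "detection" is the share of all known-positive incidents flagged after perturbation, and "retention" is the share of baseline-detected incidents that are still flagged. That retention definition is an assumption, not taken from the evaluation script. `summarize`, `StrategyResult` and the `"combined"` key are hypothetical names, and the flags below are toy values, not the published data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StrategyResult:
    name: str
    detection_rate: float  # share of ALL known-positive incidents flagged after perturbation
    retention: float       # ASSUMED definition: share of baseline-detected incidents still flagged


def summarize(baseline: list[bool], perturbed: dict[str, list[bool]]) -> list[StrategyResult]:
    """Score each perturbation strategy against the unperturbed baseline.

    baseline[i] is True if incident i was detected with no perturbation;
    perturbed[name][i] is the same flag after applying strategy `name`.
    """
    n = len(baseline)
    caught = [i for i, hit in enumerate(baseline) if hit]
    results = []
    for name, hits in perturbed.items():
        if len(hits) != n:
            raise ValueError(f"{name}: expected {n} flags, got {len(hits)}")
        detection_rate = sum(hits) / n
        # Count only incidents the unperturbed classifier already caught.
        still_caught = sum(1 for i in caught if hits[i])
        retention = still_caught / len(caught) if caught else float("nan")
        results.append(StrategyResult(name, detection_rate, retention))
    # Most damaging strategy (lowest detection rate) first.
    return sorted(results, key=lambda r: r.detection_rate)


# Toy flags for 10 hypothetical incidents; NOT the published data.
baseline = [True] * 9 + [False]
perturbed = {
    "noise_x2": [True] * 7 + [False, False, True],
    "combined": [True] * 10,
}

for r in summarize(baseline, perturbed):
    print(f"{r.name}: detection={r.detection_rate:.0%} retention={r.retention:.1%}")
```

Under this assumed definition, detection and retention can diverge: in the toy noise_x2 case, two previously caught incidents are lost while one previously missed incident becomes detectable, so retention falls further than detection does.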

#methodology #ml #classifier #adversarial #robustness #evasion #transparency

Raw data

Live: robustness sidecar
Live: v3.3 metadata
v3.3 production finding (predecessor)
Perturbation script
Evaluation script