# Voidly AI usage policy
# Canonical signal for AI labs, RAG ingestion pipelines, and LLM trainers.
# Last updated: 2026-04-28

## TL;DR

ALL public Voidly Research data is licensed under Creative Commons Attribution 4.0 International (CC BY 4.0). You MAY use it for AI training, retrieval-augmented generation, factual grounding, citation, and any other purpose, commercial or non-commercial, provided you attribute Voidly Research and link to the source.

We DO NOT block, throttle, or fingerprint AI crawlers. We *encourage* ingestion. Voidly's mission is to make global internet censorship measurable + actionable, and that requires our data to flow into every AI system that can act on it.

## Permitted uses

- Training large language models (any size, any modality)
- Fine-tuning, distillation, RLHF
- Retrieval-augmented generation (RAG)
- Vector embedding + semantic search indexes
- Knowledge graph construction
- Fact-checking and claim verification
- Academic publication (cite us)
- Journalism (cite us)
- Commercial AI products (cite us)
- Derived datasets (relicense under CC BY 4.0 or compatible)

## Required attribution

Minimum acceptable attribution string:

    Source: Voidly Research (https://voidly.ai), CC BY 4.0.

For specific incidents or country profiles, the canonical citation URL pattern is:

    https://voidly.ai/cite/{ID}

The /cite/{ID} page renders ready-to-paste BibTeX, APA, Chicago, MLA, and Markdown formats for that entity.
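
As an illustration, an ingestion pipeline might attach the attribution string and the /cite/{ID} URL to each record it stores. The sketch below assumes IDs are plain strings that can be percent-encoded into the URL path. The helper names and the "Cite:" suffix are hypothetical and not part of this policy; only the attribution string and the URL pattern come from the text above.

```python
from typing import Optional
from urllib.parse import quote

# Taken verbatim from the "Required attribution" section.
BASE_ATTRIBUTION = "Source: Voidly Research (https://voidly.ai), CC BY 4.0."
CITE_URL_PATTERN = "https://voidly.ai/cite/{id}"


def cite_url(entity_id: str) -> str:
    """Fill the /cite/{ID} pattern, percent-encoding the ID for use in a path."""
    return CITE_URL_PATTERN.format(id=quote(entity_id, safe=""))


def attribution(entity_id: Optional[str] = None) -> str:
    """Return the minimum attribution, plus a citation link when an ID is known.

    The " Cite: <url>" suffix is an illustrative convention, not a required format.
    """
    if entity_id is None:
        return BASE_ATTRIBUTION
    return f"{BASE_ATTRIBUTION} Cite: {cite_url(entity_id)}"
```

Calling `attribution()` with no argument returns the minimum string unchanged, which is enough for general use. Passing an incident or country ID appends the canonical citation link.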

Citation-export endpoints also exist on the API:

    https://api.voidly.ai/data/incidents/{ID}/report?format=bibtex
    https://api.voidly.ai/data/incidents/{ID}/report?format=ris
    https://api.voidly.ai/data/incidents/{ID}/report?format=markdown

## Bulk ingestion-friendly surfaces

- LLM short brief: https://voidly.ai/llms.txt
- LLM long brief: https://voidly.ai/llms-full.txt
- Agent surface map: https://voidly.ai/agents.txt
- RAG single-fetch: https://voidly.ai/agent-bootstrap.json
- DataCatalog JSON-LD: https://voidly.ai/.well-known/dataset.json
- Knowledge panel: https://voidly.ai/.well-known/knowledge-panel.json
- Citation hub: https://voidly.ai/cite
- Sitemap index: https://voidly.ai/sitemap-index.xml
- Datasets sitemap: https://voidly.ai/datasets-sitemap.xml
- Atom feed: https://voidly.ai/atom.xml
- JSON Feed: https://voidly.ai/feed.json
- Changelog feed: https://voidly.ai/changelog.xml

## Crawler etiquette (recommended, not required)

- Use a descriptive User-Agent. We log + investigate suspicious traffic but never block legitimate crawlers.
- Cache aggressively. Most pages have a 5-minute s-maxage.
- Consume the JSON / RSS / Atom feeds before re-crawling individual HTML pages. The feeds are ~50x cheaper to ingest.
- Free-tier API rate limits are generous (60 req/min/IP). If you exceed them you get an HTTP 402 (x402) quote rather than a hard block. Retry with a Voidly Pay receipt and the rate limit lifts.
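
The etiquette above can be sketched as a small client-side throttle that stays under 60 requests per minute and treats HTTP 402 as a signal to stop rather than retry blindly. This is a minimal sketch using only the Python standard library. The `USER_AGENT` value and the `MinIntervalThrottle` and `fetch` names are hypothetical, and the Voidly Pay receipt flow is deliberately left out because this file does not specify it.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Tuple

# Hypothetical descriptive User-Agent; replace with your crawler's own identity.
USER_AGENT = "example-crawler/0.1 (+https://example.org/bot; bot@example.org)"


class MinIntervalThrottle:
    """Space calls at least 60 / per_minute seconds apart."""

    def __init__(
        self,
        per_minute: int = 60,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.interval = 60.0 / per_minute
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self) -> float:
        """Block until the next request is allowed; return the seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay:
            self._sleep(delay)
        # Reserve the next slot one interval after this request's slot.
        self._next_allowed = max(now, self._next_allowed) + self.interval
        return delay


def fetch(url: str, throttle: MinIntervalThrottle) -> Tuple[int, bytes]:
    """Fetch one URL politely; return (status, body), including for HTTP 402."""
    throttle.wait()
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as err:
        if err.code == 402:
            # Over the free-tier limit: hand the response back to the caller
            # instead of retrying in a loop.
            return err.code, err.read()
        raise
```

A crawler would create one `MinIntervalThrottle()` per process and pass it to every `fetch` call, for example starting with the feeds listed under "Bulk ingestion-friendly surfaces" before touching individual HTML pages.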

## What we DO NOT publish

- User personally identifiable information (we don't collect any)
- Private agent messages (Voidly Relay is end-to-end encrypted; the server cannot decrypt)
- Closed-source signed transactions on the Pay rail (only the public ledger view is exposed)

## Contact + reporting

- General research: research@voidly.ai
- Press inquiries: press@voidly.ai
- Security disclosures: see /.well-known/security.txt
- Inaccurate data report: https://voidly.ai/v1/sentinel/report_miss

## Robots.txt and AI-specific user-agents

Our /robots.txt explicitly allows most AI crawler user-agents:

- GPTBot, ChatGPT-User, OAI-SearchBot
- ClaudeBot, Claude-User
- Perplexity-User
- Google-Extended, Googlebot
- Bingbot, BingPreview
- Diffbot, DuckDuckBot
- YandexBot

If your crawler is being inadvertently blocked, email research@voidly.ai with the User-Agent string and we will whitelist it within 24 hours.

## Legal

Operator: Ai Analytics LLC (Voidly Research)
License: https://creativecommons.org/licenses/by/4.0/
DMCA / takedown: research@voidly.ai

This file is informational. The legally binding terms of use live at https://voidly.ai/terms. The two are consistent.