# Open crawl policy: allow everything by default so search engines and AI # crawlers can discover the per-event /share/{id}/{slug} corpus. Specific # Disallow lines below mirror the scanner-probe regex in nginx.conf so honest # crawlers don't waste crawl budget on paths that always 301 or get banned. User-agent: * Allow: / Allow: /api/v1/ Disallow: /env Disallow: /wordpress Disallow: /wp- Disallow: /wp-login Disallow: /wp-admin Disallow: /xmlrpc.php Disallow: /.git Disallow: /.env Disallow: /phpmyadmin Disallow: /administrator Disallow: /config.php Disallow: /wp-config Disallow: /readme.html Disallow: /license.txt # Staff-only / debug surfaces — don't waste crawl budget here. Disallow: /admin/ Disallow: /api/ # Content-Signal (https://contentsignals.org/, draft-romm-aipref-contentsignals) # declares AI usage preferences. Our policy: # * ai-train=no → unauthorised AI model training is NOT licensed. Training # crawlers that respect this signal will skip; the x402 # gate enforces the same policy on the wire by returning # HTTP 402 PAYMENT REQUIRED. Pay via /x402.json to license. # * search=yes → indexing for classic search engines (Google, Bing) is # explicitly welcomed; canonical URLs in /sitemap*.xml. # * ai-input=yes → use as input to AI-generated answers (RAG, user-clicks # "browse", AI search results) is welcomed; user-triggered # fetchers (ChatGPT-User, Claude-User, Perplexity-User, # OAI-SearchBot, PerplexityBot, Claude-SearchBot) are in # the x402_gate.py FREE_USER_AGENTS list and never billed. Content-Signal: ai-train=no, search=yes, ai-input=yes # --- Explicit per-UA stanzas for AI training + runtime fetchers --------------- # We allow all named AI crawlers (training and runtime) to index canonical # content. /admin/ and most /api/ stays disallowed; /api/v1/ is explicitly # allowed because it is the public, rate-limited read API documented in # /llms.txt and /llms-full.txt. # # x402 payment policy (https://war-tracker.com/x402): AI *training* crawlers # (the ten stanzas tagged "x402: paid" below) are required to pay a USDC # micropayment via the x402 protocol (https://x402.org) to access paid surfaces: # /share/{id}/{slug}, /api/v1/events, /api/v1/events/{id}, /media/{id}, # /llms-full.txt, /region/*, /country/*, /event-type/* # /share/{id}/{slug} is content-negotiable: HTML by default, JSON when the # request carries Accept: application/json (same payment covers either). # Discovery surfaces (/, /about, /methodology, /pricing, /robots.txt, # /sitemap*.xml, /llms.txt, /x402, /x402.json, /api/v1/openapi.json, # /api/v1/regions, /api/v1/countries, /api/v1/event-types, /video/thumb/*) # remain free for all bots. # The robots Allow lines stay open so honest crawlers reach the paid endpoint, # receive a 402 with PAYMENT-REQUIRED, and can choose to pay. Runtime fetchers # (ChatGPT-User, Claude-User, Perplexity-User), AI-search index crawlers # (OAI-SearchBot, Claude-SearchBot, PerplexityBot), and classic search # (Googlebot, bingbot) are NEVER charged. # OpenAI - training crawler (x402: paid — pay via https://war-tracker.com/x402) User-agent: GPTBot Allow: / Allow: /api/v1/ Disallow: /admin/ Disallow: /api/ Content-Signal: ai-train=no, search=yes, ai-input=yes # OpenAI - SearchGPT crawler User-agent: OAI-SearchBot Allow: / Allow: /api/v1/ Disallow: /admin/ Disallow: /api/ Content-Signal: ai-train=no, search=yes, ai-input=yes # OpenAI - ChatGPT runtime fetch (user clicks "browse") User-agent: ChatGPT-User Allow: / Allow: /api/v1/ Disallow: /admin/ Disallow: /api/ Content-Signal: ai-train=no, search=yes, ai-input=yes # Anthropic - Claude training crawler (x402: paid — pay via https://war-tracker.com/x402) User-agent: ClaudeBot Allow: / Allow: /api/v1/ Disallow: /admin/ Disallow: /api/ Content-Signal: ai-train=no, search=yes, ai-input=yes # Anthropic - Claude runtime fetch User-agent: Claude-User Allow: / Allow: /api/v1/ Disallow: /admin/ Disallow: /api/ Content-Signal: ai-train=no, search=yes, ai-input=yes # Anthropic - Claude search backend User-agent: Claude-SearchBot Allow: / Allow: /api/v1/ Disallow: /admin/ Disallow: /api/ Content-Signal: ai-train=no, search=yes, ai-input=yes # Anthropic - legacy (x402: paid — pay via https://war-tracker.com/x402) User-agent: anthropic-ai Allow: / Allow: /api/v1/ Disallow: /admin/ Disallow: /api/ Content-Signal: ai-train=no, search=yes, ai-input=yes # Perplexity - training crawler User-agent: PerplexityBot Allow: / Allow: /api/v1/ Disallow: /admin/ Disallow: /api/ Content-Signal: ai-train=no, search=yes, ai-input=yes # Perplexity - runtime fetch User-agent: Perplexity-User Allow: / Allow: /api/v1/ Disallow: /admin/ Disallow: /api/ Content-Signal: ai-train=no, search=yes, ai-input=yes # Google - AI training opt-in (x402: paid — pay via https://war-tracker.com/x402) User-agent: Google-Extended Allow: / Allow: /api/v1/ Disallow: /admin/ Disallow: /api/ Content-Signal: ai-train=no, search=yes, ai-input=yes # Apple - AI training opt-in (x402: paid — pay via https://war-tracker.com/x402) User-agent: Applebot-Extended Allow: / Allow: /api/v1/ Disallow: /admin/ Disallow: /api/ Content-Signal: ai-train=no, search=yes, ai-input=yes # Common Crawl - powers many open-source LLMs (x402: paid — pay via https://war-tracker.com/x402) User-agent: CCBot Allow: / Allow: /api/v1/ Disallow: /admin/ Disallow: /api/ Content-Signal: ai-train=no, search=yes, ai-input=yes # ByteDance / Doubao (x402: paid — pay via https://war-tracker.com/x402) User-agent: Bytespider Allow: / Allow: /api/v1/ Disallow: /admin/ Disallow: /api/ Content-Signal: ai-train=no, search=yes, ai-input=yes # Cohere (x402: paid — pay via https://war-tracker.com/x402) User-agent: cohere-ai Allow: / Allow: /api/v1/ Disallow: /admin/ Disallow: /api/ Content-Signal: ai-train=no, search=yes, ai-input=yes # Meta AI training (x402: paid — pay via https://war-tracker.com/x402) User-agent: Meta-ExternalAgent Allow: / Allow: /api/v1/ Disallow: /admin/ Disallow: /api/ Content-Signal: ai-train=no, search=yes, ai-input=yes # Diffbot - knowledge graph (x402: paid — pay via https://war-tracker.com/x402) User-agent: Diffbot Allow: / Allow: /api/v1/ Disallow: /admin/ Disallow: /api/ Content-Signal: ai-train=no, search=yes, ai-input=yes Sitemap: https://war-tracker.com/sitemap.xml Sitemap: https://war-tracker.com/sitemaps/hubs.xml