Per r/coolgithubprojects feedback from @scarletdawnredd: https://www.reddit.com/r/coolgithubprojects/comments/1tjjohw/comment/on2kjck/
Real coverage gap: Cloudflare / Vercel / AWS WAF can block AI bot UAs at the edge before robots.txt is ever read. Today citable fetches robots.txt cleanly and the AI bot allowlist check passes, but in practice AI crawlers may still get a 403 from the firewall and never reach the page.
Proposed v0.3.0 check (C-26 Firewall AI Bot Block):
- For each of the 6 AI bots (GPTBot, ClaudeBot, PerplexityBot, OAI-SearchBot, Google-Extended, Applebot-Extended), make a HEAD request to the homepage with that UA
- If the response is 403 or 401, or carries Cloudflare / AWS WAF challenge headers (e.g. cf-mitigated, or server: cloudflare on a challenge response), flag as FAIL
- Severity: P0 (this is an actual technical blocker, unlike C-02)
- Include the blocking response status + headers in evidence_snippet so users can debug
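A minimal stdlib-only Python sketch of the proposed check. `AI_BOT_UAS`, `classify`, `probe` and `ProbeResult` are hypothetical names, not citable's actual API, and the User-Agent strings are simplified placeholders rather than each vendor's exact string. It treats 401/403 or a `cf-mitigated` header as FAIL and puts the status plus a few debug headers into the evidence snippet. One caveat worth checking before shipping: Google-Extended and Applebot-Extended are robots.txt control tokens rather than User-Agents their crawlers actually send, so probing with them only catches firewall rules that match those literal strings.

```python
"""Sketch of a C-26 "Firewall AI Bot Block" probe (all names hypothetical)."""
from __future__ import annotations

import urllib.error
import urllib.request
from collections.abc import Mapping
from dataclasses import dataclass

# Placeholder UAs (assumption): real crawlers send longer, versioned strings.
AI_BOT_UAS = {
    "GPTBot": "Mozilla/5.0 (compatible; GPTBot)",
    "ClaudeBot": "Mozilla/5.0 (compatible; ClaudeBot)",
    "PerplexityBot": "Mozilla/5.0 (compatible; PerplexityBot)",
    "OAI-SearchBot": "Mozilla/5.0 (compatible; OAI-SearchBot)",
    "Google-Extended": "Mozilla/5.0 (compatible; Google-Extended)",
    "Applebot-Extended": "Mozilla/5.0 (compatible; Applebot-Extended)",
}

BLOCK_STATUSES = frozenset({401, 403})
# Headers copied into the evidence snippet when present.
EVIDENCE_HEADERS = ("server", "cf-mitigated", "cf-ray")


@dataclass
class ProbeResult:
    bot: str
    status: int
    verdict: str  # "PASS" or "FAIL"
    evidence_snippet: str


def classify(status: int, headers: Mapping[str, str]) -> tuple[str, str]:
    """Return (verdict, evidence_snippet) for a single probe response."""
    lowered = {k.lower(): v for k, v in headers.items()}
    # "server: cloudflare" alone is not a block signal: Cloudflare-fronted
    # sites send it on every response, so only cf-mitigated counts here.
    blocked = status in BLOCK_STATUSES or "cf-mitigated" in lowered
    shown = ", ".join(f"{h}: {lowered[h]}" for h in EVIDENCE_HEADERS if h in lowered)
    evidence = f"HTTP {status}" + (f"; {shown}" if shown else "")
    return ("FAIL" if blocked else "PASS"), evidence


def probe(url: str, timeout: float = 10.0) -> list[ProbeResult]:
    """HEAD the URL once per AI bot UA and classify each response."""
    results = []
    for bot, ua in AI_BOT_UAS.items():
        req = urllib.request.Request(url, method="HEAD", headers={"User-Agent": ua})
        try:
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                status, headers = resp.status, resp.headers
        except urllib.error.HTTPError as err:
            # 4xx/5xx are raised, but the status and headers are still available.
            status, headers = err.code, err.headers
        verdict, evidence = classify(status, dict(headers.items()))
        results.append(ProbeResult(bot, status, verdict, evidence))
    return results
```

Usage would be something like `for r in probe("https://example.com/"): print(r.bot, r.verdict, r.evidence_snippet)`. Keeping `classify` separate from the network loop means the FAIL rules can be unit-tested against recorded status/header pairs without hitting a live firewall; network errors such as DNS failures (`URLError`) are left to propagate in this sketch.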
Pair this with the soft-404 detection check (separate issue) since both are about catching responses that LOOK fine but aren't.