Add firewall blocking detection (fetch with AI bot UAs, flag 403/blocked)

Per r/coolgithubprojects feedback from @scarletdawnredd: https://www.reddit.com/r/coolgithubprojects/comments/1tjjohw/comment/on2kjck/

Real coverage gap: Cloudflare, Vercel, and AWS WAF can block AI bot UAs at the edge, regardless of what robots.txt allows. Today citable fetches robots.txt cleanly and the AI bot allowlist check passes, but in practice AI crawlers may get a 403 from the firewall and never reach the page.

Proposed v0.3.0 check (C-26 Firewall AI Bot Block):
- For each of the 6 AI bots (GPTBot, ClaudeBot, PerplexityBot, OAI-SearchBot, Google-Extended, Applebot-Extended), make a HEAD request to the homepage with that bot's UA
- If the response status is 403 or 401, or the response carries Cloudflare / AWS WAF challenge headers (e.g. cf-mitigated, or server: cloudflare on a challenge response), flag as FAIL
- Severity: P0 (this is an actual technical blocker, unlike C-02)
- Include the blocking response status + headers in evidence_snippet so users can debug
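
A minimal sketch of how the check above could look, in Python with only the standard library. This is an illustration, not citable's actual implementation: the names `classify` and `probe`, the result shape, and the use of the bare bot token as the User-Agent value are all assumptions (real crawlers send longer UA strings). The `x-amzn-waf-action` header is also an assumption about how AWS WAF marks challenge responses and should be verified before relying on it.

```python
import urllib.error
import urllib.request

# The six bots named in this issue. Assumption: the bare token is sent as the
# User-Agent; real crawlers send longer strings. Some tokens (for example
# Google-Extended) are robots.txt control tokens rather than UAs sent on the
# wire, so probing with them is only an approximation.
AI_BOTS = [
    "GPTBot", "ClaudeBot", "PerplexityBot",
    "OAI-SearchBot", "Google-Extended", "Applebot-Extended",
]

BLOCK_STATUSES = {401, 403}
# cf-mitigated comes from this issue; x-amzn-waf-action is an assumption.
CHALLENGE_HEADERS = ("cf-mitigated", "x-amzn-waf-action")
EVIDENCE_HEADERS = ("server",) + CHALLENGE_HEADERS


def classify(status, headers):
    """Return a FAIL/PASS result plus evidence for one response."""
    lowered = {k.lower(): v for k, v in headers.items()}
    challenged = any(h in lowered for h in CHALLENGE_HEADERS)
    blocked = status in BLOCK_STATUSES or challenged
    evidence = {h: lowered[h] for h in EVIDENCE_HEADERS if h in lowered}
    return {
        "result": "FAIL" if blocked else "PASS",
        "evidence_snippet": {"status": status, "headers": evidence},
    }


def probe(url, bot, timeout=10.0):
    """Send a HEAD request as `bot` and classify the response.

    Network-level failures (urllib.error.URLError) are left to the caller.
    """
    req = urllib.request.Request(url, method="HEAD", headers={"User-Agent": bot})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return classify(resp.status, dict(resp.headers.items()))
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the status and headers are still available.
        return classify(err.code, dict(err.headers.items()))


def check_firewall(url):
    """Run the probe once per bot and return results keyed by bot name."""
    return {bot: probe(url, bot) for bot in AI_BOTS}
```

Calling `check_firewall("https://example.com/")` would return one result per bot, each carrying the status and the relevant headers for debugging. Some servers answer HEAD differently from GET, so a GET fallback may be worth considering if HEAD results look suspicious.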

Pair this with the soft-404 detection check (separate issue) since both are about catching responses that LOOK fine but aren't.
