Skip to content

Add firewall blocking detection (fetch with AI bot UAs, flag 403/blocked) #2

@WorkSmartAI-alt

Description

@WorkSmartAI-alt

Per r/coolgithubprojects feedback from @scarletdawnredd: https://www.reddit.com/r/coolgithubprojects/comments/1tjjohw/comment/on2kjck/

Real coverage gap: Cloudflare / Vercel / AWS WAF can block AI bot UAs at the edge before robots.txt is read. citable today fetches robots.txt cleanly and the AI bot allowlist check passes, but in practice AI crawlers may get a 403 from the firewall and never reach the page.

Proposed v0.3.0 check (C-26 Firewall AI Bot Block):

  • For each of the 6 AI bots (GPTBot, ClaudeBot, PerplexityBot, OAI-SearchBot, Google-Extended, Applebot-Extended), make a HEAD request to the homepage with that UA
  • If response is 403, 401, or has Cloudflare / AWS-WAF challenge headers (cf-mitigated, server: cloudflare with challenge response), flag as FAIL
  • Severity: P0 (this is an actual technical blocker, unlike C-02)
  • Include the blocking response status + headers in evidence_snippet so users can debug

Pair this with the soft-404 detection check (separate issue) since both are about catching responses that LOOK fine but aren't.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions