ai-bot-shield

Block AI crawlers in 5 lines. No Cloudflare required.

License: Apache-2.0 | Status: alpha | 12+ signatures | Node 22 | Python 3.10+ | Go 1.23


"I am spending from 20–100% of my time in any given week mitigating hyper-aggressive LLM crawlers at scale … experiencing dozens of brief outages per week." — Drew DeVault, SourceHut founder

"Literally a DDoS on the entire internet." Log analysis showed ~25% of Diaspora's traffic was OpenAI user-agents, 15% Amazon, 4.3% Anthropic. — Dennis Schubert, Diaspora maintainer

Read the Docs: blocking AI crawlers dropped traffic from 800 GB/day → 200 GB/day (75% reduction), saving ~$1,500/month in bandwidth.

ai-bot-shield is a drop-in middleware library for self-hosted FOSS sites and indie operators. It loads a community-maintained AI bot signature database, optionally verifies RFC 9421 Web Bot Auth signatures for legitimate signed agents (planned for week 2), and decides per request whether to allow, log, or block. Three languages, four framework adapters in v0, one shared registry.

Install

Node.js: npm i @ai-bot-shield/express
Python: pip install ai-bot-shield
Go: go get github.com/mthamil107/ai-bot-shield/go

30-second example (Express)

import express from "express";
import { shield } from "@ai-bot-shield/express";
// Node 22 ESM needs the JSON import attribute; bundlers may not.
import registry from "ai-bot-shield-signatures/registry.json" with { type: "json" };

const app = express();
app.use(shield({ registry }));   // ← that's it.

app.get("/", (req, res) => res.send("Hello world"));
app.listen(3000);

Make a request with User-Agent: GPTBot/1.2 and you get back 403 Forbidden with an X-Bot-Verdict: openai-gptbot header. Make the same request with a normal browser user-agent and the route handler runs as usual.
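
To make that behaviour concrete, here is a minimal sketch of the kind of check such a middleware performs. It is not the library's implementation: `sketchShield`, the entry shape, and the single regex are assumptions made for illustration. Only the 403 status, the `X-Bot-Verdict` header, and the `openai-gptbot` ID come from the description above.

```javascript
// Hypothetical, simplified stand-in for shield(); not the real implementation.
function sketchShield(entries) {
  return function (req, res, next) {
    const ua = req.headers["user-agent"] || "";
    // Find the first entry whose user-agent pattern matches.
    const hit = entries.find((e) => e.pattern.test(ua));
    if (!hit) return next(); // Unknown client: let the route handler run.
    res.setHeader("X-Bot-Verdict", hit.id);
    res.statusCode = 403;
    res.end("Forbidden");
  };
}

// Assumed entry shape: an ID plus one compiled regex.
const entries = [{ id: "openai-gptbot", pattern: /GPTBot\//i }];

// Tiny fake request/response pair so the sketch runs without Express.
function simulate(userAgent) {
  const result = { status: 200, headers: {}, passed: false };
  const req = { headers: { "user-agent": userAgent } };
  const res = {
    set statusCode(code) { result.status = code; },
    setHeader(name, value) { result.headers[name] = value; },
    end() {},
  };
  sketchShield(entries)(req, res, () => { result.passed = true; });
  return result;
}
```

Calling `simulate("GPTBot/1.2")` exercises the blocking branch, while a browser user-agent falls through to `next()`.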

What it does

  • Community signature database — forked-and-enriched from ai.robots.txt. Refreshed weekly. Adds categories, IP ranges, and RFC 9421 key directories the upstream doesn't track.
  • RFC 9421 verification (week 2) — for legitimate signed agents (OpenAI Operator, Google Project Mariner) per the IETF Web Bot Auth draft.
  • Drop-in middleware for Express, FastAPI, Flask, and Go net/http, with Next.js, Fastify, Hono, Django, chi, echo, and gin planned.
  • No Cloudflare required — self-hosted operators on Vercel, Railway, Fly.io, or a plain VPS get the same protection.
  • No telemetry — runs entirely in your process. We never see your traffic.
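
For readers unfamiliar with RFC 9421, the sketch below shows the string a verifier has to reconstruct before checking a signature: the "signature base" from section 2.5 of the RFC. The verifier itself is not in this release, so this is only an illustration. The function name `buildSignatureBase`, the simplified parameter serialization, and all example values are assumptions; the covered components and the `web-bot-auth` tag follow our reading of the Web Bot Auth draft.

```javascript
// Serialize covered components and parameters roughly as RFC 9421
// section 2.5 describes: one `"name": value` line per component, then a
// final "@signature-params" line. Parameter handling is simplified:
// integers are emitted bare, everything else as a quoted string.
function buildSignatureBase(components, params) {
  const names = components.map(([name]) => `"${name}"`).join(" ");
  const paramText = Object.entries(params)
    .map(([key, value]) =>
      Number.isInteger(value) ? `;${key}=${value}` : `;${key}="${value}"`)
    .join("");
  const lines = components.map(([name, value]) => `"${name}": ${value}`);
  lines.push(`"@signature-params": (${names})${paramText}`);
  return lines.join("\n");
}

// Example values are made up for illustration.
const base = buildSignatureBase(
  [
    ["@authority", "example.com"],
    ["signature-agent", '"https://agent.example"'],
  ],
  {
    created: 1735689600,
    expires: 1735693200,
    keyid: "example-key-id",
    tag: "web-bot-auth",
  }
);
```

A verifier would then check the signature from the request over this exact string, using the public key the agent publishes. That step is omitted here.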

What it isn't

Honesty up front:

  • Not a Cloudflare replacement. Cloudflare ships AI Labyrinth, Pay-Per-Crawl, Bot Management, and Web Bot Auth as a coordinated stack. We do one piece of that — server-side detection — at the OSS / self-hosted layer.
  • Not a proof-of-work challenge. That's Anubis's job. Run Anubis upstream of ai-bot-shield if you want a PoW layer.
  • Not a magic bullet. Training crawlers don't actually sign requests today (as of 2026), so the RFC 9421 half of the value-prop applies only to the agentic minority. We're honest about this in docs/rfc9421.md.
  • Not a CAPTCHA. This is server-side decisioning, not a user-facing challenge.

Frameworks supported

Language | Frameworks (v0) | Planned
Node / TypeScript | Express | Next.js, Fastify, Hono
Python | FastAPI / Starlette, Flask | Django, ASGI generic
Go | net/http | chi, echo, gin

Deploy targets

ai-bot-shield runs anywhere your web framework runs:

  • ✅ Vercel (Next.js middleware)
  • ✅ Railway / Fly.io / Render
  • ✅ Self-hosted VPS / bare metal
  • ✅ Docker / Kubernetes
  • ✅ Behind Cloudflare too (defense in depth)

How signatures work

The canonical database is signatures/registry.json — a JSON Schema-validated file with ~12 entries today, growing weekly. Schema reference: docs/SCHEMA.md.

Each entry has a stable ID, an operator, a category (training-crawler / agentic-fetcher / etc.), one or more user-agent regexes, a recommended action, and at least one piece of public evidence.
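
As a rough illustration of that shape, the sketch below defines one hypothetical entry and a small completeness check. The field names (`userAgentPatterns`, `evidence`, and so on) and the `checkEntry` helper are guesses made for this example; docs/SCHEMA.md remains the authoritative reference.

```javascript
// Hypothetical entry; field names are illustrative, not the real schema.
const exampleEntry = {
  id: "openai-gptbot",
  operator: "OpenAI",
  category: "training-crawler",
  userAgentPatterns: ["GPTBot/"],
  action: "block",
  evidence: ["(link to public documentation or log analysis)"],
};

// Return a list of problems; an empty list means the entry looks complete.
function checkEntry(entry) {
  const problems = [];
  for (const field of ["id", "operator", "category"]) {
    if (typeof entry[field] !== "string" || entry[field] === "") {
      problems.push(`missing ${field}`);
    }
  }
  if (!["allow", "log", "block"].includes(entry.action)) {
    problems.push("action must be allow, log, or block");
  }
  if (!Array.isArray(entry.userAgentPatterns) || entry.userAgentPatterns.length === 0) {
    problems.push("need at least one user-agent regex");
  } else {
    for (const source of entry.userAgentPatterns) {
      // Compiling the pattern catches syntax errors before the entry ships.
      try { new RegExp(source); } catch { problems.push(`bad regex: ${source}`); }
    }
  }
  if (!Array.isArray(entry.evidence) || entry.evidence.length === 0) {
    problems.push("need at least one piece of evidence");
  }
  return problems;
}
```

Running a check like this before opening a pull request catches incomplete entries early, though the real schema validation may be stricter.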

We do not fork-and-diverge from ai.robots.txt — that's the authoritative source for the user-agent set. We enrich and submit PRs upstream for new findings.
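
One way to picture that relationship is a two-way diff between the upstream user-agent list and the local registry. The sketch below is an assumption about how such a comparison could work, not the project's sync script (which is still pending); `diffAgainstUpstream`, the input shapes, and the bot name `ExampleNewBot` are all hypothetical.

```javascript
// upstreamNames: user-agent tokens from the upstream list (assumed input).
// localEntries: registry entries with an id and regex sources (assumed shape).
function diffAgainstUpstream(upstreamNames, localEntries) {
  const compiled = localEntries.map((e) => ({
    id: e.id,
    regexes: e.userAgentPatterns.map((src) => new RegExp(src, "i")),
  }));
  const covers = (entry, name) => entry.regexes.some((re) => re.test(name));
  return {
    // Upstream knows these but the local registry does not: import them.
    missingLocally: upstreamNames.filter(
      (name) => !compiled.some((entry) => covers(entry, name))
    ),
    // Local findings with no upstream match: candidates for an upstream PR.
    notUpstream: compiled
      .filter((entry) => !upstreamNames.some((name) => covers(entry, name)))
      .map((entry) => entry.id),
  };
}

const report = diffAgainstUpstream(
  ["GPTBot", "ClaudeBot"],
  [
    { id: "openai-gptbot", userAgentPatterns: ["GPTBot"] },
    { id: "example-newbot", userAgentPatterns: ["ExampleNewBot"] },
  ]
);
```

Here `report.missingLocally` lists names to pull in from upstream, and `report.notUpstream` lists local entries worth proposing upstream.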

Contributing a signature

The fastest contribution path is to flag a new AI bot we don't yet know about. Open a new-signature issue — the form takes about three minutes.

See CONTRIBUTING.md for code contributions.

Status & roadmap

v0.0.1 (this release, 2026-05-21):

  • ✅ Core matcher + registry loader in all three languages
  • ✅ Express + FastAPI + Flask + Go net/http adapters
  • ✅ 12 seed signatures (OpenAI, Anthropic, Google, Meta, Perplexity, ByteDance, Amazon, Apple)
  • ⏳ Sync script for ai.robots.txt upstream
  • ⏳ RFC 9421 verifier (week 2)

Roadmap (next 4 weeks):

Week | Goal
1 | Core libraries + framework adapters (this release)
2 | RFC 9421 Web Bot Auth verification module
3 | Technical blog post launch ("How I cut my server's AI-crawler bandwidth by 75%")
4 | Mozilla Builders application + outreach to design partners

Credits

ai-bot-shield stands on the work of others:

  • ai-robots-txt/ai.robots.txt — community-maintained user-agent list.
  • Anubis (Xe Iaso / Techaro) — proof-of-work bot challenge, the FOSS distribution model we learned from.
  • cloudflare/web-bot-auth — reference RFC 9421 implementation.
  • Thibault Meunier (Cloudflare) and Sandor Major (Google), IETF Web Bot Auth draft authors.

License

Apache-2.0. See LICENSE.
