Block AI crawlers in 5 lines. No Cloudflare required.
"I am spending from 20–100% of my time in any given week mitigating hyper-aggressive LLM crawlers at scale … experiencing dozens of brief outages per week." — Drew DeVault, SourceHut founder
"Literally a DDoS on the entire internet." Log analysis showed ~25% of Diaspora's traffic was OpenAI user-agents, 15% Amazon, 4.3% Anthropic. — Dennis Schubert, Diaspora maintainer
Read the Docs: blocking AI crawlers dropped traffic from 800 GB/day → 200 GB/day (75% reduction), saving ~$1,500/month in bandwidth.
ai-bot-shield is a drop-in middleware library for self-hosted FOSS sites and indie operators. It loads a community-maintained AI bot signature database, optionally verifies RFC 9421 Web Bot Auth signatures for legitimate signed agents, and decides per-request whether to allow, log, or block. Three languages, six framework adapters, one shared registry.
| Node.js | Python | Go |
|---|---|---|
| `npm i @ai-bot-shield/express` | `pip install ai-bot-shield` | `go get github.com/mthamil107/ai-bot-shield/go` |

    import express from "express";
    import { shield } from "@ai-bot-shield/express";
    import registry from "ai-bot-shield-signatures/registry.json";

    const app = express();
    app.use(shield({ registry })); // ← that's it.
    app.get("/", (req, res) => res.send("Hello world"));
    app.listen(3000);

Make a request with `User-Agent: GPTBot/1.2` and you get back `403 Forbidden` with an `X-Bot-Verdict: openai-gptbot` header. Make the same request with a normal browser user-agent and the route handler runs as usual.
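The allow / log / block behavior described above can be pictured as a small, framework-free decision function. This is only a sketch under stated assumptions, not the library's actual internals: the `Rule` and `Verdict` shapes, the `decide` function, and the choice to attach `X-Bot-Verdict` on a `log` match are all hypothetical.

```typescript
// Hypothetical shapes for illustration; the real registry schema lives in docs/SCHEMA.md.
type Action = "allow" | "log" | "block";

interface Rule {
  id: string;      // stable signature ID, e.g. "openai-gptbot"
  pattern: RegExp; // user-agent regex
  action: Action;  // recommended action for a match
}

interface Verdict {
  status: number;                  // HTTP status the middleware would respond with
  headers: Record<string, string>; // extra response headers
  passThrough: boolean;            // true when the route handler should still run
}

// Decide what to do with a request based only on its User-Agent header.
function decide(userAgent: string | undefined, rules: Rule[]): Verdict {
  const ua = userAgent ?? "";
  const hit = rules.find((rule) => rule.pattern.test(ua));

  // No match, or a match whose action is "allow": let the request through untouched.
  if (!hit || hit.action === "allow") {
    return { status: 200, headers: {}, passThrough: true };
  }

  // Assumption: matched requests are labeled with the signature ID.
  const headers: Record<string, string> = { "X-Bot-Verdict": hit.id };
  if (hit.action === "log") {
    return { status: 200, headers, passThrough: true };
  }
  return { status: 403, headers, passThrough: false };
}

const rules: Rule[] = [{ id: "openai-gptbot", pattern: /GPTBot/i, action: "block" }];
const blocked = decide("GPTBot/1.2", rules);
const browser = decide("Mozilla/5.0 (X11; Linux x86_64) Firefox/126.0", rules);
console.log(blocked.status, blocked.headers["X-Bot-Verdict"], browser.passThrough);
```

Keeping the decision separate from the framework adapter is one plausible way to share the same logic across Express, FastAPI, and `net/http`; whether the project is structured this way is not confirmed here.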
- Community signature database: forked and enriched from `ai.robots.txt`. Refreshed weekly. Adds categories, IP ranges, and RFC 9421 key directories the upstream doesn't track.
- RFC 9421 verification (week 2): for legitimate signed agents (OpenAI Operator, Google Project Mariner) per the IETF Web Bot Auth draft.
- Drop-in middleware for Express, Next.js, FastAPI, Flask, Go `net/http`, and (planned) chi, echo, gin, Django.
- No Cloudflare required: self-hosted operators on Vercel, Railway, Fly.io, or a plain VPS get the same protection.
- No telemetry: runs entirely in your process. We never see your traffic.
Honesty up front:
- Not a Cloudflare replacement. Cloudflare ships AI Labyrinth, Pay-Per-Crawl, Bot Management, and Web Bot Auth as a coordinated stack. We do one piece of that — server-side detection — at the OSS / self-hosted layer.
- Not a proof-of-work challenge. That's Anubis's job. Run Anubis upstream of ai-bot-shield if you want a PoW layer.
- Not a magic bullet. Training crawlers don't actually sign requests today (as of 2026), so the RFC 9421 half of the value-prop applies only to the agentic minority. We're honest about this in docs/rfc9421.md.
- Not a CAPTCHA. This is server-side decisioning, not a user-facing challenge.
| Language | Frameworks (v0) | Planned |
|---|---|---|
| Node / TypeScript | Express | Next.js, Fastify, Hono |
| Python | FastAPI / Starlette, Flask | Django, ASGI generic |
| Go | net/http | chi, echo, gin |
ai-bot-shield runs anywhere your web framework runs:
- ✅ Vercel (Next.js middleware)
- ✅ Railway / Fly.io / Render
- ✅ Self-hosted VPS / bare metal
- ✅ Docker / Kubernetes
- ✅ Behind Cloudflare too (defense in depth)
The canonical database is signatures/registry.json — a JSON Schema-validated file with ~12 entries today, growing weekly. Schema reference: docs/SCHEMA.md.
Each entry has a stable ID, an operator, a category (training-crawler / agentic-fetcher / etc.), one or more user-agent regexes, a recommended action, and at least one piece of public evidence.
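To make that entry shape concrete, here is a minimal sketch of what an entry and a basic sanity check might look like. All field names (`userAgentPatterns`, `evidence`, and so on), the `checkEntry` helper, and the example evidence URL are hypothetical; `docs/SCHEMA.md` remains the authoritative definition.

```typescript
// Field names are hypothetical; docs/SCHEMA.md is the authoritative schema.
interface SignatureEntry {
  id: string;                         // stable ID
  operator: string;                   // who runs the bot
  category: string;                   // e.g. "training-crawler" or "agentic-fetcher"
  userAgentPatterns: string[];        // one or more user-agent regex sources
  action: "allow" | "log" | "block";  // recommended action
  evidence: string[];                 // at least one piece of public evidence
}

// Return a list of human-readable problems; an empty list means the entry looks sane.
function checkEntry(entry: SignatureEntry): string[] {
  const problems: string[] = [];
  if (entry.id.trim() === "") problems.push("id is empty");
  if (entry.userAgentPatterns.length === 0) problems.push("no user-agent regex");
  if (entry.evidence.length === 0) problems.push("no public evidence");
  for (const source of entry.userAgentPatterns) {
    try {
      new RegExp(source); // throws SyntaxError on a malformed pattern
    } catch {
      problems.push(`invalid regex: ${source}`);
    }
  }
  return problems;
}

const good: SignatureEntry = {
  id: "openai-gptbot",
  operator: "OpenAI",
  category: "training-crawler",
  userAgentPatterns: ["GPTBot"],
  action: "block",
  evidence: ["https://example.com/placeholder-evidence"], // placeholder, not a real citation
};
const bad: SignatureEntry = { ...good, userAgentPatterns: ["("], evidence: [] };

console.log(checkEntry(good));
console.log(checkEntry(bad));
```

A check like this would complement JSON Schema validation rather than replace it, since a schema alone does not usually confirm that a regex source compiles.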
We do not fork-and-diverge from ai.robots.txt — that's the authoritative source for the user-agent set. We enrich and submit PRs upstream for new findings.
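One way to stay aligned with upstream is to list the upstream user-agent names that no registry regex covers yet. The sketch below is an assumption about how such a check could work, not the project's sync script: `findUncovered` and its inputs (a plain list of upstream names and a list of regex sources) are hypothetical.

```typescript
// Hypothetical helper: neither the name nor the input shapes come from the project.
// Returns the upstream user-agent names that no registry pattern matches.
function findUncovered(upstreamNames: string[], registryPatterns: string[]): string[] {
  const compiled = registryPatterns.map((source) => new RegExp(source, "i"));
  return upstreamNames.filter((name) => !compiled.some((re) => re.test(name)));
}

const missing = findUncovered(
  ["GPTBot", "ClaudeBot", "ExampleBot"], // names as they might appear upstream
  ["GPTBot", "ClaudeBot|anthropic-ai"],  // regex sources from the local registry
);
console.log(missing);
```

Any name this reports is a candidate for a new registry entry, and anything found only locally is a candidate for an upstream PR.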
The fastest contribution path is to flag a new AI bot we don't yet know about. Open a new-signature issue — the form takes about three minutes.
See CONTRIBUTING.md for code contributions.
v0.0.1 (this release, 2026-05-21):
- ✅ Core matcher + registry loader in all three languages
- ✅ Express + FastAPI + Flask + Go `net/http` adapters
- ✅ 12 seed signatures (OpenAI, Anthropic, Google, Meta, Perplexity, ByteDance, Amazon, Apple)
- ⏳ Sync script for `ai.robots.txt` upstream
- ⏳ RFC 9421 verifier (week 2)
Roadmap (next 4 weeks):
| Week | Goal |
|---|---|
| 1 | Core libraries + framework adapters (this release) |
| 2 | RFC 9421 Web Bot Auth verification module |
| 3 | Technical blog post launch ("How I cut my server's AI-crawler bandwidth by 75%") |
| 4 | Mozilla Builders application + outreach to design partners |
ai-bot-shield stands on the work of others:
- `ai-robots-txt/ai.robots.txt`: community-maintained user-agent list.
- Anubis (Xe Iaso / Techaro): proof-of-work bot challenge, the FOSS distribution model we learned from.
- `cloudflare/web-bot-auth`: reference RFC 9421 implementation.
- Thibault Meunier (Cloudflare) and Sandor Major (Google): IETF Web Bot Auth draft authors.
Apache-2.0. See LICENSE.