An AI-powered Incident Response Plan engine for MSPs. Takes a free-text incident description, matches it against NIST 800-61 aligned playbooks via semantic search, and generates a role-assigned, time-bound action plan with regulatory notification requirements. BYOM adapter architecture across four LLM providers, ConnectWise + N8N integration, 2,000-incident scenario corpus + 60 tests.
"Gone-phishing" — because the response plan should be ready before someone goes phishing for what to do next.
When a security incident hits an MSP's client — phishing email clicked, ransomware detected, credentials compromised — the response quality depends entirely on whoever picks up the ticket. NIST 800-61 exists but it's a 79-page PDF nobody opens during an active incident. Regulatory windows (HIPAA 60 days, PCI-DSS 72 hours, state breach laws) are tight and easy to miss. ConnectWise tickets capture what happened, not what to do next, in what order, by whom.
The gap: there's no tool bridging "incident reported" and "here's the sequenced action plan with role assignments and regulatory deadlines."
This is that tool, and it's open-source so any MSP can run it.
incident description → ChromaDB semantic search → LLM action plan
(NIST-aligned playbooks) (role-assigned + time-bound + regulatory)
- Drop NIST 800-61-aligned playbooks into playbooks/ as Markdown; they auto-ingest into ChromaDB on startup
- POST a free-text incident description to /api/incident
- Semantic search picks the most relevant playbook(s)
- The LLM (your choice: Anthropic, OpenAI, Gemini, or local Ollama) generates a prioritized action checklist with role assignments, timelines, and regulatory flags
- Follow-up chat at /api/chat keeps the incident context for investigation questions
- Optionally: trigger ConnectWise ticket actions via the bundled cw-mcp tools, or fire N8N webhooks for escalation
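The second step above can be sketched with nothing but the standard library. This is a minimal sketch, not the project's client: the JSON field name "description" and the helper names build_incident_request and submit_incident are assumptions made for illustration, and the real request schema lives in server/app.py.

```python
import json
import urllib.request

# Assumption: the server from the quick start is listening on port 8100.
BASE_URL = "http://localhost:8100"


def build_incident_request(description: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST to /api/incident.

    The "description" field name is a guess at the request schema,
    not something this README documents.
    """
    body = json.dumps({"description": description}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/incident",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit_incident(description: str) -> dict:
    """Send the request and decode the JSON response from the server."""
    request = build_incident_request(description)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, calling submit_incident("User clicked a suspicious link and now their Outlook is sending emails to their contacts") would return whatever JSON the endpoint produces; the shape of that response is not assumed here.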
| Layer | Technology |
|---|---|
| API | FastAPI (Python 3.11+) |
| Vector store | ChromaDB (cosine similarity, sentence-transformers) |
| LLM | BYOM — Anthropic Claude / OpenAI GPT-4o / Google Gemini / local Ollama (Llama 3.1, Phi-3, etc.) |
| Chat UI | Built-in (single-file dark theme), or Chainlit, or Open WebUI, or Gradio |
| Integrations | ConnectWise Manage (via cw-mcp), N8N webhooks |
| Tests | 60 pytest tests across API contracts, integration lifecycle, search ranking, adapter registry, corpus integrity |
Most "AI security automation" tools hardcode a single provider and a fixed playbook taxonomy. This one inverts both:
- BYOM (Bring Your Own Model): the LLM is swappable. Run Claude in production, Ollama in dev, and whatever is approved in regulated environments. The adapter pattern enforces a single interface, and an unknown provider is rejected at startup, not at request time
- Playbooks are Markdown, not configuration: drop a .md file in playbooks/, hit /api/ingest, and it's searchable. Add your client's specific compliance playbook the same way you'd add documentation
- Semantic search, not classification: incident descriptions don't need to use the right vocabulary. "User clicked a suspicious link and now their Outlook is sending emails to their contacts" finds the credential-compromise + phishing playbooks without needing the technician to classify the incident first
- 2,000-incident scenario corpus included: procedurally generated, MITRE ATT&CK-seeded, deterministic from a --seed. Use for tabletop exercises, eval datasets, gap analysis, or demos. The data has a generator (scripts/generate_scenarios.py), not a mystery blob
- Tests verify behavior, not syntax: "does the ransomware query rank the ransomware playbook first?" is a test. "Does this function return a dict?" is not.
- Built with security review baked into the dev process: XSS-via-prompt-injection (caught), error-message leakage (caught), rate limiting (added), security headers (set). The history shows the audits.
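To make the "semantic search, not classification" point concrete, here is a toy sketch of cosine-similarity ranking. The three-dimensional vectors are invented for illustration; in the real system the embeddings come from sentence-transformers and the ranking is done by ChromaDB, so treat rank_playbooks and the numbers below as hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_playbooks(
    query_vec: list[float], playbook_vecs: dict[str, list[float]]
) -> list[tuple[str, float]]:
    """Return (playbook, score) pairs, most similar first."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in playbook_vecs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy "embeddings", made up for this example only.
PLAYBOOKS = {
    "ransomware.md": [0.9, 0.1, 0.0],
    "phishing.md": [0.1, 0.9, 0.2],
    "data-breach.md": [0.2, 0.3, 0.9],
}

# Pretend this vector is the embedded free-text incident description.
query = [0.2, 0.8, 0.3]
ranking = rank_playbooks(query, PLAYBOOKS)
print([name for name, _ in ranking])
# → ['phishing.md', 'data-breach.md', 'ransomware.md']
```

The query never names a category; it only has to land near the right playbook in vector space, which is why the technician doesn't need to classify the incident first.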
git clone https://github.com/jryan5150/gone-phishing.git
cd gone-phishing
# Install
pip install -r requirements.txt
# Configure
cp .env.example .env
# Edit .env — at minimum set your LLM provider's API key
# Run
cd server
python app.py
# → http://localhost:8100

The server auto-ingests playbooks on startup. Open http://localhost:8100 for the built-in chat UI, or POST /api/incident for programmatic use.
Set LLM_PROVIDER in .env:
LLM_PROVIDER=anthropic # Claude (default; ANTHROPIC_API_KEY)
LLM_PROVIDER=openai # GPT-4o (OPENAI_API_KEY)
LLM_PROVIDER=gemini # Gemini 1.5 Pro (GEMINI_API_KEY)
LLM_PROVIDER=ollama # Local models (OLLAMA_HOST + OLLAMA_MODEL)

For Ollama, pull your model first: ollama pull llama3.1:8b. Adapter rejection happens at startup: a wrong provider name fails fast, not at first incident.
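The fail-fast behavior can be pictured as a registry lookup that runs once at startup. The sketch below is an assumption about the general shape, not the code in server/adapters/: LLMAdapter, EchoAdapter, ADAPTERS, and load_adapter are hypothetical names, and EchoAdapter stands in for a real provider so the example runs without an API key.

```python
from abc import ABC, abstractmethod


class LLMAdapter(ABC):
    """Hypothetical single interface that every provider adapter implements."""

    @abstractmethod
    def generate(self, system_prompt: str, user_prompt: str) -> str:
        """Return the model's text response."""


class EchoAdapter(LLMAdapter):
    """Stand-in adapter for illustration; it calls no external service."""

    def generate(self, system_prompt: str, user_prompt: str) -> str:
        return f"[echo] {user_prompt}"


# Hypothetical registry; a real one would map anthropic, openai, gemini, ollama.
ADAPTERS: dict[str, type[LLMAdapter]] = {"echo": EchoAdapter}


def load_adapter(provider: str) -> LLMAdapter:
    """Resolve the configured provider once, at startup, or raise immediately."""
    try:
        adapter_cls = ADAPTERS[provider.strip().lower()]
    except KeyError:
        raise ValueError(
            f"Unknown LLM_PROVIDER {provider!r}; expected one of {sorted(ADAPTERS)}"
        ) from None
    return adapter_cls()
```

Calling load_adapter during startup means a typo such as "antropic" stops the server before it accepts traffic, and the abstract base class means an adapter missing generate cannot be instantiated at all.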
| Option | Set CHAT_UI= | Install | What you get |
|---|---|---|---|
| Built-in | builtin | Nothing extra | Single-file dark theme UI at / |
| Chainlit | chainlit | pip install chainlit | Production chat UI at /chat (mounts into FastAPI) |
| Open WebUI | — | Docker container | Full AI platform (connects via API) |
| Gradio | — | pip install gradio | Quick demo interface |
See docs/WIRING.md for step-by-step setup of each.
| Endpoint | Method | Description |
|---|---|---|
| /api/incident | POST | Submit incident → get action plan |
| /api/chat | POST | Follow-up questions in chat context |
| /api/search | POST | Direct playbook semantic search |
| /api/playbooks | GET | List all ingested playbooks |
| /api/ingest | POST | Re-ingest playbook files |
| /api/health | GET | Server health (with dependency check) |
gone-phishing/
├── server/
│ ├── app.py # FastAPI server + chat UI mounting
│ ├── config.py # Centralised config with startup validation
│ ├── vector_store.py # ChromaDB ingestion + semantic search
│ ├── llm.py # Action plan generation (provider-agnostic)
│ ├── cl_app.py # Chainlit integration (optional)
│ │
│ ├── adapters/ # BYOM — Bring Your Own Model
│ │ ├── base.py # Abstract adapter interface
│ │ ├── anthropic_adapter.py
│ │ ├── openai_adapter.py
│ │ ├── gemini_adapter.py
│ │ └── ollama_adapter.py
│ │
│ └── tools/ # MCP tool modules
│ ├── irp_tools.py # Core IRP (search, plan, list)
│ ├── cw_tools.py # ConnectWise Manage (via cw-mcp)
│ └── n8n_tools.py # N8N webhook triggers
│
├── tests/ # 60 tests — API, integration, data, adapters
├── scripts/
│ └── generate_scenarios.py # Procedural scenario generator (pure Python)
├── playbooks/ # Drop .md files here → auto-ingested
│ ├── ransomware.md phishing.md data-breach.md bec.md ...
├── web/index.html # Built-in chat UI
├── data/
│ ├── scenarios.json # 2,000 generated incident scenarios
│ └── chroma/ # ChromaDB persistence (.gitignored)
├── docs/WIRING.md # Setup: chat UIs, CW MCP, N8N, LLM providers
├── pyproject.toml
├── .env.example
└── requirements.txt
pip install pytest httpx
pytest tests/ -v

60 tests across 5 modules:
| Module | Tests | What it covers |
|---|---|---|
| test_api.py | 23 | Endpoint contracts, validation, search ranking, idempotent ingestion |
| test_integration.py | 8 | Full request lifecycle, context passing, re-ingest safety, error propagation |
| test_vector_store.py | 9 | Chunking logic, overlap correctness, search quality, skip rules |
| test_adapters.py | 5 | BYOM registry, unknown provider rejection, ABC enforcement |
| test_scenarios.py | 15 | Corpus integrity, distribution, MITRE coverage, generator reproducibility |
Key behaviors the tests verify:
- Search ranking: ransomware query → ransomware playbook ranks first (not phishing)
- Multi-turn chat: search context uses latest user message, not first
- Idempotent ingest: re-ingestion produces identical chunk counts
- Error propagation: broken LLM → clean 500 with message, not stack trace
- Data integrity: all search results have metadata, no empty content
- Generator determinism: same --seed → identical scenarios output
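One common way to get the idempotent-ingest property listed above is to key every chunk by a deterministic ID so that re-ingesting overwrites instead of duplicating. The README doesn't say how the project achieves it, so the following is only a sketch: a plain dict stands in for the vector store, and chunk_id and ingest are hypothetical helpers.

```python
import hashlib


def chunk_id(source: str, index: int) -> str:
    """Derive a stable ID from the file name and the chunk's position."""
    return hashlib.sha256(f"{source}:{index}".encode("utf-8")).hexdigest()[:16]


def ingest(store: dict[str, str], source: str, chunks: list[str]) -> int:
    """Upsert chunks by deterministic ID and return the resulting store size."""
    for index, text in enumerate(chunks):
        # The same (source, index) pair always maps to the same key,
        # so a repeat ingest replaces the entry instead of adding one.
        store[chunk_id(source, index)] = text
    return len(store)


store: dict[str, str] = {}
chunks = ["Isolate affected hosts.", "Preserve logs.", "Notify the client contact."]

first_count = ingest(store, "ransomware.md", chunks)
second_count = ingest(store, "ransomware.md", chunks)  # simulated re-ingest
print(first_count, second_count)
# → 3 3
```

A sketch this small ignores one real concern: if a playbook shrinks, chunks beyond the new length stay behind unless they are deleted first.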
data/scenarios.json contains 2,000 procedurally generated incident scenarios across 10 categories, seeded from MITRE ATT&CK techniques, real-world breach patterns, and MSP-specific environments.
Regenerate with:
python scripts/generate_scenarios.py --seed 42 --output data/scenarios.json

Use for tabletop exercises, LLM fine-tuning / eval datasets, playbook gap analysis, or demos.
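Seed-driven determinism generally comes down to drawing every random choice from one private, seeded generator. The sketch below illustrates that idea only; it is not the contents of scripts/generate_scenarios.py, and the category list and record fields are placeholders.

```python
import random

# Placeholder categories; the real corpus spans 10.
CATEGORIES = ["ransomware", "phishing", "bec", "data-breach"]


def generate_scenarios(seed: int, count: int) -> list[dict]:
    """Produce `count` toy scenarios that depend only on `seed`."""
    rng = random.Random(seed)  # private RNG, so global random state is untouched
    return [
        {
            "id": index,
            "category": rng.choice(CATEGORIES),
            "severity": rng.randint(1, 5),
        }
        for index in range(count)
    ]


run_a = generate_scenarios(seed=42, count=50)
run_b = generate_scenarios(seed=42, count=50)
print(run_a == run_b)
# → True
```

Keeping the generator free of wall-clock time, unordered-set iteration, and the global random module is what lets a test assert that two runs with the same seed are identical.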
Drop any .md file into playbooks/ and POST /api/ingest. The system chunks it, embeds it, and makes it searchable. Suggested structure:
# [Incident Type]
## Severity Indicators
- ...
## Containment Steps
1. ...
## Investigation Steps
1. ...
## Notification Requirements
- HIPAA: ...
- State breach laws: ...
- PCI-DSS: ...
## Recovery Steps
1. ...

The semantic matcher doesn't require this exact shape, but consistent structure improves chunking quality and search relevance.
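For intuition about what "chunks it" can mean, here is a minimal sliding-window chunker with overlap. It splits on raw character counts, which is an assumption: the project's vector_store.py may split on headings, sentences, or tokens instead, and chunk_text with its default sizes is hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into windows of chunk_size characters sharing `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap
    chunks: list[str] = []
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        if piece.strip():  # skip windows that are only whitespace
            chunks.append(piece)
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=2))
# → ['abcd', 'cdef', 'efgh', 'ghij']
```

The overlap keeps a step that straddles a window boundary retrievable from at least one chunk, which is one reason consistent section structure tends to help relevance.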
The IRP engine delegates to your existing cw-mcp server for ticket operations and fires webhooks to N8N for escalation chains. Both are optional — the engine runs standalone.
If your MSP uses ConnectWise: the bundled cw_tools.py reads incident-related tickets, creates response-plan tickets, and updates ticket status as the action plan progresses. See docs/WIRING.md for full configuration.
If you use N8N for orchestration: webhook triggers in n8n_tools.py fire on plan generation, escalation thresholds, and regulatory deadline approaches. Wire them to whatever escalation chain your team runs.
This project was built in public via AI-orchestrated development: brainstormed and prototyped on mobile, then moved to desktop for parallel agent execution. Each phase ships a coherent slice; the commit listed for each phase below marks that point in the history.
| Phase | What | Commit |
|---|---|---|
| Foundation | IRP engine, BYOM adapters, CW MCP client, playbook ingestion, chat UI | 283e870 |
| Code Quality Audit | Fixed config duplication, removed dead code, singleton ChromaDB client, consistent pathlib | 39d9ec9 |
| Operational Hardening | Startup config validation (fail fast on missing keys), real health check with dependency verification, CORS spec compliance | 5efcb04 |
| Adapter Instrumentation | Logging on all LLM calls (model, latency, tokens), fixed Gemini SDK bug (system_instruction was in wrong location) | 3581961 |
| Frontend Security | Replaced regex markdown with marked.js, added DOMPurify to prevent XSS via LLM prompt injection | 7486c92 |
| Documentation Cleanup | Proper attribution for IRP template source, stripped leaked section numbers, removed docs for non-existent features | 5b45e3a |
| Data Provenance | Procedural scenario generator (pure Python, seeded, 10 categories, MITRE ATT&CK, 20 ransomware variants) — data now has a generator, not a mystery blob | 520b2b5 |
| Test Suite | 60 tests: API contracts, integration lifecycle, search ranking quality, adapter registry, corpus integrity, generator determinism | fbc3834 |
| Project Structure | pyproject.toml, proper package layout, README with test documentation, free-tier LLM guidance | 35797cc |
| Security Review | Rate limiting (slowapi), security headers, generic error messages (no internal detail leakage), XSS prevention | f06ad9b |
- Methodology: the Four-Layer Context Architecture used to build this — Identity / Rules / Memory / Project layers separated by change rate
- Production-runtime cousin: the sentinel layer in esexpress-v2 — same neuroscience grounding (adaptive trust scoring with feedback), different incarnation (security defense vs incident playbook engine)
Issues and PRs welcome. Particularly interested in:
- Additional NIST 800-61 aligned playbooks (different verticals — healthcare, finance, legal, manufacturing)
- New BYOM adapters (Amazon Bedrock, Mistral, custom OpenAI-compatible endpoints)
- Compliance pack contributions (state-specific breach notification requirements, sector-specific frameworks)
- Test corpus extensions (more MITRE technique coverage, sector-specific scenarios)
Before contributing: read docs/WIRING.md for the full integration surface. Open an issue first if you're proposing a substantial change — keeps the discussion in public.
MIT — see LICENSE. Fork freely. No attribution required, though appreciated.
Built by Jace Ryan for the MSP that's tired of opening NIST 800-61 PDFs at 2 AM.