Gone-Phishing

An AI-powered Incident Response Plan engine for MSPs. Takes a free-text incident description, matches it against NIST 800-61 aligned playbooks via semantic search, and generates a role-assigned, time-bound action plan with regulatory notification requirements. BYOM adapter architecture across four LLM providers, ConnectWise + N8N integration, 2,000-incident scenario corpus + 60 tests.

"Gone-phishing" — because the response plan should be ready before someone goes phishing for what to do next.

Why this exists

When a security incident hits an MSP's client — phishing email clicked, ransomware detected, credentials compromised — the response quality depends entirely on whoever picks up the ticket. NIST 800-61 exists but it's a 79-page PDF nobody opens during an active incident. Regulatory windows (HIPAA 60 days, PCI-DSS 72 hours, state breach laws) are tight and easy to miss. ConnectWise tickets capture what happened, not what to do next, in what order, by whom.

The gap: there's no tool bridging "incident reported" and "here's the sequenced action plan with role assignments and regulatory deadlines."

This is that tool, and it's open-source so any MSP can run it.

How it works

incident description  →  ChromaDB semantic search  →  LLM action plan
                          (NIST-aligned playbooks)     (role-assigned + time-bound + regulatory)

Drop NIST 800-61-aligned playbooks into playbooks/ as Markdown — they auto-ingest into ChromaDB on startup
POST a free-text incident description to /api/incident
Semantic search picks the most relevant playbook(s)
The LLM (your choice — Anthropic, OpenAI, Gemini, or local Ollama) generates a prioritized action checklist with role assignments, timelines, and regulatory flags
Follow-up chat at /api/chat keeps the incident context for investigation questions
Optionally: trigger ConnectWise ticket actions via the bundled cw-mcp tools, or fire N8N webhooks for escalation

Stack

Layer	Technology
API	FastAPI (Python 3.11+)
Vector store	ChromaDB (cosine similarity, sentence-transformers)
LLM	BYOM — Anthropic Claude / OpenAI GPT-4o / Google Gemini / local Ollama (Llama 3.1, Phi-3, etc.)
Chat UI	Built-in (single-file dark theme), or Chainlit, or Open WebUI, or Gradio
Integrations	ConnectWise Manage (via cw-mcp), N8N webhooks
Tests	60 pytest tests across API contracts, integration lifecycle, search ranking, adapter registry, corpus integrity

What's distinctive

Most "AI security automation" tools hardcode a single provider and a fixed playbook taxonomy. This one inverts both:

BYOM (Bring Your Own Model) — the LLM is a swap. Production with Claude, dev with Ollama, regulatory environments with whatever's approved. Adapter pattern enforces a single interface; provider rejection at startup, not at request time
Playbooks are Markdown, not configuration — drop a .md file in playbooks/, hit /api/ingest, it's searchable. Add your client's specific compliance playbook the same way you'd add documentation
Semantic search, not classification — incident descriptions don't need to use the right vocabulary. "User clicked a suspicious link and now their Outlook is sending emails to their contacts" finds the credential-compromise + phishing playbooks without needing the technician to classify the incident first
2,000-incident scenario corpus included — procedurally generated, MITRE ATT&CK-seeded, deterministic from a --seed. Use for tabletop exercises, eval datasets, gap analysis, or demos. The data has a generator (scripts/generate_scenarios.py), not a mystery blob
Tests verify behavior, not syntax — "does the ransomware query rank the ransomware playbook first?" is a test. "Does this function return a dict?" is not.
Built with security review baked into the dev process — XSS-via-prompt-injection (caught), error-message leakage (caught), rate limiting (added), security headers (set). The history shows the audits.

Quick start

git clone https://github.com/jryan5150/gone-phishing.git
cd gone-phishing

# Install
pip install -r requirements.txt

# Configure
cp .env.example .env
# Edit .env — at minimum set your LLM provider's API key

# Run
cd server
python app.py
# → http://localhost:8100

The server auto-ingests playbooks on startup. Open http://localhost:8100 for the built-in chat UI, or POST /api/incident for programmatic use.

BYOM — Bring Your Own Model

Set LLM_PROVIDER in .env:

LLM_PROVIDER=anthropic      # Claude (default; ANTHROPIC_API_KEY)
LLM_PROVIDER=openai         # GPT-4o (OPENAI_API_KEY)
LLM_PROVIDER=gemini         # Gemini 1.5 Pro (GEMINI_API_KEY)
LLM_PROVIDER=ollama         # Local models (OLLAMA_HOST + OLLAMA_MODEL)

For Ollama, pull your model first: ollama pull llama3.1:8b. Adapter rejection happens at startup — wrong provider name fails fast, not at first incident.

Chat UI options

Option	Set `CHAT_UI=`	Install	What you get
Built-in	`builtin`	Nothing extra	Single-file dark theme UI at `/`
Chainlit	`chainlit`	`pip install chainlit`	Production chat UI at `/chat` (mounts into FastAPI)
Open WebUI	—	Docker container	Full AI platform (connects via API)
Gradio	—	`pip install gradio`	Quick demo interface

See docs/WIRING.md for step-by-step setup of each.

API

Endpoint	Method	Description
`/api/incident`	POST	Submit incident → get action plan
`/api/chat`	POST	Follow-up questions in chat context
`/api/search`	POST	Direct playbook semantic search
`/api/playbooks`	GET	List all ingested playbooks
`/api/ingest`	POST	Re-ingest playbook files
`/api/health`	GET	Server health (with dependency check)

Project structure

gone-phishing/
├── server/
│   ├── app.py                 # FastAPI server + chat UI mounting
│   ├── config.py              # Centralised config with startup validation
│   ├── vector_store.py        # ChromaDB ingestion + semantic search
│   ├── llm.py                 # Action plan generation (provider-agnostic)
│   ├── cl_app.py              # Chainlit integration (optional)
│   │
│   ├── adapters/              # BYOM — Bring Your Own Model
│   │   ├── base.py            # Abstract adapter interface
│   │   ├── anthropic_adapter.py
│   │   ├── openai_adapter.py
│   │   ├── gemini_adapter.py
│   │   └── ollama_adapter.py
│   │
│   └── tools/                 # MCP tool modules
│       ├── irp_tools.py       # Core IRP (search, plan, list)
│       ├── cw_tools.py        # ConnectWise Manage (via cw-mcp)
│       └── n8n_tools.py       # N8N webhook triggers
│
├── tests/                     # 60 tests — API, integration, data, adapters
├── scripts/
│   └── generate_scenarios.py  # Procedural scenario generator (pure Python)
├── playbooks/                 # Drop .md files here → auto-ingested
│   ├── ransomware.md  phishing.md  data-breach.md  bec.md  ...
├── web/index.html             # Built-in chat UI
├── data/
│   ├── scenarios.json         # 2,000 generated incident scenarios
│   └── chroma/                # ChromaDB persistence (.gitignored)
├── docs/WIRING.md             # Setup: chat UIs, CW MCP, N8N, LLM providers
├── pyproject.toml
├── .env.example
└── requirements.txt

Tests

pip install pytest httpx
pytest tests/ -v

60 tests across 5 modules:

Module	Tests	What it covers
`test_api.py`	23	Endpoint contracts, validation, search ranking, idempotent ingestion
`test_integration.py`	8	Full request lifecycle, context passing, re-ingest safety, error propagation
`test_vector_store.py`	9	Chunking logic, overlap correctness, search quality, skip rules
`test_adapters.py`	5	BYOM registry, unknown provider rejection, ABC enforcement
`test_scenarios.py`	15	Corpus integrity, distribution, MITRE coverage, generator reproducibility

Key behaviors the tests verify:

Search ranking: ransomware query → ransomware playbook ranks first (not phishing)
Multi-turn chat: search context uses latest user message, not first
Idempotent ingest: re-ingestion produces identical chunk counts
Error propagation: broken LLM → clean 500 with message, not stack trace
Data integrity: all search results have metadata, no empty content
Generator determinism: same --seed → identical scenarios output

Scenario corpus

data/scenarios.json contains 2,000 procedurally generated incident scenarios across 10 categories, seeded from MITRE ATT&CK techniques, real-world breach patterns, and MSP-specific environments.

Regenerate with:

python scripts/generate_scenarios.py --seed 42 --output data/scenarios.json

Use for tabletop exercises, LLM fine-tuning / eval datasets, playbook gap analysis, or demos.

Adding playbooks

Drop any .md file into playbooks/ and POST /api/ingest. The system chunks it, embeds it, and makes it searchable. Suggested structure:

# [Incident Type]

## Severity Indicators

- ...

## Containment Steps

1. ...

## Investigation Steps

1. ...

## Notification Requirements

- HIPAA: ...
- State breach laws: ...
- PCI-DSS: ...

## Recovery Steps

1. ...

The semantic matcher doesn't require this exact shape — but consistent structure improves chunking quality and search relevance.

ConnectWise + N8N integration

The IRP engine delegates to your existing cw-mcp server for ticket operations and fires webhooks to N8N for escalation chains. Both are optional — the engine runs standalone.

If you have an MSP using ConnectWise: the bundled cw_tools.py reads incident-related tickets, creates response-plan tickets, and updates ticket status as the action plan progresses. See docs/WIRING.md for full configuration.

If you use N8N for orchestration: webhook triggers in n8n_tools.py fire on plan generation, escalation thresholds, and regulatory deadline approaches. Wire them to whatever escalation chain your team runs.

Development history

This project was built in public via AI-orchestrated development — brainstormed and prototyped on mobile, then moved to desktop for parallel agent execution. Each phase ships a coherent slice; commits below link to the moment.

Phase	What	Commit
Foundation	IRP engine, BYOM adapters, CW MCP client, playbook ingestion, chat UI	`283e870`
Code Quality Audit	Fixed config duplication, removed dead code, singleton ChromaDB client, consistent pathlib	`39d9ec9`
Operational Hardening	Startup config validation (fail fast on missing keys), real health check with dependency verification, CORS spec compliance	`5efcb04`
Adapter Instrumentation	Logging on all LLM calls (model, latency, tokens), fixed Gemini SDK bug (system_instruction was in wrong location)	`3581961`
Frontend Security	Replaced regex markdown with marked.js, added DOMPurify to prevent XSS via LLM prompt injection	`7486c92`
Documentation Cleanup	Proper attribution for IRP template source, stripped leaked section numbers, removed docs for non-existent features	`5b45e3a`
Data Provenance	Procedural scenario generator (pure Python, seeded, 10 categories, MITRE ATT&CK, 20 ransomware variants) — data now has a generator, not a mystery blob	`520b2b5`
Test Suite	60 tests: API contracts, integration lifecycle, search ranking quality, adapter registry, corpus integrity, generator determinism	`fbc3834`
Project Structure	pyproject.toml, proper package layout, README with test documentation, free-tier LLM guidance	`35797cc`
Security Review	Rate limiting (slowapi), security headers, generic error messages (no internal detail leakage), XSS prevention	`f06ad9b`

Related work

Methodology: the Four-Layer Context Architecture used to build this — Identity / Rules / Memory / Project layers separated by change rate
Production-runtime cousin: the sentinel layer in esexpress-v2 — same neuroscience grounding (adaptive trust scoring with feedback), different incarnation (security defense vs incident playbook engine)

Contributing

Issues and PRs welcome. Particularly interested in:

Additional NIST 800-61 aligned playbooks (different verticals — healthcare, finance, legal, manufacturing)
New BYOM adapters (Anthropic Bedrock, Mistral, custom OpenAI-compatible endpoints)
Compliance pack contributions (state-specific breach notification requirements, sector-specific frameworks)
Test corpus extensions (more MITRE technique coverage, sector-specific scenarios)

Before contributing: read docs/WIRING.md for the full integration surface. Open an issue first if you're proposing a substantial change — keeps the discussion in public.

License

MIT — see LICENSE. Fork freely. No attribution required, though appreciated.

Built by Jace Ryan for the MSP that's tired of opening NIST 800-61 PDFs at 2 AM.

Name		Name	Last commit message	Last commit date
Latest commit History 23 Commits
.github		.github
data		data
docs		docs
n8n		n8n
playbooks		playbooks
scripts		scripts
server		server
tests		tests
web		web
.dockerignore		.dockerignore
.env.example		.env.example
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
CLAUDE.md		CLAUDE.md
CONTRIBUTING.md		CONTRIBUTING.md
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
REVIEW-GUIDE.txt		REVIEW-GUIDE.txt
SECURITY.md		SECURITY.md
design-iterate-contracts.json		design-iterate-contracts.json
pyproject.toml		pyproject.toml
railway.json		railway.json
requirements.txt		requirements.txt
start.sh		start.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Gone-Phishing

Why this exists

How it works

Stack

What's distinctive

Quick start

BYOM — Bring Your Own Model

Chat UI options

API

Project structure

Tests

Scenario corpus

Adding playbooks

ConnectWise + N8N integration

Development history

Related work

Contributing

License

About

Uh oh!

Releases 1

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Gone-Phishing

Why this exists

How it works

Stack

What's distinctive

Quick start

BYOM — Bring Your Own Model

Chat UI options

API

Project structure

Tests

Scenario corpus

Adding playbooks

ConnectWise + N8N integration

Development history

Related work

Contributing

License

About

Topics

Resources

License

Code of conduct

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages