Skip to content

jryan5150/gone-phishing

Gone-Phishing

License: MIT Python 3.11+ FastAPI Tests

An AI-powered Incident Response Plan engine for MSPs. Takes a free-text incident description, matches it against NIST 800-61 aligned playbooks via semantic search, and generates a role-assigned, time-bound action plan with regulatory notification requirements. BYOM adapter architecture across four LLM providers, ConnectWise + N8N integration, 2,000-incident scenario corpus + 60 tests.

"Gone-phishing" — because the response plan should be ready before someone goes phishing for what to do next.


Why this exists

When a security incident hits an MSP's client — phishing email clicked, ransomware detected, credentials compromised — the response quality depends entirely on whoever picks up the ticket. NIST 800-61 exists but it's a 79-page PDF nobody opens during an active incident. Regulatory windows (HIPAA 60 days, PCI-DSS 72 hours, state breach laws) are tight and easy to miss. ConnectWise tickets capture what happened, not what to do next, in what order, by whom.

The gap: there's no tool bridging "incident reported" and "here's the sequenced action plan with role assignments and regulatory deadlines."

This is that tool, and it's open-source so any MSP can run it.

How it works

incident description  →  ChromaDB semantic search  →  LLM action plan
                          (NIST-aligned playbooks)     (role-assigned + time-bound + regulatory)
  1. Drop NIST 800-61-aligned playbooks into playbooks/ as Markdown — they auto-ingest into ChromaDB on startup
  2. POST a free-text incident description to /api/incident
  3. Semantic search picks the most relevant playbook(s)
  4. The LLM (your choice — Anthropic, OpenAI, Gemini, or local Ollama) generates a prioritized action checklist with role assignments, timelines, and regulatory flags
  5. Follow-up chat at /api/chat keeps the incident context for investigation questions
  6. Optionally: trigger ConnectWise ticket actions via the bundled cw-mcp tools, or fire N8N webhooks for escalation

Stack

Layer Technology
API FastAPI (Python 3.11+)
Vector store ChromaDB (cosine similarity, sentence-transformers)
LLM BYOM — Anthropic Claude / OpenAI GPT-4o / Google Gemini / local Ollama (Llama 3.1, Phi-3, etc.)
Chat UI Built-in (single-file dark theme), or Chainlit, or Open WebUI, or Gradio
Integrations ConnectWise Manage (via cw-mcp), N8N webhooks
Tests 60 pytest tests across API contracts, integration lifecycle, search ranking, adapter registry, corpus integrity

What's distinctive

Most "AI security automation" tools hardcode a single provider and a fixed playbook taxonomy. This one inverts both:

  • BYOM (Bring Your Own Model) — the LLM is a swap. Production with Claude, dev with Ollama, regulatory environments with whatever's approved. Adapter pattern enforces a single interface; provider rejection at startup, not at request time
  • Playbooks are Markdown, not configuration — drop a .md file in playbooks/, hit /api/ingest, it's searchable. Add your client's specific compliance playbook the same way you'd add documentation
  • Semantic search, not classification — incident descriptions don't need to use the right vocabulary. "User clicked a suspicious link and now their Outlook is sending emails to their contacts" finds the credential-compromise + phishing playbooks without needing the technician to classify the incident first
  • 2,000-incident scenario corpus included — procedurally generated, MITRE ATT&CK-seeded, deterministic from a --seed. Use for tabletop exercises, eval datasets, gap analysis, or demos. The data has a generator (scripts/generate_scenarios.py), not a mystery blob
  • Tests verify behavior, not syntax"does the ransomware query rank the ransomware playbook first?" is a test. "Does this function return a dict?" is not.
  • Built with security review baked into the dev process — XSS-via-prompt-injection (caught), error-message leakage (caught), rate limiting (added), security headers (set). The history shows the audits.

Quick start

git clone https://github.com/jryan5150/gone-phishing.git
cd gone-phishing

# Install
pip install -r requirements.txt

# Configure
cp .env.example .env
# Edit .env — at minimum set your LLM provider's API key

# Run
cd server
python app.py
# → http://localhost:8100

The server auto-ingests playbooks on startup. Open http://localhost:8100 for the built-in chat UI, or POST /api/incident for programmatic use.

BYOM — Bring Your Own Model

Set LLM_PROVIDER in .env:

LLM_PROVIDER=anthropic      # Claude (default; ANTHROPIC_API_KEY)
LLM_PROVIDER=openai         # GPT-4o (OPENAI_API_KEY)
LLM_PROVIDER=gemini         # Gemini 1.5 Pro (GEMINI_API_KEY)
LLM_PROVIDER=ollama         # Local models (OLLAMA_HOST + OLLAMA_MODEL)

For Ollama, pull your model first: ollama pull llama3.1:8b. Adapter rejection happens at startup — wrong provider name fails fast, not at first incident.

Chat UI options

Option Set CHAT_UI= Install What you get
Built-in builtin Nothing extra Single-file dark theme UI at /
Chainlit chainlit pip install chainlit Production chat UI at /chat (mounts into FastAPI)
Open WebUI Docker container Full AI platform (connects via API)
Gradio pip install gradio Quick demo interface

See docs/WIRING.md for step-by-step setup of each.

API

Endpoint Method Description
/api/incident POST Submit incident → get action plan
/api/chat POST Follow-up questions in chat context
/api/search POST Direct playbook semantic search
/api/playbooks GET List all ingested playbooks
/api/ingest POST Re-ingest playbook files
/api/health GET Server health (with dependency check)

Project structure

gone-phishing/
├── server/
│   ├── app.py                 # FastAPI server + chat UI mounting
│   ├── config.py              # Centralised config with startup validation
│   ├── vector_store.py        # ChromaDB ingestion + semantic search
│   ├── llm.py                 # Action plan generation (provider-agnostic)
│   ├── cl_app.py              # Chainlit integration (optional)
│   │
│   ├── adapters/              # BYOM — Bring Your Own Model
│   │   ├── base.py            # Abstract adapter interface
│   │   ├── anthropic_adapter.py
│   │   ├── openai_adapter.py
│   │   ├── gemini_adapter.py
│   │   └── ollama_adapter.py
│   │
│   └── tools/                 # MCP tool modules
│       ├── irp_tools.py       # Core IRP (search, plan, list)
│       ├── cw_tools.py        # ConnectWise Manage (via cw-mcp)
│       └── n8n_tools.py       # N8N webhook triggers
│
├── tests/                     # 60 tests — API, integration, data, adapters
├── scripts/
│   └── generate_scenarios.py  # Procedural scenario generator (pure Python)
├── playbooks/                 # Drop .md files here → auto-ingested
│   ├── ransomware.md  phishing.md  data-breach.md  bec.md  ...
├── web/index.html             # Built-in chat UI
├── data/
│   ├── scenarios.json         # 2,000 generated incident scenarios
│   └── chroma/                # ChromaDB persistence (.gitignored)
├── docs/WIRING.md             # Setup: chat UIs, CW MCP, N8N, LLM providers
├── pyproject.toml
├── .env.example
└── requirements.txt

Tests

pip install pytest httpx
pytest tests/ -v

60 tests across 5 modules:

Module Tests What it covers
test_api.py 23 Endpoint contracts, validation, search ranking, idempotent ingestion
test_integration.py 8 Full request lifecycle, context passing, re-ingest safety, error propagation
test_vector_store.py 9 Chunking logic, overlap correctness, search quality, skip rules
test_adapters.py 5 BYOM registry, unknown provider rejection, ABC enforcement
test_scenarios.py 15 Corpus integrity, distribution, MITRE coverage, generator reproducibility

Key behaviors the tests verify:

  • Search ranking: ransomware query → ransomware playbook ranks first (not phishing)
  • Multi-turn chat: search context uses latest user message, not first
  • Idempotent ingest: re-ingestion produces identical chunk counts
  • Error propagation: broken LLM → clean 500 with message, not stack trace
  • Data integrity: all search results have metadata, no empty content
  • Generator determinism: same --seed → identical scenarios output

Scenario corpus

data/scenarios.json contains 2,000 procedurally generated incident scenarios across 10 categories, seeded from MITRE ATT&CK techniques, real-world breach patterns, and MSP-specific environments.

Regenerate with:

python scripts/generate_scenarios.py --seed 42 --output data/scenarios.json

Use for tabletop exercises, LLM fine-tuning / eval datasets, playbook gap analysis, or demos.

Adding playbooks

Drop any .md file into playbooks/ and POST /api/ingest. The system chunks it, embeds it, and makes it searchable. Suggested structure:

# [Incident Type]

## Severity Indicators

- ...

## Containment Steps

1. ...

## Investigation Steps

1. ...

## Notification Requirements

- HIPAA: ...
- State breach laws: ...
- PCI-DSS: ...

## Recovery Steps

1. ...

The semantic matcher doesn't require this exact shape — but consistent structure improves chunking quality and search relevance.

ConnectWise + N8N integration

The IRP engine delegates to your existing cw-mcp server for ticket operations and fires webhooks to N8N for escalation chains. Both are optional — the engine runs standalone.

If you have an MSP using ConnectWise: the bundled cw_tools.py reads incident-related tickets, creates response-plan tickets, and updates ticket status as the action plan progresses. See docs/WIRING.md for full configuration.

If you use N8N for orchestration: webhook triggers in n8n_tools.py fire on plan generation, escalation thresholds, and regulatory deadline approaches. Wire them to whatever escalation chain your team runs.

Development history

This project was built in public via AI-orchestrated development — brainstormed and prototyped on mobile, then moved to desktop for parallel agent execution. Each phase ships a coherent slice; commits below link to the moment.

Phase What Commit
Foundation IRP engine, BYOM adapters, CW MCP client, playbook ingestion, chat UI 283e870
Code Quality Audit Fixed config duplication, removed dead code, singleton ChromaDB client, consistent pathlib 39d9ec9
Operational Hardening Startup config validation (fail fast on missing keys), real health check with dependency verification, CORS spec compliance 5efcb04
Adapter Instrumentation Logging on all LLM calls (model, latency, tokens), fixed Gemini SDK bug (system_instruction was in wrong location) 3581961
Frontend Security Replaced regex markdown with marked.js, added DOMPurify to prevent XSS via LLM prompt injection 7486c92
Documentation Cleanup Proper attribution for IRP template source, stripped leaked section numbers, removed docs for non-existent features 5b45e3a
Data Provenance Procedural scenario generator (pure Python, seeded, 10 categories, MITRE ATT&CK, 20 ransomware variants) — data now has a generator, not a mystery blob 520b2b5
Test Suite 60 tests: API contracts, integration lifecycle, search ranking quality, adapter registry, corpus integrity, generator determinism fbc3834
Project Structure pyproject.toml, proper package layout, README with test documentation, free-tier LLM guidance 35797cc
Security Review Rate limiting (slowapi), security headers, generic error messages (no internal detail leakage), XSS prevention f06ad9b

Related work

  • Methodology: the Four-Layer Context Architecture used to build this — Identity / Rules / Memory / Project layers separated by change rate
  • Production-runtime cousin: the sentinel layer in esexpress-v2 — same neuroscience grounding (adaptive trust scoring with feedback), different incarnation (security defense vs incident playbook engine)

Contributing

Issues and PRs welcome. Particularly interested in:

  • Additional NIST 800-61 aligned playbooks (different verticals — healthcare, finance, legal, manufacturing)
  • New BYOM adapters (Anthropic Bedrock, Mistral, custom OpenAI-compatible endpoints)
  • Compliance pack contributions (state-specific breach notification requirements, sector-specific frameworks)
  • Test corpus extensions (more MITRE technique coverage, sector-specific scenarios)

Before contributing: read docs/WIRING.md for the full integration surface. Open an issue first if you're proposing a substantial change — keeps the discussion in public.

License

MIT — see LICENSE. Fork freely. No attribution required, though appreciated.


Built by Jace Ryan for the MSP that's tired of opening NIST 800-61 PDFs at 2 AM.

About

AI-powered Incident Response Plan engine for MSPs — NIST 800-61 aligned playbooks, BYOM (Anthropic/OpenAI/Gemini/Ollama), ConnectWise + N8N integration, 2,000-incident scenario corpus + 60 tests

Topics

Resources

License

Code of conduct

Contributing

Security policy

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages