Percona Developer Knowledge (percona-dk)

Status: Fully functional. 22 doc repos across 7 stacks, plus the Percona Community blog and the Percona forums. MCP + REST API working. Supports Markdown and reStructuredText. With community interest, this could grow into an official Percona developer resource.

Semantic search and retrieval of Percona documentation, community blog posts, and forum threads for AI assistants and developer tools.

percona-dk ingests three kinds of Percona knowledge — official docs from GitHub repos, blog posts from percona.community, and threads from forums.percona.com — chunks and embeds them locally, and exposes them via REST API and MCP server. Your AI tools get accurate, up-to-date Percona knowledge, not stale training data, not fragile web scraping.

Where it helps most today: people asking AI tools Percona questions and acting on the answers -- configuring, debugging, planning. Where it matters most (and growing fast): when AI tools write install scripts, Ansible playbooks, Terraform configs, or step-by-step guides for Percona products. That output goes to real infrastructure. Without percona-dk, it comes from stale training data: wrong package names, deprecated flags, missing safety checks. With percona-dk, the AI pulls from current official docs. The human still reviews and runs it, but the starting point is accurate instead of plausible.

Why this matters

It's not just about new information. Percona DK helps in four distinct ways:

  1. New features the LLM can't know about -- PXC 8.4 added a Clone plugin for SST in April 2025. No LLM has this in training data. Without DK, the AI confidently tells you the feature doesn't exist.

  2. Percona-specific products the LLM overlooks -- Percona built a dedicated tool for Atlas-to-PSMDB migrations (Percona Link for MongoDB). Without DK, the AI recommends mongosync or a DIY approach. The right tool exists -- the LLM just doesn't know about it.

  3. Operational details the LLM gets vaguely right but not precisely right -- This is the most common day-to-day value. The AI gives you a reasonable answer, but DK gives you the exact flags, version constraints, setup gotchas (like needing to enable MongoDB profiling for PMM Query Analytics), and copy-paste commands from current docs. When you're writing production configs or answering a customer, "mostly right" isn't good enough.

  4. Real-world troubleshooting and field wisdom the docs don't capture -- Docs tell you what a feature does; the community tells you what actually happens when you use it. With blog posts and forum threads indexed alongside docs, the AI can surface real tuning numbers, known-good recovery procedures, reported bugs, integration patterns, and version-specific quirks that engineers have already worked through. Exact error messages from forum threads are especially valuable -- when a user pastes "[ERROR] WSREP: Failed to open backend connection", DK can surface the exact thread where someone else hit and solved it.

Supported tools

percona-dk works with any AI tool that supports MCP or HTTP APIs:

| Tool | How it connects | Windows |
|------|-----------------|---------|
| Claude Desktop | MCP server (stdio) - add to claude_desktop_config.json | Yes |
| Claude Code | MCP server (stdio) - add to .claude/settings.json | Yes |
| Cursor | MCP server (stdio) - add to .cursor/mcp.json | Yes |
| Windsurf | MCP server (stdio) - add to Windsurf MCP settings | Yes |
| GitHub Copilot | MCP server (stdio) - add to .vscode/mcp.json, use Agent Mode | Yes |
| OpenAI Codex CLI | MCP server (stdio) - add to ~/.codex/config.toml | WSL only |
| Codex IDE extension | MCP server (stdio) - shares config with Codex CLI | Yes (VS Code) |
| Cherry Studio | MCP server (stdio) - add to MCP settings | Yes |
| LM Studio | MCP server (stdio) - configure in MCP client settings | Yes |
| AnythingLLM | MCP server (stdio) - edit anythingllm_mcp_servers.json | Yes |
| Open WebUI | REST API - point to http://localhost:8000 | Yes |
| LibreChat | REST API or MCP via proxy - configure in YAML | Yes |
| Any MCP client | MCP server (stdio) | - |
| Any HTTP client | REST API on port 8000 | - |

Windows note: percona-dk itself runs on Windows natively (Python + pip install). For the Codex CLI specifically, OpenAI recommends running inside WSL, though the Codex IDE extension in VS Code works natively. All other tools listed above work on Windows without WSL.

LLM compatibility: MCP is a protocol, not a model feature. Any LLM with tool/function-calling support works, including Claude, GPT-4o, Gemini, Qwen, Llama (via Ollama), Mistral, and others. Reasoning-only models without tool-calling support are not compatible.

Why

Every AI tool you use has Percona in its training data. The problem is that training data is stale, incomplete, and sometimes wrong:

  • Deprecated syntax -- LLMs still recommend innobackupex (removed in XtraBackup 8.0) and hallucinate flags that don't exist.
  • Missing Percona-specific features -- Percona Server has features upstream MySQL doesn't, and vice versa. Generic training data doesn't distinguish them, so the AI recommends generic or DIY approaches where a dedicated Percona tool exists.
  • Wrong product version -- An answer based on MySQL 5.7 docs applied to Percona Server 8.4 can silently break things.
  • No source citations -- Without DK, you get confident answers with no way to verify against official docs.

Today, the main value is accurate answers to day-to-day Percona questions. Increasingly, AI tools are also writing install scripts, playbooks, and configs that go straight to real infrastructure -- and that's where stale training data does the most damage.

Quick start

macOS / Linux:

curl -fsSL https://raw.githubusercontent.com/Percona-Lab/percona-dk/main/install-percona-dk | bash

Windows (PowerShell):

irm https://raw.githubusercontent.com/Percona-Lab/percona-dk/main/install-percona-dk.ps1 | iex

During install you'll be asked to choose a mode:

  • Shared instance (recommended for Percona employees on VPN) - installs a tiny local bridge so Claude Desktop / Claude Code / Cursor / Windsurf can talk to the shared Percona DK on sherpa. Install finishes in under a minute. Docs, blog, and forums stay current automatically because sherpa refreshes daily. Requires Percona VPN to get results; off-VPN the connector stays active and tool calls return a "VPN required" message instead of breaking the client.
  • Full local install - clones Percona doc repos, builds a local ChromaDB index, runs a local MCP server against it. Works completely offline once indexed. First-run takes minutes to hours depending on how many sources are selected. Use this if you don't have VPN access, want offline capability, or want to customize which repos are indexed.

The installer handles everything for the mode you pick:

  • Installs uv if needed (downloads Python 3.12 automatically - no system Python required)
  • Clones the repo to ~/percona-dk and creates an isolated virtual environment
  • Auto-configures Claude Desktop, Claude Code, Cursor, and Windsurf
  • (Local mode only) Walks you through which doc repos to index, runs initial ingestion, sets up auto-refresh

Safe to re-run - detects existing installs, preserves your config, and pre-selects the mode you chose previously.

Note: In Percona's Claude Teams workspace, user-added custom connectors are disabled at the org level. That means you cannot point Claude Desktop or claude.ai directly at http://sherpa.tp.int.percona.com:8402/sse via Settings > Connectors. The curl installer above is the supported path because it writes the MCP entry to Claude Desktop / Claude Code's local config file, which is not subject to the custom-connector policy.

What it does

Percona doc repos (GitHub)
        │
        ▼
  ┌─────────────┐
  │  Ingestion  │  Clone repos → parse Markdown/RST → chunk by heading → embed locally
  └──────┬──────┘
         ▼
  ┌─────────────┐
  │  ChromaDB   │  Local vector store (all-MiniLM-L6-v2 embeddings)
  └──────┬──────┘
         │
    ┌────┴────┐
    ▼         ▼
┌───────┐ ┌───────┐
│  API  │ │  MCP  │
│Server │ │Server │
└───────┘ └───────┘
  • Ingestion pipeline — clones Percona doc repos, parses Markdown and reStructuredText sections, embeds locally (no API keys needed)
  • REST API — POST /search, GET /document/{repo}/{path}, GET /health, GET /stats
  • MCP server — search_percona_docs and get_percona_doc tools for any MCP-compatible client

Content sources

percona-dk indexes three kinds of content into a single searchable corpus:

| Source | What it covers | Refresh |
|--------|----------------|---------|
| Official docs (GitHub) | 22 Percona product doc repos across 7 stacks | Incremental, daily |
| Community blog (percona.community/blog) | ~280 long-form posts: deep dives, tuning walkthroughs, release overviews | Daily, via sitemap lastmod |
| Percona forums (forums.percona.com) | ~16,000 Discourse topics: real-world Q&A, troubleshooting threads, configuration discussions | Daily, via sitemap lastmod |

Blog and forum ingestion can be toggled independently in .env (INGEST_BLOG=true, INGEST_FORUM=true). For existing installs, re-run percona-dk-ingest after pulling the latest release to pick them up.

Available repos

The installer lets you choose which stacks to index. All repos are public Percona GitHub repositories.

| Stack | Repo | Product |
|-------|------|---------|
| MySQL | percona/psmysql-docs | Percona Server for MySQL |
| MySQL | percona/pxc-docs | Percona XtraDB Cluster |
| MySQL | percona/pxb-docs | Percona XtraBackup |
| MySQL | percona/pdmysql-docs | Percona Distribution for MySQL |
| MySQL | percona/ps-binlog-server-docs | Percona Binlog Server |
| MongoDB | percona/psmdb-docs | Percona Server for MongoDB |
| MongoDB | percona/pbm-docs | Percona Backup for MongoDB |
| MongoDB | percona/pcsm-docs | Percona ClusterSync for MongoDB |
| PostgreSQL | percona/postgresql-docs | Percona Distribution for PostgreSQL |
| PostgreSQL | percona/pg_tde | pg_tde (Transparent Data Encryption) |
| PostgreSQL | percona/pgsm-docs | pg_stat_monitor |
| Valkey | percona/percona-valkey-doc | Percona Packages for Valkey |
| Kubernetes Operators | percona/k8sps-docs | Operator for MySQL |
| Kubernetes Operators | percona/k8spxc-docs | Operator for PXC |
| Kubernetes Operators | percona/k8spsmdb-docs | Operator for MongoDB |
| Kubernetes Operators | percona/k8spg-docs | Operator for PostgreSQL |
| OpenEverest | openeverest/everest-doc | OpenEverest DBaaS Platform |
| Tools and PMM | percona/pmm-doc | Percona Monitoring and Management |
| Tools and PMM | percona/pmm_dump_docs | PMM Dump |
| Tools and PMM | percona/proxysql-admin-tool-doc | ProxySQL Admin Tool |
| Tools and PMM | percona/percona-toolkit | Percona Toolkit (RST docs) |
| Tools and PMM | percona/repo-config-docs | Percona Software Repositories |

The MySQL stack and Tools are indexed by default. MongoDB, PostgreSQL, Kubernetes Operators, and OpenEverest are opt-in during installation.

Adding repos after installation

Re-run the installer — it will show your current selection with existing repos pre-ticked, detect the change, and prompt you to re-index:

curl -fsSL https://raw.githubusercontent.com/Percona-Lab/percona-dk/main/install-percona-dk | bash

Or edit .env directly and re-run ingestion:

# Edit ~/percona-dk/.env, then:
DOTENV_PATH=~/percona-dk/.env ~/percona-dk/.venv/bin/percona-dk-ingest

Manual MCP configuration

If you need to configure an MCP client manually, use:

{
  "mcpServers": {
    "percona-dk": {
      "command": "/path/to/percona-dk/.venv/bin/python",
      "args": ["-m", "percona_dk.mcp_server"],
      "env": { "DOTENV_PATH": "/path/to/percona-dk/.env" }
    }
  }
}

For Claude Desktop: ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or ~/.config/Claude/claude_desktop_config.json (Linux).

For Claude Code: ~/.claude/settings.json.

For GitHub Copilot (VS Code), add to .vscode/mcp.json:

{
  "servers": {
    "percona-dk": {
      "command": "/path/to/percona-dk/.venv/bin/percona-dk-mcp"
    }
  }
}

Then switch to Agent Mode in Copilot Chat to use MCP tools.

For OpenAI Codex CLI, add to ~/.codex/config.toml:

[mcp_servers.percona-dk]
command = ["/path/to/percona-dk/.venv/bin/percona-dk-mcp"]

Keeping docs up to date

The MCP server automatically syncs docs in the background. On each startup, it checks when the last sync ran. If it's been more than 7 days (configurable), it pulls the latest from GitHub and re-embeds only the files that changed — all in the background so the server starts immediately. Existing data stays searchable during the sync.

Configure the refresh interval in .env:

REFRESH_DAYS=7   # check every 7 days (default)
REFRESH_DAYS=1   # check daily
REFRESH_DAYS=0   # disable auto-refresh

You can also refresh manually at any time:

DOTENV_PATH=~/percona-dk/.env ~/percona-dk/.venv/bin/percona-dk-ingest

REST API

# Start the API server
~/percona-dk/.venv/bin/percona-dk-server
# Open http://localhost:8000/docs for Swagger UI
curl -X POST http://localhost:8000/search \
  -H "Content-Type: application/json" \
  -d '{"query": "How to configure PMM for MySQL monitoring", "top_k": 5}'

How it works

  1. Ingestion (percona-dk-ingest): Shallow-clones each doc repo, walks all .md and .rst files, splits them at h2/h3 heading boundaries into chunks of ~500-800 tokens each. Metadata includes source repo, file path, heading hierarchy, and a constructed docs.percona.com URL.

  2. Embedding: ChromaDB's built-in all-MiniLM-L6-v2 model generates 384-dimensional embeddings locally. No external API calls.

  3. Search: Queries are embedded with the same model and matched against the corpus using cosine similarity. Results include the original Markdown text, source metadata, and relevance scores.

  4. Repo suggestions: If a search returns weak results and the query matches keywords from a repo that isn't indexed, the MCP server suggests adding that repo.
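
As a toy end-to-end illustration of steps 1-3, the sketch below splits Markdown at h2/h3 heading boundaries and ranks chunks by cosine similarity. A bag-of-words counter stands in for the real all-MiniLM-L6-v2 embeddings, and the real pipeline also tracks heading hierarchy and token budgets:

```python
import math
import re
from collections import Counter

def chunk_markdown(text: str) -> list[str]:
    """Split a Markdown document at h2/h3 heading boundaries (step 1)."""
    parts = re.split(r"(?m)^(?=#{2,3} )", text)
    return [p.strip() for p in parts if p.strip()]

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; percona-dk uses all-MiniLM-L6-v2 vectors."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def search(query: str, chunks: list[str], top_k: int = 3) -> list[str]:
    """Rank chunks by cosine similarity to the query (step 3)."""
    qv = embed(query)
    return sorted(chunks, key=lambda c: cosine(qv, embed(c)), reverse=True)[:top_k]
```

For example, given a document with "## Backup" and "## Monitoring" sections, search("how to take a backup", chunk_markdown(doc), top_k=1) returns the backup section first.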

Project structure

percona-dk/
├── src/percona_dk/
│   ├── ingest.py          # Ingestion pipeline
│   ├── server.py          # FastAPI REST server
│   ├── mcp_server.py      # MCP server for AI tools
│   ├── repo_registry.py   # Known repos + suggestion logic
│   └── version_check.py   # Update notifications
├── install-percona-dk     # One-line installer
├── pyproject.toml
└── .env.example

Future direction

Potential next steps:

  • Optimized for AI-assisted ops — better tool descriptions and response formats as AI-generated install scripts, playbooks, and configs become standard workflow
  • Better embeddings — swap in a larger model for improved search quality
  • Version-aware search — filter results by product version (8.0 vs 8.4)
  • Source-type filtering — let clients restrict searches to docs-only, community-only, or weight them differently
  • Additional sources — knowledge base articles, release notes archives, conference talk transcripts
  • Hosted service — centrally hosted API for team-wide or customer access

License

Apache 2.0
