Configuration

The Docs MCP Server uses a unified configuration system that aggregates settings from multiple sources, validating them against a strict schema. This ensures consistency whether you are running the server via CLI, Docker, or as a library.

Configuration File

By default, configuration is stored in your system's preferences directory:

  • macOS: ~/Library/Preferences/docs-mcp-server/config.yaml
  • Linux: ~/.config/docs-mcp-server/config.yaml
  • Windows: %APPDATA%\docs-mcp-server\config.yaml
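The per-platform locations above can be resolved programmatically. The following is a minimal sketch, not the server's actual lookup code: the function name `default_config_path` and its parameters are hypothetical, and only the three paths come from this document.

```python
from pathlib import PurePosixPath, PureWindowsPath


def default_config_path(platform: str, home: str, appdata: str = "") -> str:
    """Return the default config.yaml location for a platform.

    `platform` uses sys.platform-style values ("darwin", "linux", "win32").
    `home` is the user's home directory; `appdata` is the value of %APPDATA%.
    """
    if platform == "darwin":
        base = PurePosixPath(home) / "Library" / "Preferences"
    elif platform == "win32":
        base = PureWindowsPath(appdata)
    else:
        # Assumption: any other platform is treated like Linux.
        base = PurePosixPath(home) / ".config"
    return str(base / "docs-mcp-server" / "config.yaml")
```

In a real script you would pass `sys.platform`, `os.path.expanduser("~")`, and `os.environ.get("APPDATA", "")`.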

Example config.yaml:

app:
  storePath: ~/.docs-mcp-server
  telemetryEnabled: true
  embeddingModel: text-embedding-3-small

scraper:
  maxPages: 1000
  maxDepth: 3
  document:
    maxSize: 10485760  # 10MB

splitter:
  preferredChunkSize: 1500
  maxChunkSize: 5000

The server automatically updates this file on startup with new defaults.

Using an Explicit Config File

You can specify a custom config file with the --config flag or the DOCS_MCP_CONFIG environment variable:

docs-mcp-server --config /path/to/config.yaml

Note: Explicit config files are treated as read-only. The server will not modify them.

Overriding Configuration

Configuration values are merged from multiple sources, with later sources taking precedence:

  1. Defaults (lowest priority)
  2. Config File
  3. Environment Variables
  4. CLI Arguments (highest priority)
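The precedence order can be illustrated with a layered merge. This is a sketch under stated assumptions, not the server's implementation: the helper names `deep_merge` and `resolve_config` are hypothetical, and it assumes nested sections are merged key by key rather than replaced wholesale.

```python
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a copy of `base` with `override` applied on top, recursing into dicts."""
    result = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = deepcopy(value)
    return result


def resolve_config(defaults: dict, config_file: dict, env: dict, cli: dict) -> dict:
    """Apply the four sources from lowest to highest priority."""
    merged: dict = {}
    for layer in (defaults, config_file, env, cli):
        merged = deep_merge(merged, layer)
    return merged
```

With defaults of `maxPages: 1000`, a config file setting `maxPages: 500`, and an environment override of `2000`, the resolved value is `2000` unless a CLI argument also sets it.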

Environment Variables

Any configuration setting can be overridden via environment variables using the naming convention:

DOCS_MCP_<SECTION>_<SETTING>

Rules:

  • Convert camelCase to UPPER_SNAKE_CASE
  • Join nested paths with underscores

Examples:

# Override scraper settings
export DOCS_MCP_SCRAPER_MAX_PAGES=2000
export DOCS_MCP_SCRAPER_DOCUMENT_MAX_SIZE=52428800

# Override splitter settings
export DOCS_MCP_SPLITTER_PREFERRED_CHUNK_SIZE=2000

# Override app settings
export DOCS_MCP_APP_TELEMETRY_ENABLED=false
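The naming convention can be expressed as a small conversion function. This is an illustrative sketch; `env_var_name` is a hypothetical helper, not part of the server, and it handles only the simple camelCase names that appear in this document.

```python
import re


def env_var_name(setting_path: str, prefix: str = "DOCS_MCP") -> str:
    """Convert a dotted setting path such as 'scraper.maxPages' to its env var name."""
    parts = []
    for segment in setting_path.split("."):
        # Insert an underscore at each lowercase/digit -> uppercase boundary,
        # then upper-case the whole segment.
        snake = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", segment)
        parts.append(snake.upper())
    return "_".join([prefix, *parts])
```

For example, `env_var_name("scraper.document.maxSize")` yields `DOCS_MCP_SCRAPER_DOCUMENT_MAX_SIZE`, matching the export shown above.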

Some settings also have legacy aliases for convenience:

| Setting | Alias |
| --- | --- |
| server.ports.default | PORT |
| server.host | HOST |

CLI Arguments

Common settings have dedicated CLI flags:

docs-mcp-server --port 8080 --host 0.0.0.0
docs-mcp-server --store-path /data/docs --read-only

CLI Configuration Commands

Manage configuration directly from the command line:

# View current configuration (JSON format)
docs-mcp-server config

# View current configuration (YAML format)
docs-mcp-server config --yaml

# Get a specific value
docs-mcp-server config get scraper.maxPages
# Output: 1000

# Get a nested object
docs-mcp-server config get scraper.fetcher
# Output: { "maxRetries": 6, ... }

# Set a value (persists to config file)
docs-mcp-server config set scraper.maxPages 500
# Output: Updated scraper.maxPages = 500

Note: config set only modifies the system default configuration file. If you specify --config, the file is treated as read-only.
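The dotted paths accepted by config get and config set address keys in the nested configuration. A minimal sketch of that addressing scheme follows; `get_path` and `set_path` are hypothetical helpers operating on a plain dictionary, and they do not reproduce the schema validation or file persistence the CLI performs.

```python
def get_path(config: dict, dotted: str):
    """Look up a value such as 'scraper.maxPages' in a nested dict."""
    node = config
    for key in dotted.split("."):
        node = node[key]  # raises KeyError if any segment is missing
    return node


def set_path(config: dict, dotted: str, value) -> None:
    """Set a nested value in place, creating intermediate dicts as needed."""
    *parents, leaf = dotted.split(".")
    node = config
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value
```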


Configuration Reference

App (app)

General application settings.

| Option | Default | Description |
| --- | --- | --- |
| storePath | ~/.docs-mcp-server | Directory for storing databases and logs. |
| telemetryEnabled | true | Enable anonymous usage telemetry. |
| readOnly | false | Prevent modification of data (scraping/indexing). |
| embeddingModel | text-embedding-3-small | Model to use for vector embeddings. |

Server (server)

Settings for the API and MCP servers.

| Option | Default | Description |
| --- | --- | --- |
| protocol | auto | Server protocol (stdio, http, or auto). |
| host | 127.0.0.1 | Host interface to bind to. |
| heartbeatMs | 30000 | MCP protocol heartbeat interval (ms). |
| ports.default | 6280 | Default port for the main server. |
| ports.worker | 8080 | Port for the background worker service. |
| ports.mcp | 6280 | Port for the specific MCP interface. |
| ports.web | 6281 | Port for the web dashboard. |

Authentication (auth)

Security settings for the HTTP server.

| Option | Default | Description |
| --- | --- | --- |
| enabled | false | Enable JWT authentication. |
| issuerUrl | - | OIDC Issuer URL (e.g., Clerk, Auth0). |
| audience | - | Expected JWT audience claim. |

Scraper (scraper)

Settings controlling the web scraping behavior.

| Option | Default | Description |
| --- | --- | --- |
| maxPages | 1000 | Maximum number of pages to crawl per job. |
| maxDepth | 3 | Maximum link depth to traverse. |
| maxConcurrency | 3 | Number of concurrent page fetches. |
| pageTimeoutMs | 5000 | Timeout for a single page load (ms). |
| browserTimeoutMs | 30000 | Timeout for the browser instance (ms). |
| fetcher.maxRetries | 6 | Number of retries for failed requests. |
| fetcher.baseDelayMs | 1000 | Initial delay for exponential backoff (ms). |
| document.maxSize | 10485760 | Maximum size (bytes) for PDF/Office documents. |
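To get a feel for how fetcher.maxRetries and fetcher.baseDelayMs interact, the sketch below computes a retry schedule. It assumes the delay doubles on every attempt with no jitter and no upper cap; this document states only that the backoff is exponential, so the factor of 2 and the helper name `backoff_delays_ms` are illustrative assumptions.

```python
from typing import List


def backoff_delays_ms(max_retries: int = 6, base_delay_ms: int = 1000, factor: int = 2) -> List[int]:
    """Return the wait before each retry, assuming delay = base * factor**attempt."""
    return [base_delay_ms * factor ** attempt for attempt in range(max_retries)]
```

Under these assumptions the defaults give waits of 1, 2, 4, 8, 16, and 32 seconds, so a request that fails every time spends about 63 seconds waiting before the fetcher gives up.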

Note: Scraper settings are often overridden per-job via CLI arguments like --max-pages.

Migration Note: In versions prior to 1.37, document.maxSize was a top-level setting. It has been moved to scraper.document.maxSize. Update your config files accordingly.

GitHub Authentication

Environment variables for authenticating with GitHub when scraping private repositories.

| Env Var | Description |
| --- | --- |
| GITHUB_TOKEN | GitHub personal access token or fine-grained token. Used for private repo access and higher rate limits. |
| GH_TOKEN | Alternative to GITHUB_TOKEN. Used if GITHUB_TOKEN is not set. |

Authentication Resolution Order:

  1. Explicit Authorization header passed in scraper options
  2. GITHUB_TOKEN environment variable
  3. GH_TOKEN environment variable
  4. Local gh CLI authentication (via gh auth token)

If no authentication is available, public repositories remain accessible, but at the lower unauthenticated rate limit (60 requests/hour, versus 5,000 requests/hour when authenticated).
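The resolution order above amounts to a first-match lookup. The sketch below is one way to express it and is not the server's code: `resolve_github_auth` and `gh_cli_token` are hypothetical names, and sending the token with the `Bearer` scheme is an assumption (GitHub accepts it, but this document does not say which scheme the server uses).

```python
import subprocess
from typing import Callable, Mapping, Optional


def gh_cli_token() -> Optional[str]:
    """Ask the local gh CLI for a token; return None if gh is missing or logged out."""
    try:
        out = subprocess.run(
            ["gh", "auth", "token"],
            capture_output=True, text=True, timeout=5, check=True,
        )
    except (OSError, subprocess.SubprocessError):
        return None
    return out.stdout.strip() or None


def resolve_github_auth(
    headers: Mapping[str, str],
    env: Mapping[str, str],
    cli_token: Callable[[], Optional[str]] = gh_cli_token,
) -> Optional[str]:
    """Return the Authorization header value to use, or None for anonymous access."""
    # 1. An explicit Authorization header always wins.
    if headers.get("Authorization"):
        return headers["Authorization"]
    # 2-4. GITHUB_TOKEN, then GH_TOKEN, then the gh CLI (only called if needed).
    token = env.get("GITHUB_TOKEN") or env.get("GH_TOKEN") or cli_token()
    return f"Bearer {token}" if token else None
```

Passing `os.environ` as `env` uses the real environment; the `cli_token` parameter exists so the gh lookup can be stubbed out in tests.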

Splitter (splitter)

Settings for chunking text for vector search.

| Option | Default | Description |
| --- | --- | --- |
| minChunkSize | 500 | Minimum characters per chunk body. Chunks below this threshold are merged with adjacent chunks by the greedy optimizer. |
| preferredChunkSize | 1500 | Soft target for chunk body size in characters. The greedy optimizer splits when combining two chunks would exceed this value, provided both sides are already above minChunkSize. |
| maxChunkSize | 5000 | Hard upper limit for chunk body size in characters. No chunk body will exceed this value. |

Note: These size limits apply to the text body of each chunk. Before embedding, a small metadata header (page title, URL, section path) is prepended to each chunk, adding to the total character count sent to the embedding model. Because characters are not tokens, the actual token count depends on your embedding model's tokenizer. If your model has a small context window (e.g., some local models), consider lowering maxChunkSize to leave headroom for metadata and token expansion.
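The interaction of the three size limits can be sketched as a single greedy pass over adjacent chunks. This is one reading of the option descriptions above, not the server's optimizer: the function `greedy_merge` is hypothetical, it concatenates bodies without separators, and it assumes every input chunk is already at or below maxChunkSize.

```python
from typing import List


def greedy_merge(
    chunks: List[str],
    min_size: int = 500,
    preferred_size: int = 1500,
    max_size: int = 5000,
) -> List[str]:
    """Merge adjacent chunk bodies left to right according to the three limits."""
    merged: List[str] = []
    for chunk in chunks:
        if not merged:
            merged.append(chunk)
            continue
        prev = merged[-1]
        combined = len(prev) + len(chunk)
        if combined > max_size:
            # Hard limit: never build a chunk larger than max_size.
            merged.append(chunk)
        elif combined <= preferred_size:
            # Still within the soft target: keep merging.
            merged[-1] = prev + chunk
        elif len(prev) < min_size or len(chunk) < min_size:
            # Over the soft target, but one side is too small to stand alone.
            merged[-1] = prev + chunk
        else:
            # Both sides are at least min_size and merging would overshoot: split here.
            merged.append(chunk)
    return merged
```

In this sketch the hard limit takes priority, so a small chunk that follows a nearly full one stays below min_size rather than pushing its neighbor past max_size.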

Embeddings (embeddings)

Settings for the vector embedding generation.

Detailed Guide: See Embedding Model Configuration for provider-specific setup (OpenAI, Ollama, Gemini, etc.).

| Option | Default | Description |
| --- | --- | --- |
| batchSize | 100 | Number of chunks to embed in one request. |
| vectorDimension | 1536 | Dimension of the vector space (must match model). |

Storage (storage)

Settings for the data storage backend.

| Option | Default | Description |
| --- | --- | --- |
| provider | sqlite | Storage backend to use (sqlite or supabase). |
| supabase.url | - | Supabase project URL. Env: SUPABASE_URL. |
| supabase.serviceRoleKey | - | Supabase service role key. Env: SUPABASE_SERVICE_ROLE_KEY. |
| supabase.connectionString | - | PostgreSQL connection string. Env: DATABASE_URL. |
| supabase.schema | - | PostgreSQL schema name. When set, all tables are created in this schema instead of public. Env: SUPABASE_SCHEMA. |

Example: Supabase with custom schema

export DOCS_MCP_STORAGE_PROVIDER=supabase
export DATABASE_URL="postgresql://user:<password>@<host>:5432/postgres"
export SUPABASE_SCHEMA=docs_prod

When schema is set, the server automatically creates the schema if it doesn't exist and routes all queries to it via search_path. This allows multiple deployments to share the same database (e.g., docs_prod and docs_staging).

Database (db)

Internal database settings.

| Option | Default | Description |
| --- | --- | --- |
| migrationMaxRetries | 5 | Retries for database migrations on startup. |

Assembly (assembly)

Settings for reassembling search results.

| Option | Default | Description |
| --- | --- | --- |
| maxChunkDistance | 3 | Maximum sort_order difference to merge chunks. |
| maxParentChainDepth | 10 | Maximum depth for parent context traversal. |
| childLimit | 3 | Maximum number of child chunks to include. |
| precedingSiblingsLimit | 1 | Number of preceding sibling chunks to include. |
| subsequentSiblingsLimit | 2 | Number of subsequent sibling chunks to include. |