diff --git a/website/docs/mcp-server/available-tools.md b/website/docs/mcp-server/available-tools.md
new file mode 100644
index 00000000..5c75e609
--- /dev/null
+++ b/website/docs/mcp-server/available-tools.md
@@ -0,0 +1,167 @@
+---
+id: available-tools
+title: Available Tools
+---
+
+In MCP, a “tool” is a simple action you can call programmatically: it takes clear inputs and returns structured outputs.
+
+Litmus MCP tools map to common chaos engineering workflows: managing experiments, monitoring runs, connecting infrastructures, organizing environments, defining resilience probes, and discovering faults and reviewing analytics. This lets assistants and automations perform these tasks reliably.
+
+Use this page as a practical reference. Each section explains what a tool does, when to use it, typical inputs, and the kind of output you can expect.
+
+## Overview of tool categories
+
+- Experiment Management: list and inspect experiments, and run or stop them.
+- Execution Monitoring: see execution history and drill into run details and logs.
+- Infrastructure Management: register and inspect Kubernetes infrastructures.
+- Environment Organization: group experiments and resources by environment.
+- Resilience Validation: define probes to validate steady state and SLOs.
+- Discovery & Analytics: explore available faults and review platform statistics.
+
+Below are the 16 tools documented on this page, organized by category.
+
+## Experiment Management
+
+These tools help you find, inspect, run, and stop chaos experiments.
+
+### list_chaos_experiments
+List all chaos experiments with optional filtering.
+- What it does: Returns a list of experiments. You can filter by project, environment, tags, or name.
+- When to use: To see which experiments exist before selecting one to run or inspect.
+- Typical input: Project ID or name, optional filters (environment, labels, search text), pagination.
+- Output: A paginated list of experiments with key fields such as name, ID, environment, and status.
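+
+To illustrate what happens under the hood: an MCP client invokes a tool by sending a JSON-RPC 2.0 `tools/call` request whose `params` carry the tool `name` and an `arguments` object. The minimal sketch below prints such a request for `list_chaos_experiments`. The `mcp_tool_call` helper and the argument keys `environmentId` and `search` are hypothetical; check the `inputSchema` that the server reports through `tools/list` for the real parameter names.
+
+```bash
+# Hypothetical helper: print a JSON-RPC 2.0 "tools/call" request on a single line.
+#   $1 = request ID, $2 = tool name, $3 = JSON object with the tool arguments (optional)
+mcp_tool_call() {
+  local id="$1" tool="$2" args="${3:-}"
+  # Fall back to an empty arguments object when none is given.
+  [ -n "$args" ] || args='{}'
+  printf '{"jsonrpc":"2.0","id":%s,"method":"tools/call","params":{"name":"%s","arguments":%s}}\n' \
+    "$id" "$tool" "$args"
+}
+
+# Example: look up experiments in one environment (argument keys are assumptions).
+# Prints: {"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"list_chaos_experiments","arguments":{"environmentId":"staging","search":"pod-delete"}}}
+mcp_tool_call 1 list_chaos_experiments '{"environmentId":"staging","search":"pod-delete"}'
+```
+
+The helper does not escape its inputs, so it only suits simple identifiers and hand-written JSON. An assistant such as Claude Desktop builds these requests for you; the sketch is mainly useful for scripted smoke tests.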
+ +### get_chaos_experiment +Get detailed information about a specific chaos experiment. +- What it does: Fetches full details for a single experiment. +- When to use: To review experiment structure, faults used, probes, and configuration before running it. +- Typical input: Experiment ID (or name + project/environment context). +- Output: Experiment spec including faults, probes, parameters, schedules, and metadata. + +### run_chaos_experiment +Execute a chaos experiment immediately (on-demand run). +- What it does: Triggers an on-demand run of the selected experiment. +- When to use: To start a test now (outside of any scheduled cadence) for debugging or validation. +- Typical input: Experiment ID and optional overrides (variables, run labels, dry-run flag if supported). +- Output: A run ID (or execution reference) you can use to monitor progress. + +### stop_chaos_experiment +Stop an in-progress chaos experiment run. +- What it does: Attempts to stop an in-progress experiment run. +- When to use: If a test must be halted due to impact, misconfiguration, or a time limit. +- Typical input: Experiment ID or Run ID. +- Output: Confirmation that the stop request was accepted; subsequent run status should show as stopped/terminated. + +## Execution Monitoring + +These tools help you track experiment execution over time, and inspect an individual run in depth. + +### list_experiment_runs +List experiment execution history with flexible filters. +- What it does: Lists runs across experiments, with filters such as experiment, environment, status, or time range. +- When to use: To review what ran recently, identify failed runs, or audit changes over time. +- Typical input: Experiment ID (optional), status filters (Succeeded/Failed/Running), time window, pagination. +- Output: A list of runs with IDs, timestamps, duration, status, and basic metadata. + +### get_experiment_run_details +Get detailed information about a single run, including timeline, logs, and probe results. 
+- What it does: Shows a single run’s timeline, step status, logs, and probe results.
+- When to use: For debugging failures, verifying probe outcomes, or sharing evidence of success.
+- Typical input: Run ID.
+- Output: Detailed run record including events, steps, logs, artifacts, and final status.
+
+## Infrastructure Management
+
+Use these tools to manage and view the Kubernetes infrastructures where experiments run.
+
+### list_chaos_infrastructures
+List all registered infrastructures (for example, Kubernetes clusters/agents).
+- What it does: Returns every infrastructure registered to the project, along with its connection status.
+- When to use: To confirm which clusters are connected and healthy.
+- Typical input: Project ID or name, optional filters (status, type), pagination.
+- Output: A list of infrastructures with IDs, names, types, connection status, and last heartbeat.
+
+### get_infrastructure_details
+Get detailed information about a specific infrastructure.
+- What it does: Shows full details about a specific infrastructure.
+- When to use: To review configuration, connected namespaces, resource quotas, and health.
+- Typical input: Infrastructure ID.
+- Output: Detailed infrastructure profile including metadata, status, capabilities, and version info.
+
+### register_chaos_infrastructure
+Register a new Kubernetes infrastructure to run experiments.
+- What it does: Starts the registration/handshake for a new Kubernetes cluster or agent.
+- When to use: When onboarding a new cluster to run chaos experiments.
+- Typical input: Project context, cluster name, and registration parameters.
+- Output: Registration info and next steps (for example, a YAML manifest to install or a token to use with the agent).
+
+## Environment Organization
+
+Organize experiments and resources into environments (for example, dev, staging, prod).
+
+### list_environments
+List all environments defined in the project.
+- What it does: Lists environments defined in the project.
+- When to use: To pick the right environment for creating or running experiments.
+- Typical input: Project ID or name, pagination.
+- Output: A list of environments with IDs, names, and basic metadata.
+
+### create_environment
+Create a new environment for organizing experiments and resources.
+- What it does: Creates a new environment grouping.
+- When to use: When you need a separate space for a team, app, or lifecycle stage.
+- Typical input: Environment name, description, optional tags/labels.
+- Output: The newly created environment with its ID and details.
+
+## Resilience Validation
+
+Probes validate steady state or desired outcomes before, during, and after experiments.
+
+### list_resilience_probes
+List all configured resilience probes.
+- What it does: Lists probes available in the project or environment.
+- When to use: To see which checks exist and reuse them across experiments.
+- Typical input: Optional filters such as environment and probe type (HTTP, CMD, K8s, Prometheus), pagination.
+- Output: A list of probes with IDs, names, types, and brief specs.
+
+### create_resilience_probe
+Create a new probe (HTTP, CMD, K8s, or Prometheus) for resilience validation.
+- What it does: Creates a new probe definition to validate resilience signals.
+- When to use: To add new SLO checks or steady-state validations.
+- Typical input: Probe name, type, and spec:
+  - HTTP: URL, method, headers, expected status/body.
+  - CMD: Command, arguments, timeout, expected exit code.
+  - K8s: Resource query (pods/deployments), conditions, namespace.
+  - Prometheus: Query, comparison operator, threshold, evaluation window.
+- Output: The created probe with ID and full spec.
+
+## Discovery & Analytics
+
+Explore the chaos library and get high-level insights about usage and outcomes.
+
+### list_chaos_hubs
+List available ChaosHubs that provide faults and experiments.
+- What it does: Lists connected ChaosHubs that provide faults/experiments.
+- When to use: To discover which hubs are available (public or private) and browse their content.
+- Typical input: Optional filters, pagination.
+- Output: A list of hubs with IDs, names, types, and availability.
+
+### get_chaos_faults
+Browse available chaos faults from connected hubs.
+- What it does: Returns chaos faults available from hubs, with metadata such as category, platform, and parameters.
+- When to use: To select a fault to add to an experiment.
+- Typical input: Hub ID (optional), search query, categories, pagination.
+- Output: A list of faults with names, descriptions, supported platforms, and input parameters.
+
+### get_experiment_statistics
+Get comprehensive platform-level statistics and recent activity.
+- What it does: Provides aggregate stats such as the number of experiments and runs, success/failure rates, and recent activity.
+- When to use: For reporting, governance, and tracking adoption over time.
+- Typical input: Optional time range, environment, or project filters.
+- Output: Summary metrics, chart-ready aggregates, and counts.
+
+## Learn more
+
+- [Installation](./installation)
+- [Example Interactions](./examples)
+- [Troubleshooting](./troubleshooting)
diff --git a/website/docs/mcp-server/examples.md b/website/docs/mcp-server/examples.md
new file mode 100644
index 00000000..83f85f34
--- /dev/null
+++ b/website/docs/mcp-server/examples.md
@@ -0,0 +1,103 @@
+---
+id: examples
+title: Example Interactions
+---
+
+This page collects copy-paste-ready prompts you can send to an assistant connected to the MCP server. Each example pairs a natural-language prompt with a short description of what the assistant does with it, so you can replicate the workflow quickly.
+
+If you’re new to the tool surface, see [Available Tools](./available-tools) for a reference of each tool's purpose, typical inputs, and outputs.
+
+## Quick Start
+
+A common end-to-end flow looks like this:
+
+1. List experiments → pick one to run.
+2. Run an experiment → capture the Run ID.
+3. Monitor the run → check steps, logs, and probe results.
+4. (Optional) Stop the run if needed.
+
+The sample prompts below cover these steps and several other scenarios.
+
+## Sample Prompts
+
+- Prompt
+  ```text
+  Show me available chaos experiments in the staging environment that target Kubernetes.
+  ```
+  It will: List chaos experiments filtered by environment and platform so you can choose one to run.
+
+- Prompt
+  ```text
+  Run experiment "pod-delete-basic" now in staging and return the run ID.
+  ```
+  It will: Trigger an on-demand run of the chosen experiment and return the Run ID.
+
+- Prompt
+  ```text
+  Show me the latest status, timeline, and probe results for run ID .
+  ```
+  It will: Retrieve detailed run information including step timeline and probe outcomes.
+
+- Prompt
+  ```text
+  Stop the run that is currently in progress and confirm that it was terminated.
+  ```
+  It will: Attempt to stop the in-progress run and report whether the stop request was accepted.
+
+- Prompt
+  ```text
+  List all registered infrastructures and show which ones are healthy.
+  ```
+  It will: Return the registered infrastructures with their connection/health status.
+
+- Prompt
+  ```text
+  Onboard a new Kubernetes cluster named "edge-lab" and provide the registration steps.
+  ```
+  It will: Initiate registration and return the manifest or token with next steps.
+
+- Prompt
+  ```text
+  Create an HTTP probe that checks GET https://myapp.example.com/health returns 200 in under 2s.
+  ```
+  It will: Create a reusable HTTP probe definition for steady-state validation.
+
+- Prompt
+  ```text
+  List all probes so I can attach one to my next experiment.
+  ```
+  It will: List available resilience probes with IDs and brief specs.
+
+- Prompt
+  ```text
+  Show me available ChaosHubs and then list Kubernetes pod-level faults.
+  ```
+  It will: List connected hubs and then fetch faults filtered by platform/category.
+ +- Prompt + ```text + Create a new environment called "chaos-lab" for ad-hoc testing. + ``` + It will: Create a new environment grouping that you can target in experiments. + +- Prompt + ```text + List environments so I can verify "chaos-lab" exists. + ``` + It will: List all environments with their IDs and names. + +## Tips and Good Practices + +- Start broad with listing tools (`list_chaos_experiments`, `list_experiment_runs`, `list_chaos_infrastructures`, `list_environments`) before drilling down. +- Prefer IDs over names for precision when running or stopping experiments. +- After `run_chaos_experiment`, immediately capture the `runId` to monitor or stop it later. +- Reuse probes across experiments to standardize resilience checks. +- Keep filters small and focused to reduce noise in large projects. + +For detailed parameter schemas and additional examples, see `mcp-server/available-tools.md`. + +## Learn more + +- [Installation](./installation) +- [Available Tools](./available-tools) +- [Troubleshooting](./troubleshooting) diff --git a/website/docs/mcp-server/installation.md b/website/docs/mcp-server/installation.md new file mode 100644 index 00000000..511003d6 --- /dev/null +++ b/website/docs/mcp-server/installation.md @@ -0,0 +1,102 @@ +--- +id: installation +title: Installation +--- + +You can install Litmus MCP Server from source, via `go install`, or using Docker. Refer to the repository for the most up-to-date commands. 
+
+## From Source
+
+```bash
+# Clone the repository
+git clone https://github.com/litmuschaos/litmus-mcp-server.git
+cd litmus-mcp-server
+
+# Build the binary
+make build
+
+# Or install directly
+make install
+```
+
+## Using Go Install
+
+```bash
+go install github.com/litmuschaos/litmus-mcp-server@latest
+```
+
+## Using Docker
+
+```bash
+# Build the Docker image
+make docker-build
+
+# Run with Docker
+docker run --rm -it \
+  -e CHAOS_CENTER_ENDPOINT=http://your-chaos-center:8080 \
+  -e LITMUS_PROJECT_ID=your-project-id \
+  -e LITMUS_ACCESS_TOKEN=your-token \
+  litmuschaos-mcp-server:latest
+```
+
+## Configuration
+
+Configure Litmus MCP Server using environment variables.
+
+### Environment Variables
+
+```bash
+# Required Configuration
+export CHAOS_CENTER_ENDPOINT=http://your-chaos-center:8080
+export LITMUS_PROJECT_ID=your-project-id
+export LITMUS_ACCESS_TOKEN=your-access-token
+
+# Optional Defaults
+export DEFAULT_INFRA_ID=your-default-infrastructure-id
+export DEFAULT_ENVIRONMENT_ID=production
+```
+
+### Getting Your Credentials
+
+1. **Chaos Center Endpoint**: URL of your LitmusChaos installation
+2. **Project ID**: Found in Chaos Center project settings
+3. **Access Token**: Generate from Chaos Center → Settings → Access Tokens
+
+## Usage
+
+You can run Litmus MCP Server standalone or integrate it with Claude Desktop via the MCP configuration.
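+
+When a client such as Claude Desktop launches the server through a `command` entry, MCP messages travel over the process's stdin/stdout as newline-delimited JSON-RPC 2.0. Assuming the server uses this stdio transport, a quick standalone smoke test is to send the `initialize` handshake followed by `tools/list` and check that the documented tools come back. The helper name `mcp_smoke_messages`, the client name, and the protocol version `2024-11-05` are illustrative choices, not values required by this project.
+
+```bash
+# Hypothetical helper: print the messages of a minimal MCP session, one per line.
+mcp_smoke_messages() {
+  # 1. Handshake request: protocol version and client info (illustrative values).
+  printf '%s\n' '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"smoke-test","version":"0.1.0"}}}'
+  # 2. Notification that the client has finished initializing (notifications carry no id).
+  printf '%s\n' '{"jsonrpc":"2.0","method":"notifications/initialized"}'
+  # 3. Ask the server which tools it exposes.
+  printf '%s\n' '{"jsonrpc":"2.0","id":2,"method":"tools/list"}'
+}
+
+# Usage (assumes the required environment variables are exported and the
+# binary was built with `make build`):
+#   mcp_smoke_messages | ./bin/litmuschaos-mcp-server
+```
+
+Each response line is a JSON-RPC message; the `tools/list` response should list the tools described in [Available Tools](./available-tools). If the server exits as soon as stdin closes, keep the pipe open briefly with `{ mcp_smoke_messages; sleep 5; } | ./bin/litmuschaos-mcp-server`.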
+
+### With Claude Desktop
+
+Add the following entry to your Claude Desktop MCP configuration file (`claude_desktop_config.json`), replacing the placeholder values:
+
+```json
+{
+  "mcpServers": {
+    "litmuschaos": {
+      "command": "/path/to/litmuschaos-mcp-server",
+      "env": {
+        "CHAOS_CENTER_ENDPOINT": "http://localhost:8080",
+        "LITMUS_PROJECT_ID": "your-project-id",
+        "LITMUS_ACCESS_TOKEN": "your-token"
+      }
+    }
+  }
+}
+```
+
+### Standalone Usage
+
+```bash
+# With the required environment variables exported
+./bin/litmuschaos-mcp-server
+
+# Or with make
+make run
+```
+
+## Learn more
+
+- [Available Tools](./available-tools)
+- [Example Interactions](./examples)
+- [Troubleshooting](./troubleshooting)
diff --git a/website/docs/mcp-server/overview.md b/website/docs/mcp-server/overview.md
new file mode 100644
index 00000000..c64275e1
--- /dev/null
+++ b/website/docs/mcp-server/overview.md
@@ -0,0 +1,102 @@
+---
+id: overview
+title: Overview
+---
+
+Litmus MCP Server is a Model Context Protocol (MCP) server for LitmusChaos 3.x that lets AI assistants interact with your chaos engineering platform via natural language.
+
+- Built in Go
+- Works with LitmusChaos Chaos Center 3.x
+- Manages experiments, infrastructures, environments, and resilience probes
+
+See the [GitHub repository](https://github.com/litmuschaos/litmus-mcp-server) for more details.
+
+## Prerequisites
+
+- Go 1.21+
+- Access to a LitmusChaos 3.x Chaos Center
+- Valid project credentials
+
+## Key Features
+
+Use these MCP tools to manage chaos engineering in plain language: find and run experiments, track results, and manage infrastructures, environments, probes, and hubs.
+
+### Chaos Experiment Management
+
+The MCP server exposes tools to help you discover and operate chaos experiments through natural language.
+
+- List and describe available chaos experiments in a project
+- Execute experiments on demand or via cron-like schedules
+- Stop running experiments by experiment ID or run ID
+- Provide dry-run style validations where supported by the backend
+
+Use cases: quickly preview experiment details, trigger a one-off chaos run, or halt an experiment that is causing impact during a sensitive time window.
+
+### Infrastructure Operations
+
+Operate LitmusChaos infrastructures (formerly agents/chaos delegates) programmatically via the MCP server.
+
+- List and get infrastructure details, including connection and health status
+- Monitor infrastructure heartbeat, last seen time, and readiness
+- Generate installation manifests tailored to your environment
+- Support for both namespace-scoped and cluster-scoped deployments
+
+Use cases: verify infrastructure health, fetch installation YAML, or confirm whether an infrastructure is cluster-scoped.
+
+### Environment Organization
+
+Organize your resources using environments to separate PROD and NON_PROD workloads and operations.
+
+- Create and manage environments (for example, PROD and NON_PROD)
+- Associate infrastructures with specific environments
+- Filter experiments and operations based on environment context
+
+Use cases: keep production chaos separate from staging, and apply environment-aware policies and filters.
+
+### Experiment Execution Tracking
+
+Gain visibility into experiment runs and their outcomes directly from your AI assistant.
+
+- Retrieve detailed run history with status, duration, and timeline
+- Monitor active executions in near real time
+- Track fault-level success/failure signals
+- View resiliency score calculations and contributing factors
+
+Use cases: audit past runs, inspect an in-progress execution, or report the resiliency trend to stakeholders.
+
+### Resilience Probes
+
+Probes validate steady-state behavior and success criteria during chaos runs.
+
+- Built-in probe types: HTTP, Command, Kubernetes, and Prometheus
+- Plug-and-play probe architecture for easy composition
+- Steady-state and post-injection validations during experiments
+
+Use cases: verify service health with HTTP checks, run diagnostic commands, or evaluate Prometheus metrics as SLOs.
+
+### ChaosHub Integration
+
+Discover and manage chaos faults from one or more hubs.
+
+- Browse available chaos faults and their documentation
+- Support for multiple hubs (Git-backed and remote)
+- Categorization and search to quickly find relevant faults
+
+Use cases: explore new faults to adopt, compare hub versions, or locate a fault by category.
+
+### Statistics and Analytics
+
+Get aggregated views across experiments and infrastructures to understand overall resilience.
+
+- Project-wide experiment and infrastructure statistics
+- Resiliency score distributions over time or by environment
+- Run status breakdowns and failure modes
+
+Use cases: track adoption, identify flaky faults, and quantify improvements to resilience.
+
+## Learn more
+
+- [Installation](./installation)
+- [Available Tools](./available-tools)
+- [Example Interactions](./examples)
+- [Troubleshooting](./troubleshooting)
diff --git a/website/docs/mcp-server/troubleshooting.md b/website/docs/mcp-server/troubleshooting.md
new file mode 100644
index 00000000..6bc899dd
--- /dev/null
+++ b/website/docs/mcp-server/troubleshooting.md
@@ -0,0 +1,20 @@
+---
+id: troubleshooting
+title: Troubleshooting
+---
+
+Common issues and how to resolve them.
+
+- Authentication failures: verify that `LITMUS_ACCESS_TOKEN` is valid and not expired.
+- Project not found: confirm that `LITMUS_PROJECT_ID` matches the project ID shown in Chaos Center project settings.
+- Endpoint connectivity: ensure `CHAOS_CENTER_ENDPOINT` is reachable from the machine or container where the server runs.
+- Missing defaults: set `DEFAULT_INFRA_ID`, or pass the infrastructure explicitly when running an experiment.
+- Claude Desktop does not invoke the server: verify that the MCP configuration file is in the expected location and that `command` points to the correct binary; restart Claude Desktop after editing the configuration.
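+
+Several of these issues can be caught before the server starts. The sketch below is a hypothetical preflight script (not part of the repository): `check_litmus_env` reports which required variables are unset or empty, and `check_litmus_endpoint` prints the HTTP status code returned by `CHAOS_CENTER_ENDPOINT`. It assumes bash and, for the connectivity check only, `curl`.
+
+```bash
+# Hypothetical preflight helper: report required variables that are unset or empty.
+# Prints one missing name per line and returns 1 if any are missing, 0 otherwise.
+check_litmus_env() {
+  local missing=0 var
+  for var in CHAOS_CENTER_ENDPOINT LITMUS_PROJECT_ID LITMUS_ACCESS_TOKEN; do
+    # ${!var:-} expands the variable whose *name* is stored in $var.
+    if [ -z "${!var:-}" ]; then
+      echo "$var"
+      missing=1
+    fi
+  done
+  return "$missing"
+}
+
+# Hypothetical connectivity check: print the HTTP status code from the endpoint.
+# curl prints 000 when it cannot connect at all (DNS, firewall, wrong port).
+check_litmus_endpoint() {
+  curl -sS -o /dev/null --max-time 5 -w '%{http_code}\n' "$CHAOS_CENTER_ENDPOINT"
+}
+
+# Usage:
+#   if missing="$(check_litmus_env)"; then check_litmus_endpoint; else echo "Unset: $missing"; fi
+```
+
+A `000` status points at network reachability rather than credentials; any real HTTP status means the endpoint is reachable, and remaining failures are more likely token or project ID issues.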
+
+If issues persist, see the Troubleshooting section of the [GitHub README](https://github.com/litmuschaos/litmus-mcp-server) or open an issue in that repository.
+
+## Learn more
+
+- [Installation](./installation)
+- [Available Tools](./available-tools)
+- [Example Interactions](./examples)
diff --git a/website/sidebars.js b/website/sidebars.js
index 3c0e9847..3591ee94 100644
--- a/website/sidebars.js
+++ b/website/sidebars.js
@@ -52,6 +52,20 @@ module.exports = {
         'getting-started/resources'
       ]
     },
+    {
+      type: 'category',
+      label: 'Litmus MCP Server',
+      className: 'category-as-header',
+      collapsed: false,
+      collapsible: false,
+      items: [
+        'mcp-server/overview',
+        'mcp-server/installation',
+        'mcp-server/available-tools',
+        'mcp-server/examples',
+        'mcp-server/troubleshooting'
+      ]
+    },
     {
       type: 'category',
       label: 'Architecture',