This document provides a complete API reference for the MCP-Powered Agentic RAG system's FastAPI server.
Base URL: http://localhost:8000
API Version: 1.0.0
Currently, the API does not require authentication. For production deployments, consider enabling token-based authentication in mcp_config.yaml.
{
"status": "success",
"data": {},
"timestamp": "2024-01-15T10:30:00Z"
}

{
"detail": "Error message describing what went wrong"
}

Returns general information about the API.
Request:
GET /

Response:
{
"name": "MCP-Powered Agentic RAG",
"version": "1.0.0",
"description": "Local agentic RAG system using Model Context Protocol",
"endpoints": {
"query": "/query",
"search": "/search",
"add_documents": "/documents",
"stats": "/stats",
"health": "/health"
}
}

Checks the health status of the API and vector store.
Request:
GET /health

Response (Success - 200):
{
"status": "healthy",
"vector_store_documents": 42
}

Response (Error - 503):
{
"detail": "Service not initialized"
}

Query the RAG agent with context retrieval and LLM reasoning.
Request:
POST /query
Content-Type: application/json
{
"query": "What is the Model Context Protocol?",
"use_context": true,
"n_results": 3
}

Request Parameters:
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| query | string | Yes | - | The user's question |
| use_context | boolean | No | true | Whether to retrieve context documents |
| n_results | integer | No | 3 | Number of documents to retrieve (1-10) |
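The parameter table above can be mirrored client-side before a request is sent. The following is a minimal sketch, not part of the API: `build_query_payload` is a hypothetical helper name, and the checks only restate the documented defaults and the 1-10 range for `n_results`; the server may validate differently.

```python
from typing import Any, Dict


def build_query_payload(
    query: str, use_context: bool = True, n_results: int = 3
) -> Dict[str, Any]:
    """Build a JSON-serializable body for POST /query.

    Defaults and the n_results range are taken from the parameter table;
    the helper itself is illustrative only.
    """
    if not isinstance(query, str) or not query.strip():
        raise ValueError("query must be a non-empty string")
    if not 1 <= n_results <= 10:
        raise ValueError("n_results must be between 1 and 10")
    return {"query": query, "use_context": use_context, "n_results": n_results}
```

Calling `build_query_payload("What is the Model Context Protocol?")` yields a body that uses the documented defaults, which can then be passed as the `json` argument of an HTTP client.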
Response (Success - 200):
{
"query": "What is the Model Context Protocol?",
"response": "The Model Context Protocol (MCP) is a standardized way to connect language models to external services...",
"retrieved_documents": [
"MCP enables modular tool use for AI agents...",
"The protocol provides clean separation of concerns..."
],
"context_used": true,
"model": "mistral"
}

Response (Error - 500):
{
"detail": "Error calling Ollama: Connection refused"
}

Search for documents similar to a query using semantic search.
Request:
POST /search
Content-Type: application/json
{
"query": "vector databases",
"n_results": 5
}

Request Parameters:
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| query | string | Yes | - | Search query |
| n_results | integer | No | 3 | Number of results to return |
Response (Success - 200):
{
"query": "vector databases",
"results": {
"documents": [
"Vector databases like ChromaDB store embeddings...",
"ChromaDB enables efficient semantic search..."
],
"ids": ["doc_1", "doc_2"],
"distances": [0.15, 0.22],
"metadatas": [
{"source": "sample_docs/ai_fundamentals.txt"},
{"source": "sample_docs/ai_fundamentals.txt"}
]
},
"count": 2
}

Add new documents to the vector store.
Request:
POST /documents
Content-Type: application/json
{
"documents": [
"Artificial Intelligence is revolutionizing many industries...",
"Deep learning uses neural networks with multiple layers..."
],
"ids": ["doc1", "doc2"],
"metadata": [
{"source": "article1.txt"},
{"source": "article2.txt"}
]
}

Request Parameters:
| Field | Type | Required | Description |
|---|---|---|---|
| documents | array | Yes | List of document texts |
| ids | array | No | Document IDs (auto-generated if not provided) |
| metadata | array | No | List of metadata objects for each document |
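A request body for this endpoint could be assembled as in the sketch below. `build_documents_payload` is a hypothetical helper, and the requirement that `ids` and `metadata` have one entry per document is an assumption inferred from the table, not something the reference states explicitly. Optional fields are omitted when not supplied so that the server can auto-generate IDs.

```python
from typing import Any, Dict, List, Optional


def build_documents_payload(
    documents: List[str],
    ids: Optional[List[str]] = None,
    metadata: Optional[List[Dict[str, Any]]] = None,
) -> Dict[str, Any]:
    """Build a JSON-serializable body for POST /documents."""
    if not documents:
        raise ValueError("documents must contain at least one text")
    # Assumption: optional arrays are parallel to `documents`.
    if ids is not None and len(ids) != len(documents):
        raise ValueError("ids must have one entry per document")
    if metadata is not None and len(metadata) != len(documents):
        raise ValueError("metadata must have one entry per document")

    payload: Dict[str, Any] = {"documents": list(documents)}
    if ids is not None:
        payload["ids"] = list(ids)
    if metadata is not None:
        payload["metadata"] = list(metadata)
    return payload
```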
Response (Success - 200):
{
"status": "success",
"documents_added": 2,
"total_documents": 44
}

Get statistics about the vector store and agent configuration.
Request:
GET /stats

Response (Success - 200):
{
"vector_store": {
"collection_name": "documents",
"document_count": 42,
"persist_dir": "./vector_store"
},
"agent_config": {
"model": "mistral",
"ollama_url": "http://localhost:11434",
"max_tokens": 1024,
"temperature": 0.7
}
}

Delete all documents from the vector store (destructive operation).
Request:
DELETE /documents

Response (Success - 200):
{
"status": "success",
"message": "All documents cleared"
}

Warning: This operation permanently deletes all documents in the vector store. It cannot be undone without reimporting documents.
| Code | Message | Cause | Solution |
|---|---|---|---|
| 400 | Bad Request | Invalid request format | Check JSON syntax and field types |
| 404 | Not Found | Endpoint doesn't exist | Verify the endpoint URL |
| 500 | Internal Server Error | Server-side error | Check server logs |
| 503 | Service Unavailable | Service not initialized | Restart the server |
| 504 | Gateway Timeout | LLM processing taking too long | Increase timeout or check Ollama |
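The status codes in the table above can be folded into a small client-side helper. This is a sketch under stated assumptions: `ApiError`, `HINTS`, and `raise_for_api_error` are hypothetical names, and error bodies are assumed to be JSON objects carrying a `detail` field as documented here.

```python
from typing import Any, Dict, Optional

# Suggested fixes copied from the error table.
HINTS: Dict[int, str] = {
    400: "Check JSON syntax and field types",
    404: "Verify the endpoint URL",
    500: "Check server logs",
    503: "Restart the server",
    504: "Increase timeout or check Ollama",
}


class ApiError(Exception):
    """Raised for non-2xx responses; keeps the status, detail, and a hint."""

    def __init__(self, status_code: int, detail: str, hint: Optional[str]) -> None:
        super().__init__(f"HTTP {status_code}: {detail}")
        self.status_code = status_code
        self.detail = detail
        self.hint = hint


def raise_for_api_error(status_code: int, body: Any) -> Any:
    """Return the body for 2xx responses, otherwise raise ApiError."""
    if 200 <= status_code < 300:
        return body
    if isinstance(body, dict):
        detail = str(body.get("detail", "Unknown error"))
    else:
        detail = str(body)
    raise ApiError(status_code, detail, HINTS.get(status_code))
```

A caller can catch `ApiError` and log `exc.hint` alongside `exc.detail` to surface the suggested fix from the table.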
{
"detail": "Detailed error message explaining what went wrong"
}

# 1. Check health
curl http://localhost:8000/health
# 2. Add documents
curl -X POST http://localhost:8000/documents \
-H "Content-Type: application/json" \
-d '{
"documents": [
"Artificial Intelligence enables machines to learn from data."
],
"ids": ["doc1"],
"metadata": [{"source": "article.txt"}]
}'
# 3. Query the agent
curl -X POST http://localhost:8000/query \
-H "Content-Type: application/json" \
-d '{
"query": "What is artificial intelligence?",
"use_context": true,
"n_results": 3
}'
# 4. Get statistics
curl http://localhost:8000/stats

import requests

BASE_URL = "http://localhost:8000"

# Query with context; LLM inference can be slow, so allow up to 60 seconds
response = requests.post(
    f"{BASE_URL}/query",
    json={
        "query": "How does RAG work?",
        "use_context": True,
        "n_results": 5,
    },
    timeout=60,
)
response.raise_for_status()  # Fail early on 4xx/5xx responses
result = response.json()
print(f"Answer: {result['response']}")
print(f"Sources: {len(result['retrieved_documents'])} documents")

const baseUrl = 'http://localhost:8000';
// Query the agent
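async function queryAgent(query) {
  const response = await fetch(`${baseUrl}/query`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      query: query,
      use_context: true,
      n_results: 3
    })
  });
  if (!response.ok) {
    // Error responses carry a JSON body with a "detail" field
    const error = await response.json();
    throw new Error(`HTTP ${response.status}: ${error.detail}`);
  }
  return await response.json();
}
// Usage
queryAgent('What is MCP?')
  .then(result => {
    console.log('Response:', result.response);
    console.log('Documents:', result.retrieved_documents);
  })
  .catch(error => console.error('Query failed:', error.message));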
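});

# Add a document from a text file.
# A quoted heredoc does not expand $(cat document.txt), and raw file text is
# not JSON-escaped, so this uses jq (assumed to be installed) to read the
# file as one raw string and emit a valid JSON body.
jq -Rs '{documents: [.], metadata: [{source: "document.txt", date: "2024-01-15"}]}' document.txt |
  curl -X POST http://localhost:8000/documents \
    -H "Content-Type: application/json" \
    -d @-

The API applies rate limiting to prevent abuse. Default limits: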
- 60 requests per minute per IP address
Limits can be configured in mcp_config.yaml.
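A client can stay under the default limit by throttling itself. The sketch below is a client-side sliding-window limiter; `SlidingWindowLimiter` is a hypothetical class, and how the server responds once the limit is exceeded is not described in this reference, so nothing here depends on it. The clock is injectable to make the behavior easy to verify.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Track request timestamps and report how long to wait before the next one."""

    def __init__(
        self,
        max_requests: int = 60,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock
        self._sent: Deque[float] = deque()

    def wait_time(self) -> float:
        """Seconds until another request fits in the window (0.0 if it fits now)."""
        now = self._clock()
        # Drop timestamps that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        if len(self._sent) < self.max_requests:
            return 0.0
        return self.window_seconds - (now - self._sent[0])

    def record(self) -> None:
        """Note that a request was just sent."""
        self._sent.append(self._clock())
```

Before each call, a client would `time.sleep(limiter.wait_time())`, send the request, then call `limiter.record()`.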
- Context Retrieval: 100-500ms depending on vector store size
- LLM Inference: 2-30s depending on response length (Ollama local inference)
- Total Query Time: 2-40 seconds typically
- Reduce `n_results` if not needed
- Use smaller models for faster inference (e.g., mistral vs llama3-70b)
- Keep vector store collections reasonably sized
- Use `use_context: false` for quick responses without retrieval
- Error Handling: Always check HTTP status codes and handle errors gracefully
- Timeouts: Set client timeouts to at least 60 seconds for LLM queries
- Document Management: Clean up unused documents periodically
- Monitoring: Track API response times and error rates
- Security: In production, enable authentication and use HTTPS
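The error-handling and timeout recommendations above could be combined as in the following sketch, which uses only the standard library. `post_json` is a hypothetical helper; the 60-second default follows the timeout guidance, and error bodies are assumed to be JSON with a `detail` field. Connection failures and read timeouts are left to propagate to the caller.

```python
import json
import urllib.error
import urllib.request
from typing import Any, Dict, Tuple


def post_json(
    path: str,
    payload: Dict[str, Any],
    base_url: str = "http://localhost:8000",
    timeout: float = 60.0,
) -> Tuple[int, Any]:
    """POST a JSON body and return (status_code, parsed_body)."""
    request = urllib.request.Request(
        f"{base_url}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as exc:
        # Non-2xx responses still carry a body; fall back to the reason text
        # if that body is not valid JSON.
        try:
            body = json.loads(exc.read().decode("utf-8"))
        except ValueError:
            body = {"detail": str(exc.reason)}
        return exc.code, body
```

Returning the status code alongside the body lets the caller decide how to react to each error class instead of hiding failures inside the helper.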
- Ensure the server has fully started
- Check that ChromaDB and Ollama are accessible
- Verify Ollama is running: `ollama serve`
- Check that the Ollama URL is correct in the config
- Check that the Ollama server isn't overloaded
- Reduce the `n_results` parameter
- Try a faster model (mistral is faster than llama3)
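When a client starts alongside the server, polling `/health` until it reports `healthy` avoids the "Service not initialized" error during startup. The sketch below is illustrative: `wait_until_healthy` and its `fetch_health` callback are hypothetical, with the callback expected to return a `(status_code, body)` pair from `GET /health`.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


def wait_until_healthy(
    fetch_health: Callable[[], Tuple[Optional[int], Dict[str, Any]]],
    attempts: int = 10,
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll the health endpoint until it reports healthy or attempts run out."""
    for attempt in range(attempts):
        try:
            status_code, body = fetch_health()
        except OSError:
            # Connection refused and similar network errors: server not up yet.
            status_code, body = None, {}
        if status_code == 200 and body.get("status") == "healthy":
            return True
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False
```

The `sleep` parameter exists only so the delay can be replaced in tests; in normal use the default `time.sleep` applies.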
- Initial release
- Core RAG functionality
- ChromaDB integration
- Ollama support
- FastAPI server
- v1.1.0: Web search integration
- v1.2.0: Multi-modal support
- v2.0.0: Advanced agent capabilities