A local, modular Retrieval-Augmented Generation (RAG) system using the Model Context Protocol (MCP) to connect an LLM to external tools like vector databases and document loaders.
This project implements an agentic RAG system that:
- Retrieves relevant documents from a local vector database (ChromaDB)
- Augments prompts with retrieved context
- Generates informed responses using a local LLM (Ollama)
- Exposes functionality via REST API with FastAPI
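The "augment" step in the list above can be pictured as plain string assembly. The following is a minimal sketch, not the project's actual prompt template: the helper name `build_augmented_prompt` and the wording of the instructions are assumptions made for illustration.

```python
from typing import Sequence


def build_augmented_prompt(question: str, documents: Sequence[str]) -> str:
    """Combine retrieved documents and the user question into one prompt.

    Hypothetical helper: the template used by rag_agent.py may differ.
    """
    if not documents:
        # Nothing was retrieved, so fall back to the bare question.
        return f"Question: {question}\nAnswer:"

    # Number each document so the model can refer to a specific passage.
    context = "\n\n".join(
        f"[{i}] {doc.strip()}" for i, doc in enumerate(documents, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    docs = ["MCP connects language models to external tools."]
    print(build_augmented_prompt("What is MCP?", docs))
```

Keeping the prompt builder as a pure function makes it easy to test without a running LLM or vector store.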
| Component | Tool/Library | Details |
|---|---|---|
| Language Model | Ollama | Local LLM inference (mistral, llama3, etc.) |
| Agent Framework | mcp + FastAPI | API server with tool registration |
| RAG Pipeline | LangChain + Custom | Context retrieval and prompt engineering |
| Vector Store | ChromaDB | Local, persistent vector database |
| Embeddings | SentenceTransformers | all-MiniLM-L6-v2 model |
| File Handling | pypdf, python-docx | PDF and document loading |
| Frontend (Optional) | Streamlit | Interactive web UI |
| Environment | Python 3.10+ | virtualenv or Conda |
agentic-rag-mcp/
├── main.py # FastAPI MCP server
├── rag_agent.py # Agent query logic and RAG orchestration
├── mcp_config.yaml # Configuration file
├── requirements.txt # Python dependencies
├── vector_store/ # Persisted ChromaDB vector store
├── data/
│ └── sample_docs/ # Sample documents for ingestion
└── tools/
    └── chromadb_tool.py # Vector search tool implementation
cd agentic-rag-mcp
python -m venv .venv
# On Windows
.venv\Scripts\activate
# On macOS/Linux
source .venv/bin/activate
pip install -U pip
pip install -r requirements.txt
Download and install Ollama from the official website.
Start the Ollama server:
# On the system terminal (not in virtual environment)
ollama serve
In another terminal, pull a model:
ollama pull mistral # Recommended for RAG
# or
ollama pull llama3
Verify the server is running:
curl http://localhost:11434/api/tags
Run the interactive chat loop:
python rag_agent.py
This will:
- Load sample documents into the vector store
- Start an interactive chat where you can ask questions
- Retrieve relevant documents and generate an answer for each question
Example interaction:
You: What is MCP?
Agent: The Model Context Protocol (MCP) enables modular tool use for AI agents by providing a standardized way to connect language models to external services...
[Used 2 retrieved documents as context]
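The generation step behind an exchange like the one above could be a direct call to Ollama's HTTP API. This is a sketch under assumptions: how rag_agent.py actually talks to Ollama is not shown in this README, and the helper names `build_generate_request` and `generate` are hypothetical. It uses Ollama's `/api/generate` endpoint with streaming disabled.

```python
import json
import urllib.request


def build_generate_request(prompt: str, model: str = "mistral") -> dict:
    """Build the JSON body for Ollama's /api/generate endpoint."""
    # stream=False asks Ollama for one JSON object instead of a token stream.
    return {"model": model, "prompt": prompt, "stream": False}


def generate(
    prompt: str,
    model: str = "mistral",
    base_url: str = "http://localhost:11434",
    timeout: float = 120.0,
) -> str:
    """Send a prompt to a local Ollama server and return the generated text."""
    body = json.dumps(build_generate_request(prompt, model)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        payload = json.load(reply)
    # With stream=False, the generated text is in the "response" field.
    return payload["response"]


if __name__ == "__main__":
    # Requires `ollama serve` to be running and the model to be pulled.
    print(generate("In one sentence, what is retrieval-augmented generation?"))
```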
Start the FastAPI MCP server:
python main.py
The server will be available at: http://localhost:8000
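Once the server is running, its endpoints can be called from any HTTP client. The sketch below uses only the Python standard library. The request fields (`query`, `use_context`, `n_results`) and the `response` field come from the examples in this README; the helper names and default values are assumptions.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"


def build_query_payload(query: str, use_context: bool = True, n_results: int = 3) -> dict:
    """Build the JSON body for POST /query."""
    return {"query": query, "use_context": use_context, "n_results": n_results}


def get_json(path: str, timeout: float = 10.0) -> dict:
    """Send a GET request to the server and decode the JSON reply."""
    with urllib.request.urlopen(f"{BASE_URL}{path}", timeout=timeout) as reply:
        return json.load(reply)


def post_json(path: str, payload: dict, timeout: float = 120.0) -> dict:
    """Send a JSON POST request to the server and decode the JSON reply."""
    request = urllib.request.Request(
        f"{BASE_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.load(reply)


if __name__ == "__main__":
    # Requires `python main.py` to be running.
    print(get_json("/health"))
    result = post_json("/query", build_query_payload("What is artificial intelligence?"))
    print(result["response"])
```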
GET /health

POST /query
Content-Type: application/json
{
  "query": "What is artificial intelligence?",
  "use_context": true,
  "n_results": 3
}

POST /search
Content-Type: application/json
{
  "query": "MCP protocol",
  "n_results": 5
}

POST /documents
Content-Type: application/json
{
  "documents": [
    "Document text 1",
    "Document text 2"
  ],
  "ids": ["doc1", "doc2"],
  "metadata": [
    {"source": "file1.txt"},
    {"source": "file2.txt"}
  ]
}

GET /stats

from rag_agent import RAGAgent
# Initialize agent
agent = RAGAgent(
    ollama_url="http://localhost:11434",
    model="mistral"
)

# Get a response
result = agent.get_response("What is RAG?")
print(result["response"])
print(f"Retrieved {len(result['retrieved_documents'])} documents")

Edit mcp_config.yaml to customize:
- LLM Settings: Model, temperature, max tokens
- Vector Store: Embedding model, collection name
- RAG: Number of retrieved documents, similarity metric
- Server: Host, port, log level
- Security: API rate limits, authentication
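One way to handle settings like those listed above is to keep defaults in code and overlay whatever the YAML file provides. This is a sketch only: every key name below is hypothetical (the real schema of mcp_config.yaml is not shown here), and the security settings are left out because their shape is unknown.

```python
import copy

# Hypothetical defaults: key names and values are illustrative, not the
# project's actual configuration schema.
DEFAULT_CONFIG = {
    "llm": {"model": "mistral", "temperature": 0.2, "max_tokens": 512},
    "vector_store": {
        "embedding_model": "all-MiniLM-L6-v2",
        "collection_name": "documents",
        "persist_dir": "./vector_store",
    },
    "rag": {"n_results": 3, "similarity_metric": "cosine"},
    "server": {"host": "127.0.0.1", "port": 8000, "log_level": "info"},
}


def merge_config(defaults: dict, overrides: dict) -> dict:
    """Return a copy of `defaults` with `overrides` applied recursively."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested sections key by key instead of replacing them.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


if __name__ == "__main__":
    overrides = {"llm": {"model": "llama3"}, "server": {"port": 9000}}
    print(merge_config(DEFAULT_CONFIG, overrides))
```

In practice the overrides would be the dictionary parsed from mcp_config.yaml, for example with `yaml.safe_load` if PyYAML is installed.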
from tools.chromadb_tool import ChromaTool

tool = ChromaTool()
documents = [
    "Your document text 1",
    "Your document text 2"
]
tool.add_documents(documents, ids=["id1", "id2"])

curl -X POST http://localhost:8000/documents \
  -H "Content-Type: application/json" \
  -d '{
    "documents": ["Document 1", "Document 2"],
    "ids": ["doc1", "doc2"]
  }'

Create streamlit_app.py:
import requests
import streamlit as st

st.set_page_config(page_title="RAG Agent", layout="wide")
st.title("MCP-Powered Agentic RAG")

query = st.text_input("Ask a question:")
if query:
    # Send the question to the FastAPI server's /query endpoint
    response = requests.post(
        "http://localhost:8000/query",
        json={"query": query},
        timeout=120,
    )
    # Fail loudly on HTTP errors instead of parsing an error body
    response.raise_for_status()
    result = response.json()

    st.subheader("Response")
    st.write(result["response"])

    st.subheader("Retrieved Context")
    for i, doc in enumerate(result["retrieved_documents"], 1):
        st.write(f"**Doc {i}**: {doc[:200]}...")

Run Streamlit:
streamlit run streamlit_app.py

- ✅ Basic RAG with ChromaDB
- ⬜ Web search tool integration
- ⬜ PDF document ingestion UI
- ⬜ Agent memory (conversation history)
- ⬜ Multi-modal support (images, tables)
- ⬜ Fine-tuning on domain-specific data
- ⬜ Structured output (JSON schemas)
- ⬜ Real-time streaming responses
- Make sure the Ollama server is running:
ollama serve
- Check that it is accessible:
curl http://localhost:11434/api/tags
- Ensure sentence-transformers is installed:
pip install sentence-transformers
- The first run downloads the embedding model (~30MB)
- Check that the ./vector_store/ directory exists and is writable
- Verify that persist_dir in the configuration matches the actual path
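Two of the checks above can be automated with a small preflight script. This is a sketch assuming the default Ollama URL and the ./vector_store path shown in this README; the function names are hypothetical and not part of the project.

```python
import os
import urllib.error
import urllib.request


def directory_is_writable(path: str) -> bool:
    """Return True if `path` is an existing directory we can write into."""
    return os.path.isdir(path) and os.access(path, os.W_OK | os.X_OK)


def ollama_is_reachable(base_url: str = "http://localhost:11434", timeout: float = 3.0) -> bool:
    """Return True if the Ollama server answers GET /api/tags with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as reply:
            return reply.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


if __name__ == "__main__":
    print("Ollama reachable:", ollama_is_reachable())
    print("Vector store writable:", directory_is_writable("./vector_store"))
```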
MIT License - See LICENSE file for details
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Commit changes
- Push and open a pull request
