A Retrieval-Augmented Generation (RAG) chatbot powered by Mistral-7B-Instruct, designed to deliver thoughtful, down-to-earth responses inspired by Daoist philosophy (Laozi, Zhuangzi). Built using FastAPI and optimized for lightweight, locally-hosted usage.
- RAG Architecture: Retrieves relevant passages from a corpus of Daoist notes using semantic search with FAISS.
- Philosophy-Grounded Prompting: Uses a system message that simulates a grounded Daoist mentor — gentle, honest, and reflective.
- Quantized Model Inference: Runs Mistral-7B in 4-bit using bitsandbytes for faster generation on consumer GPUs.
- Dynamic Knowledge Base: Loads and chunks
.mdnotes from Daoist texts into embeddings at startup. - API Endpoint: Exposes a clean
POST /chatroute for frontend integration (e.g. Unity or web).
| Component | Technology |
|---|---|
| Language Model | mistralai/Mistral-7B-Instruct-v0.2 |
| Embeddings | sentence-transformers/all-MiniLM-L6-v2 |
| Vector Search | FAISS |
| API Framework | FastAPI |
| Quantization | bitsandbytes (4-bit NF4) |
| Tokenizer & Model | transformers |
git clone https://github.com/Awakuruf/rag-chatbot.git
cd rag-bot1.Create a virtual environment:
python -m venv venv
source venv/bin/activate # or venv\Scripts\activate on Windows- Install dependencies:
pip install -r requirements.txt- Run the FastAPI server:
cd .\app\
uvicorn main:app --reload- The Unity game will POST messages to http://127.0.0.1:8000/chat.
rag-chatbot/
│
├── app/
│ ├── main.py # FastAPI app with /chat endpoint
│ ├── rag_pipeline.py # Core RAG logic
│ ├── ingest.py # Loads + embeds PDF/Markdown/web docs
│
├── data/
│ ├── daodejing.pdf
│ ├── daodejing_notes.md
| ├── zhuangzi.pdf
│ └── zhuangzi_notes.md
│
├── requirements.txt
├── example_responses.txt
└── README.md
-
Long response times from AI? Consider:
- Reducing max_new_tokens
- Using smaller models like mistral-7b-instruct in 4-bit mode
- Chunking your documents more efficiently
MIT License. Feel free to remix or adapt for educational and non-commercial purposes.