📚 LibreRAG — An accessible & local RAG Pipeline

This project is a complete Retrieval-Augmented Generation (RAG) pipeline that turns a domain description into a searchable research assistant powered by arXiv papers.

You describe a domain (for example: "An expert on machine learning especially on memory") and the system will:

Generate ranked search keywords using OpenAI
Search arXiv using those keywords
Download and parse research PDFs
Embed the papers locally using all-MiniLM-L6-v2
Store them in Chroma vector database
Let you ask questions over the papers using ChatGPT

The result is a domain-specific research assistant that answers questions using real academic sources.

🧠 How It Works

Domain description → Keyword generation → arXiv search → PDF download → Parsing → Embeddings → Chroma → RAG QA

🚀 Features

Domain → ranked search keywords
Automatic arXiv ingestion
PDF parsing
Local embeddings (no cloud vector DB)
Chroma vector store
RAG-based question answering

📦 Requirements

Python 3.9+
OpenAI API key
Internet connection

🔧 Installation

pip install openai chromadb sentence-transformers pypdf requests

Set your API key:

export OPENAI_API_KEY="your-api-key"

▶️ Running

python librerag.py

⭐ Why This Exists

This project is for researchers, engineers, and builders who want their own domain-specific AI assistant grounded in real academic literature — not hallucinations.

📜 License

MIT

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

📚 LibreRAG — An accessible & local RAG Pipeline

🧠 How It Works

🚀 Features

📦 Requirements

🔧 Installation

▶️ Running

⭐ Why This Exists

📜 License

FilesExpand file tree

README.md

Latest commit

History

README.md

File metadata and controls

📚 LibreRAG — An accessible & local RAG Pipeline

🧠 How It Works

🚀 Features

📦 Requirements

🔧 Installation

▶️ Running

⭐ Why This Exists

📜 License