📚 LibreRAG — An accessible & local RAG Pipeline

This project is a complete Retrieval-Augmented Generation (RAG) pipeline that turns a domain description into a searchable research assistant powered by arXiv papers.

You describe a domain (for example: "An expert on machine learning especially on memory") and the system will:

  1. Generate ranked search keywords using OpenAI
  2. Search arXiv using those keywords
  3. Download and parse research PDFs
  4. Embed the papers locally using all-MiniLM-L6-v2
  5. Store them in a Chroma vector database
  6. Let you ask questions over the papers using ChatGPT
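
Steps 2 and 3 above could look roughly like the sketch below. It is an illustration, not the code in librerag.py: the helper names `build_arxiv_url` and `parse_arxiv_feed` are hypothetical, and it assumes the public arXiv Atom API at `export.arxiv.org/api/query`. It only builds the query URL and extracts titles and PDF links from a feed; the download itself is left out.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Assumption: the public arXiv query endpoint, which returns an Atom feed.
ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = "{http://www.w3.org/2005/Atom}"


def build_arxiv_url(keyword, max_results=5):
    """Build a query URL that searches all fields for the quoted keyword."""
    params = {
        "search_query": f'all:"{keyword}"',
        "start": 0,
        "max_results": max_results,
    }
    return ARXIV_API + "?" + urllib.parse.urlencode(params)


def parse_arxiv_feed(xml_text):
    """Return a list of {"title", "pdf_url"} dicts from an Atom feed string."""
    root = ET.fromstring(xml_text)
    papers = []
    for entry in root.findall(ATOM + "entry"):
        # Titles in the feed may contain line breaks; collapse whitespace.
        title = " ".join(entry.findtext(ATOM + "title", default="").split())
        pdf_url = None
        for link in entry.findall(ATOM + "link"):
            if link.get("title") == "pdf":
                pdf_url = link.get("href")
        papers.append({"title": title, "pdf_url": pdf_url})
    return papers
```

The returned `pdf_url` values could then be fetched with `requests` and passed to `pypdf` for parsing.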

The result is a domain-specific research assistant that answers questions using real academic sources.


🧠 How It Works

Domain description → Keyword generation → arXiv search → PDF download → Parsing → Embeddings → Chroma → RAG QA
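
Between parsing and embedding, long paper text usually has to be split into smaller pieces, because sentence-embedding models such as all-MiniLM-L6-v2 truncate long inputs. The sketch below shows one common way to do this, using overlapping word windows. The function name `chunk_text` and the default sizes are assumptions for illustration; the project may chunk differently.

```python
def chunk_text(text, chunk_size=200, overlap=40):
    """Split text into word windows of chunk_size that overlap by `overlap` words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once this window has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.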


🚀 Features

  • Domain → ranked search keywords
  • Automatic arXiv ingestion
  • PDF parsing
  • Local embeddings and vector storage (no cloud vector DB)
  • Chroma vector store
  • RAG-based question answering
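
To make the last feature concrete, here is a minimal sketch of the retrieval step behind RAG question answering: rank stored chunks by cosine similarity to the question vector, then place the best ones into the prompt. It uses plain Python lists as stand-ins for real embeddings and for the Chroma store, and the names `top_k` and `build_prompt` are hypothetical, so treat it as an outline of the idea rather than the project's code.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec, store, k=2):
    """Return the k items in store whose vectors are most similar to query_vec."""
    ranked = sorted(
        store,
        key=lambda item: cosine(query_vec, item["vector"]),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, chunks):
    """Assemble a prompt that asks the model to answer from the retrieved chunks."""
    context = "\n\n".join(
        f"[{i + 1}] {chunk['text']}" for i, chunk in enumerate(chunks)
    )
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

In the real pipeline the vectors would come from all-MiniLM-L6-v2 and the nearest-neighbour search would be delegated to Chroma; only the prompt would be sent to the OpenAI API.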

📦 Requirements

  • Python 3.9+
  • OpenAI API key
  • Internet connection

🔧 Installation

pip install openai chromadb sentence-transformers pypdf requests

Set your API key:

export OPENAI_API_KEY="your-api-key"
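
If the variable is missing, the first OpenAI call fails partway through a run. A small startup check such as the sketch below can surface that earlier. The helper `get_openai_key` is hypothetical and not part of librerag.py; it only reads the `OPENAI_API_KEY` variable set above.

```python
import os


def get_openai_key(env=os.environ):
    """Return the API key from the environment, or raise with a clear message."""
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before running librerag.py"
        )
    return key
```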

▶️ Running

python librerag.py

⭐ Why This Exists

This project is for researchers, engineers, and builders who want their own domain-specific AI assistant whose answers are grounded in real academic literature rather than hallucinated.


📜 License

MIT