Skip to content

bogenc/LibreRAG

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

3 Commits
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

πŸ“š LibreRAG β€” An accessible & local RAG Pipeline

This project is a complete Retrieval-Augmented Generation (RAG) pipeline that turns a domain description into a searchable research assistant powered by arXiv papers.

You describe a domain (for example: "An expert on machine learning especially on memory") and the system will:

  1. Generate ranked search keywords using OpenAI
  2. Search arXiv using those keywords
  3. Download and parse research PDFs
  4. Embed the papers locally using all-MiniLM-L6-v2
  5. Store them in Chroma vector database
  6. Let you ask questions over the papers using ChatGPT

The result is a domain-specific research assistant that answers questions using real academic sources.


🧠 How It Works

Domain description β†’ Keyword generation β†’ arXiv search β†’ PDF download β†’ Parsing β†’ Embeddings β†’ Chroma β†’ RAG QA


πŸš€ Features

  • Domain β†’ ranked search keywords
  • Automatic arXiv ingestion
  • PDF parsing
  • Local embeddings (no cloud vector DB)
  • Chroma vector store
  • RAG-based question answering

πŸ“¦ Requirements

  • Python 3.9+
  • OpenAI API key
  • Internet connection

πŸ”§ Installation

pip install openai chromadb sentence-transformers pypdf requests

Set your API key:

export OPENAI_API_KEY="your-api-key"

▢️ Running

python librerag.py

⭐ Why This Exists

This project is for researchers, engineers, and builders who want their own domain-specific AI assistant grounded in real academic literature β€” not hallucinations.


πŸ“œ License

MIT

About

Local Retrieval-Augmented Generation (RAG) pipeline that turns a domain description into a searchable research assistant powered by arXiv papers.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages