An intelligent local research assistant built with Gradio, LangChain, Qdrant, and Ollama. This app lets you download arXiv papers, extract and embed their contents, and ask deep research questions—all processed locally using lightweight language models.
The process flow of this system is as follows:

1. Search Research Articles on arXiv: search arXiv for relevant papers based on the user's query.
2. Convert to Embeddings (LangChain): convert the content of the papers into embeddings using the LangChain library.
3. Save Embeddings to Qdrant: store the generated embeddings in a Qdrant vector store for efficient retrieval.
4. Ask a Question (Ollama): ask a question based on the stored content and get an answer from the Ollama model.
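As an illustration of the first step, the sketch below searches arXiv and downloads matching PDFs with the `arxiv` Python client; the function name, result count, and download directory are illustrative choices, not the app's actual code.

```python
import os

import arxiv


def download_papers(query: str, max_results: int = 5, dirpath: str = "arxiv_papers"):
    """Illustrative helper: search arXiv for a query and download the PDFs."""
    os.makedirs(dirpath, exist_ok=True)
    search = arxiv.Search(
        query=query,
        max_results=max_results,
        sort_by=arxiv.SortCriterion.Relevance,
    )
    paths = []
    for result in arxiv.Client().results(search):
        # download_pdf() saves the paper under dirpath and returns the file path
        paths.append(result.download_pdf(dirpath=dirpath))
    return paths


pdf_paths = download_papers("retrieval augmented generation")
```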
Here's the visual flow of the system:

![Flow of the system](flow.png)
- 🔍 arXiv Search: Automatically search and download recent research papers based on a topic.
- 📄 PDF Extraction: Extracts and chunks the content of the downloaded PDFs for semantic analysis.
- 🧠 Vector Store: Embeds paper content and stores it in Qdrant for efficient retrieval.
- 🤖 Local LLM QA: Uses a lightweight Ollama model (`tinyllama`) to answer user questions using contextual information.
- 📊 Enhanced UI: Detailed stats on processed papers, including pages, words, and chunks.
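To give a feel for the extraction and chunking step, here is a minimal sketch using LangChain's `PyPDFLoader` and `RecursiveCharacterTextSplitter`; the file path and chunking parameters are illustrative, not necessarily what the app uses.

```python
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Load one downloaded PDF into per-page Documents (illustrative path)
pages = PyPDFLoader("arxiv_papers/example_paper.pdf").load()

# Split the pages into overlapping chunks for embedding
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(pages)

print(f"{len(pages)} pages -> {len(chunks)} chunks")
```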
```bash
git clone https://github.com/DishantB0411/AI-Assistant.git
cd arxiv-research-assistant
python -m venv venv
source venv/bin/activate   # On Windows: venv\Scripts\activate
pip install -r requirements.txt
```
Make sure Ollama is installed and running on your system.

To download the TinyLLaMA model, run:

```bash
ollama pull tinyllama
```

Then start the model:

```bash
ollama run tinyllama
```

This app uses `tinyllama`, a lightweight model that runs efficiently on CPUs.
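Once the model is pulled and the Ollama server is running, the app can reach it through LangChain's Ollama wrapper. A minimal sketch (the prompt is just an example):

```python
from langchain_community.llms import Ollama

# Connects to the local Ollama server and uses the tinyllama model
llm = Ollama(model="tinyllama")
print(llm.invoke("In one sentence, what is a vector store?"))
```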
To launch the basic version, run:

```bash
python app.py
```

A minimal Gradio UI will appear, allowing you to:
- Enter an arXiv search query
- Ask a question
- View the model's answer
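For orientation, a bare-bones sketch of what such a Gradio interface can look like; the `answer_question` handler here is a placeholder, not the app's actual function:

```python
import gradio as gr


def answer_question(topic: str, question: str) -> str:
    # Placeholder: the real app searches arXiv, embeds the papers,
    # and queries the local LLM before returning an answer.
    return f"(answer to '{question}' based on papers about '{topic}')"


demo = gr.Interface(
    fn=answer_question,
    inputs=[gr.Textbox(label="arXiv search query"), gr.Textbox(label="Your question")],
    outputs=gr.Textbox(label="Answer"),
    title="arXiv Research Assistant",
)

if __name__ == "__main__":
    demo.launch()
```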
To launch the enhanced version, run:

```bash
python enhanced_app.py
```

This version includes:
- Paper download count control
- Progress tracking
- Extraction and chunking statistics
- Improved prompting for richer responses
Project structure:

```
.
├── app.py              # Basic version
├── enhanced_app.py     # Full-featured version with stats
├── requirements.txt
├── screenshot.png      # UI screenshot (for README)
├── flow.png            # Flow screenshot (for README)
├── tmp/                # Temporary vector store (created at runtime)
└── arxiv_papers/       # Downloaded PDFs (created at runtime)
```
| Component | Tool |
|---|---|
| Search & PDF | arxiv Python client |
| LLM | Ollama + tinyllama |
| Embeddings | GPT4AllEmbeddings |
| Vector Store | Qdrant (local mode) |
| UI | Gradio |
| NLP Chain | LangChain + Prompts |
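To show how the embedding and vector-store pieces can fit together, here is a hedged sketch that embeds a few chunks with `GPT4AllEmbeddings` and stores them in a local (on-disk) Qdrant collection; the collection name, path, and placeholder document are illustrative:

```python
from langchain_core.documents import Document
from langchain_community.embeddings import GPT4AllEmbeddings
from langchain_community.vectorstores import Qdrant

# In the real flow these chunks come from the PDF splitting step
chunks = [Document(page_content="Example chunk of text from a paper.")]

embeddings = GPT4AllEmbeddings()

# Local mode: Qdrant stores its data on disk, no separate server required
vectorstore = Qdrant.from_documents(
    chunks,
    embeddings,
    path="tmp/qdrant",               # illustrative path
    collection_name="arxiv_papers",  # illustrative collection name
)

retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
```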
The enhanced version uses this prompt template:

```
You are an expert research assistant. Based only on the following context from research papers, provide a detailed and well-structured answer to the question. Cite relevant insights clearly. You can provide your knowledge which is out of the context when necessary while framing the answer. The final answer should be containing minimum 200 words.

Context:
{context}

Question: {question}
```
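One way this template could be wired into a retrieval chain, as a sketch that assumes the retriever built from the Qdrant example above and the locally running tinyllama model (the question is just an example):

```python
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate
from langchain_community.llms import Ollama

# Abridged version of the template shown above
template = (
    "You are an expert research assistant. Based only on the following context "
    "from research papers, provide a detailed and well-structured answer to the "
    "question. Cite relevant insights clearly.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}"
)
prompt = PromptTemplate(template=template, input_variables=["context", "question"])

qa_chain = RetrievalQA.from_chain_type(
    llm=Ollama(model="tinyllama"),
    retriever=retriever,  # retriever from the Qdrant sketch above
    chain_type_kwargs={"prompt": prompt},
)

result = qa_chain.invoke({"query": "What methods do the papers propose?"})
print(result["result"])
```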
Notes:

- Optimized for local usage with lightweight LLMs.
- The `arxiv_papers/` and `tmp/` directories are overwritten after each new query, so results are not persisted across sessions.
- If you experience connection issues while downloading PDFs, the system will retry automatically (see the sketch after this list).
- Paper content is embedded and stored locally for semantic search.
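The automatic-retry note can be pictured as a small wrapper around the download call; the snippet below only illustrates that idea and is not the app's actual retry logic:

```python
import time


def download_with_retry(download_fn, attempts: int = 3, delay: float = 2.0):
    """Illustrative retry wrapper: re-run a flaky download a few times."""
    for attempt in range(1, attempts + 1):
        try:
            return download_fn()
        except ConnectionError:
            if attempt == attempts:
                raise
            time.sleep(delay)  # brief pause before the next attempt
```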
This project is licensed under the MIT License.
Planned improvements:

- Support multiple LLMs from Ollama
- Multi-paper summarization
- Memory-enabled chat history

