AI-powered research paper search and audio assistant.
Research.AI enables users to search for research papers and generate lifelike voice audio summaries from paper URLs.
The project is designed to be modular and extensible, supporting multiple search sources and voice synthesis backends.
- **Currently Used:**
  We use Google Text-to-Speech (gTTS) for generating audio summaries, because gTTS is lightweight and works well on laptops or low-resource servers.
- **Preferred (Not Yet Default):**
  HuggingFace Kokoro and Dia provide much higher-quality, human-like voice synthesis. However, they require significant GPU resources and are not practical to run on most personal laptops or basic servers.
- **How to Upgrade:**
  If you have access to a powerful GPU server, you can swap out gTTS for HuggingFace Kokoro/Dia in the backend for much more realistic audio; see the sketch below.
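As a rough illustration, the swap could live behind a single function in the TTS module. This is a minimal sketch, assuming a `synthesize` helper: the gTTS call is real, while the Kokoro/Dia branch is a placeholder, since those models' APIs vary by release.

```python
# Minimal sketch of the TTS swap point. Only the gTTS path is concrete; the
# Kokoro/Dia branch is a placeholder for a GPU-backed HuggingFace model.
from gtts import gTTS

def synthesize(text: str, out_path: str, engine: str = "gtts") -> str:
    """Render `text` to an audio file and return its path."""
    if engine == "gtts":
        # Lightweight and CPU-only; note gTTS always writes MP3 data.
        gTTS(text=text, lang="en").save(out_path)
    elif engine in ("kokoro", "dia"):
        # Placeholder: load and run the HuggingFace model here on a GPU server.
        raise NotImplementedError("HuggingFace Kokoro/Dia backend not wired in")
    else:
        raise ValueError(f"unknown TTS engine: {engine}")
    return out_path
```

Keeping the engine choice behind one function means the rest of the pipeline is untouched when you upgrade.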
- **User Input:**
  Enter a search term or paste a research paper URL in the search bar.
- **Paper Search:**
  - If a keyword is entered, the frontend queries the backend (`/query?q=...`) to fetch relevant research papers from multiple sources (arXiv, Semantic Scholar, etc.).
  - Results are displayed in a list.
- **Audio Generation:**
  - For any paper (from search or direct URL), click "Generate Audio".
  - The frontend sends a request to the backend (`/create_podcast?url=...`); see the client sketch after this list.
  - The backend extracts and summarizes the paper content.
  - The backend uses gTTS (or HuggingFace models if enabled) to generate an audio summary.
  - The generated audio file is returned to the frontend.
- **Playback & Download:**
  - The frontend displays an audio player for the generated summary.
  - Users can listen to or download the audio file.
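For concreteness, here is what those two round trips might look like from a hypothetical Python client. The base URL and response shapes (JSON for `/query`, raw audio bytes for `/create_podcast`) are assumptions, since the backend contract is not pinned down in this workspace.

```python
# Hypothetical client for the two endpoints; base URL and field names assumed.
import requests

BASE = "http://localhost:8000"  # assumed backend address

# 1. Keyword search: assumed to return a JSON list of paper records.
papers = requests.get(f"{BASE}/query", params={"q": "diffusion models"}).json()

# 2. Audio generation for the first hit; "url" is an assumed field name.
resp = requests.get(f"{BASE}/create_podcast", params={"url": papers[0]["url"]})
with open("summary.mp3", "wb") as f:
    f.write(resp.content)
```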
**Tech Stack:** Next.js, React, TailwindCSS, Zustand (state), Radix UI.

**Key Components:**
- **SearchBar:**
  Lets users search for papers by keyword or enter a URL for audio generation. Handles tab switching between search and audio-generation modes.
- **SearchResults:**
  Displays search results from all sources. Allows users to copy paper URLs or generate audio summaries.
- **AudioPlayer:**
  Plays generated audio and provides download functionality.
- **State Management:**
  Uses Zustand (`/lib/store.ts`) to manage search results, loading states, and the current podcast URL.
- **UI Components:**
  Modular UI built with Radix UI and TailwindCSS for a modern, responsive experience.
Note: Backend code is not included in this workspace, but here's how it is structured and how each file/agent works.
- `GET /query?q=...`
  Searches for research papers using multiple sources and LLMs.
- `GET /create_podcast?url=...`
  Generates an audio summary from a paper URL.
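Since the backend code is not in this workspace, the following is only a sketch of how those two endpoints might look with FastAPI (one of the example libraries named under setup below); `search_papers` and `make_podcast` are hypothetical stub names, not the project's actual functions.

```python
# app.py — hypothetical FastAPI skeleton for the two endpoints; the helpers
# are stubs standing in for the real search / summarize / TTS modules.
from fastapi import FastAPI
from fastapi.responses import FileResponse

app = FastAPI()

def search_papers(q: str) -> list[dict]:
    """Stub for search.py: fan out to arXiv, Semantic Scholar, etc., then merge."""
    return [{"title": f"results for {q!r} would go here", "url": ""}]

def make_podcast(url: str) -> str:
    """Stub for the extract -> summarize -> synthesize pipeline (see below)."""
    raise NotImplementedError("wire in llm.py and tts.py here")

@app.get("/query")
def query(q: str):
    return search_papers(q)

@app.get("/create_podcast")
def create_podcast(url: str):
    # Stream the finished audio file back to the frontend.
    return FileResponse(make_podcast(url), media_type="audio/mpeg")
```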
- **Receive:** a paper URL or search query arrives at the backend.
- **Search:** `search.py` calls `arxiv.py` and `semantic_scholar.py` to fetch papers; results are merged and returned.
- **Summarize:** `llm.py` or `agentic_llm.py` summarizes the paper. If agentic, the LLM can call tools (e.g., arXiv search) as needed.
- **Synthesize Audio:** `tts.py` generates audio using gTTS (or HuggingFace if enabled).
- **Return:** the audio file is sent to the frontend for playback/download. (A sketch of this pipeline follows.)
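Put together, the `create_podcast` path might look like the sketch below, filling in the `make_podcast` stub from the skeleton above. Every helper name is an assumption about how the modules divide the work, with inline stand-ins so the sketch runs on its own.

```python
# Hypothetical end-to-end pipeline: extract -> summarize -> synthesize.
from gtts import gTTS

def extract_paper_text(paper_url: str) -> str:
    """Stand-in for the extraction step (e.g., fetch and parse the abstract)."""
    return f"Placeholder text for {paper_url}"

def summarize(text: str) -> str:
    """Stand-in for llm.py / agentic_llm.py: condense the paper into a script."""
    return text[:500]

def make_podcast(paper_url: str) -> str:
    """Turn a paper URL into a synthesized audio summary on disk."""
    summary = summarize(extract_paper_text(paper_url))
    out_path = "summary.mp3"  # gTTS writes MP3 data
    gTTS(text=summary, lang="en").save(out_path)  # swap for Kokoro/Dia on a GPU box
    return out_path
```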
- Node.js (v18+)
- Backend server (Python/Node, with endpoints as above, and access to gTTS or HuggingFace models)
- Clone this repo.
- Set the backend URL in your environment variables (`NEXT_PUBLIC_BACKEND_URL`); see the example below.
- Install dependencies and run the frontend.
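For example, assuming the backend runs locally on port 8000 (the address is an assumption), your `.env.local` might contain:

```
NEXT_PUBLIC_BACKEND_URL=http://localhost:8000
```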
- Implement the `/query` and `/create_podcast` endpoints.
- Integrate arXiv, Semantic Scholar, and LLM summarization.
- Use gTTS for TTS by default; switch to HuggingFace Kokoro/Dia if you have a GPU server.
- Example libraries: `transformers`, `gtts`, `flask`/`fastapi` (Python).
- Search for papers or enter a URL.
- Click "Generate Audio" to create and play/download the audio summary.
- Modular backend: Easily add new sources or LLMs.
- Lifelike voice synthesis with HuggingFace Kokoro/Dia (if hardware allows).
- Fast local TTS with gTTS for instant feedback.
- Modern, user-friendly UI.
- **app.py:**
  Main backend server. Exposes the API endpoints (`/query`, `/create_podcast`) for searching papers and generating audio summaries. Handles request routing and integrates all backend modules.
- **audio.py:**
  Handles text-to-speech (TTS) synthesis. Uses gTTS by default for fast, local audio generation. Can be extended to use HuggingFace Kokoro/Dia for high-quality, lifelike voice if hardware allows.
- **llm.py:**
  Contains logic for summarizing research papers using a language model (LLM). Supports different LLM backends (OpenAI, HuggingFace, or local models). Responsible for generating concise summaries from paper content.
- **tools.py:**
  Utility functions and helper methods for the backend. May include functions for formatting, error handling, deduplication, or integration with external APIs.
- **archive/agent.py:**
  (If used) Implements agentic workflows, where the LLM can autonomously decide to call tools (e.g., search, summarization) as part of its reasoning process.
- **audio_cache/:**
  Stores generated audio files (WAV format) for quick retrieval and to avoid redundant synthesis; a small lookup sketch follows.
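To illustrate the caching idea, a lookup might key files by a stable hash of the paper URL. The key scheme and helper name here are assumptions, not the repo's actual code.

```python
# A hypothetical cache lookup for audio_cache/. The hash-based key scheme is
# an assumption; the directory is documented above as holding WAV files.
import os
from hashlib import sha256

CACHE_DIR = "audio_cache"

def cached_audio_path(paper_url: str) -> str | None:
    """Return the cached audio file for this paper URL, or None on a miss."""
    path = os.path.join(CACHE_DIR, sha256(paper_url.encode()).hexdigest() + ".wav")
    return path if os.path.exists(path) else None
```

On a miss, the backend would synthesize via `tts.py` and write the result to the same path before returning it.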