A machine learning workshop demo showcasing progressively sophisticated methods for searching through video content. This project demonstrates the evolution from simple text search to advanced AI-powered video understanding.
This application allows users to:
- Transcribe YouTube videos and uploaded video files using OpenAI Whisper
- Search through video content using multiple search paradigms
- Navigate directly to relevant video segments with click-to-seek functionality
- Use LLMs to synthesize answers from video content
- Keyword Search: simple text matching within transcript segments
- Use Case: Finding exact phrases or specific terms mentioned in the video
- Semantic Search: vector similarity search using multilingual sentence embeddings (see the sketch after this list)
- Technology: paraphrase-multilingual-MiniLM-L12-v2 embeddings stored in ChromaDB
- Use Case: Finding conceptually related content even when the exact words don't match
- LLM Search: uses Large Language Models to synthesize coherent answers from the semantic search results
- Technology: supports Ollama (local) or vLLM (production) backends
- Use Case: Getting comprehensive answers that combine information from multiple video segments
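To make the semantic step concrete, here is a minimal sketch of indexing transcript segments and querying them by similarity with sentence-transformers and ChromaDB. The collection name, metadata fields, and example data are illustrative, not the app's actual schema.

```python
# Sketch: embed transcript segments, store them in ChromaDB, and query by similarity.
# Collection name and metadata fields are illustrative, not the app's real schema.
import chromadb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection("transcript_segments")

segments = [
    {"id": "seg-0", "text": "Whisper transcribes the extracted audio track.", "start": 12.4},
    {"id": "seg-1", "text": "The embeddings are stored in a vector database.", "start": 47.9},
]

# One embedding per segment; the start time is kept as metadata so the UI can
# seek the video player to the matching moment (click-to-seek).
collection.add(
    ids=[s["id"] for s in segments],
    embeddings=model.encode([s["text"] for s in segments]).tolist(),
    documents=[s["text"] for s in segments],
    metadatas=[{"start": s["start"]} for s in segments],
)

# Embed the user's question and return the closest segments, even if no exact
# words overlap with the transcript.
results = collection.query(
    query_embeddings=model.encode(["how is the audio turned into text?"]).tolist(),
    n_results=2,
)
print(results["documents"][0], results["metadatas"][0])
```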
- FastAPI: REST API framework
- OpenAI Whisper: Speech-to-text transcription
- ChromaDB: Vector database for embeddings
- Sentence Transformers: Multilingual embeddings
- yt-dlp: YouTube video downloading
- FFmpeg: Audio extraction
- Ollama/vLLM: LLM inference backends
- React + TypeScript: UI framework
- Vite: Build tool
- Tailwind CSS: Styling
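On the backend, the transcription step is what produces the timestamped segments that click-to-seek and the search indexes rely on. A minimal sketch with openai-whisper (the model size and file path are placeholders; the app's actual pipeline may differ):

```python
# Sketch: transcribe an FFmpeg-extracted audio file with openai-whisper.
# "base" and the file path are placeholders; larger models trade speed for accuracy.
import whisper

model = whisper.load_model("base")
result = model.transcribe("audio/example.wav")

# Whisper returns timestamped segments; each search hit can later be mapped back
# to a start time in the video player.
for seg in result["segments"]:
    print(f'[{seg["start"]:7.2f}s] {seg["text"].strip()}')
```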
- Docker and Docker Compose installed
- NVIDIA Container Toolkit (for GPU support, optional)
- Clone the repository:
git clone <repository-url>
cd video-search
- Copy the example environment file:
cp backend/.env.example backend/.env
- Run the application:
./run.sh
This script will:
- Detect if you have a GPU and use the appropriate profile
- Start all services (backend, frontend, Ollama)
- Pull the required LLM model (qwen3:8b) on first run
- Open the application at http://localhost:5173
To stop all services, press Ctrl+C.
- Python 3.10+ (3.12 recommended)
- FFmpeg (for audio extraction)
- Ollama (for LLM functionality)
- Node.js 18+ and npm
- Git (for cloning the repository)
Verify all prerequisites are installed:
# Check Python version
python --version # Should show 3.10 or higher
# Check Node.js and npm
node --version # Should show v18 or higher
npm --version
# Check FFmpeg
ffmpeg -version # Should show FFmpeg version info
# Check Git
git --version
# Check Ollama (after installation)
ollama --version
# Using Homebrew
brew install python@3.12 node ffmpeg
brew install --cask ollama
# Start Ollama
ollama serve
# Update package list
sudo apt update
# Install Python and pip
sudo apt install python3.12 python3-pip
# Install Node.js (via NodeSource)
curl -fsSL https://deb.nodesource.com/setup_18.x | sudo -E bash -
sudo apt install nodejs
# Install FFmpeg
sudo apt install ffmpeg
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
- Install WSL2 following Microsoft's guide
- Open WSL2 terminal and follow Ubuntu instructions above
git clone <repository-url>
cd video-search
cd backend
# Create virtual environment (recommended)
python -m venv venv
# Activate virtual environment
# On macOS/Linux:
source venv/bin/activate
# On Windows:
# venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Run the backend server (IMPORTANT: Use this exact command)
python -m app.main
The backend will start on http://localhost:9091
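A quick sanity check that the server is up, assuming FastAPI's default interactive docs route (/docs) has not been disabled:

```python
# Sketch: confirm the backend answers on port 9091 (assumes the default /docs route).
import requests

r = requests.get("http://localhost:9091/docs", timeout=5)
print(r.status_code)  # 200 means the API is serving
```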
Open a new terminal window:
cd frontend
# Install dependencies
npm install
# Start development server
npm run dev
The frontend will start on http://localhost:5173
The application works with sensible defaults. You only need a .env file if you want to customize the configuration.
Default Configuration (no .env needed):
- LLM Backend: Ollama on http://localhost:11434
- Default Model: qwen3:8b
- Embedding Model: paraphrase-multilingual-MiniLM-L12-v2
- Database: ./chroma_db
To customize configuration:
cd backend
cp .env.example .env
# Edit .env with your preferred settings
Example .env for customization:
# Use a different LLM model
LLM_MODEL=llama3.2:3b
# Or use vLLM instead of Ollama
LLM_BACKEND=vllm
VLLM_BASE_URL=http://localhost:8000/v1
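For reference, the documented settings could be read on the backend roughly as below. Only LLM_BACKEND, LLM_MODEL, and VLLM_BASE_URL appear above; the other variable names and the config style are assumptions, so check backend/.env.example for the real ones.

```python
# Sketch: environment-driven configuration with the README's documented defaults.
# Variable names other than LLM_BACKEND, LLM_MODEL, and VLLM_BASE_URL are assumed.
import os

LLM_BACKEND = os.getenv("LLM_BACKEND", "ollama")        # "ollama" or "vllm"
LLM_MODEL = os.getenv("LLM_MODEL", "qwen3:8b")
OLLAMA_BASE_URL = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")  # assumed name
VLLM_BASE_URL = os.getenv("VLLM_BASE_URL", "http://localhost:8000/v1")
EMBEDDING_MODEL = os.getenv("EMBEDDING_MODEL", "paraphrase-multilingual-MiniLM-L12-v2")  # assumed name
CHROMA_DB_PATH = os.getenv("CHROMA_DB_PATH", "./chroma_db")               # assumed name
```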
Before using LLM synthesis, download a model with Ollama:
# Pull the default model
ollama pull qwen3:8b
# Or choose a smaller model for limited resources
ollama pull llama3.2:3b
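Once a model is pulled, the LLM synthesis step boils down to sending retrieved transcript excerpts plus the user's question to Ollama. A minimal sketch against Ollama's REST chat endpoint; the prompt format is illustrative, not the app's actual prompt:

```python
# Sketch: synthesize an answer from retrieved segments via Ollama's /api/chat.
import requests

segments = [
    "At 02:10 the speaker explains how embeddings are stored in ChromaDB.",
    "At 05:43 the speaker compares keyword and semantic search.",
]
question = "How does the demo store embeddings?"

prompt = "Answer the question using only these transcript excerpts:\n"
prompt += "\n".join(f"- {s}" for s in segments)
prompt += f"\n\nQuestion: {question}"

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "qwen3:8b",
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    },
    timeout=120,
)
print(resp.json()["message"]["content"])
```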
- POST /transcribe/video-url: Transcribe a YouTube video from a URL
- POST /transcribe/video-file: Transcribe an uploaded video file
- GET /transcribe/audio/{filename}: Get the extracted audio file
- POST /search/keyword: Keyword search in transcripts
- POST /search/semantic: Semantic similarity search
- POST /search/llm: LLM-synthesized answers from search results
- GET /llms/models: List available LLM models
- POST /llms/select: Select the active LLM model
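These endpoints can be exercised from Python as shown below. The request body (a query field) is an assumption; the exact schemas are visible in the FastAPI docs at http://localhost:9091/docs.

```python
# Sketch: calling the search and model endpoints. The "query" field is assumed;
# check /docs for the actual request schemas.
import requests

BASE = "http://localhost:9091"

hits = requests.post(f"{BASE}/search/semantic", json={"query": "how are embeddings stored?"})
hits.raise_for_status()
print(hits.json())

models = requests.get(f"{BASE}/llms/models")
print(models.json())
```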
- Highlight search keywords in keyword search results
- Fix tokenizer parallelism warnings by setting TOKENIZERS_PARALLELISM=false (see the sketch after this list)
- Implement VLM search functionality
- Add progress indicators for long-running operations
- Fix tests
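For the tokenizer warning above, the usual fix is to set the variable before the embedding model (and its tokenizer) is loaded; a minimal sketch:

```python
# Sketch: silence the HuggingFace tokenizers fork warning by setting the variable
# before the tokenizer is loaded.
import os
os.environ["TOKENIZERS_PARALLELISM"] = "false"

from sentence_transformers import SentenceTransformer  # imported after the env var is set
model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
```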
This demonstrator is designed for ML workshops to showcase:
- The progression from simple to sophisticated search methods
- How different AI technologies can be combined for better results
- Practical implementation of embeddings, vector databases, and LLMs
- The importance of user experience (click-to-seek functionality)
Each search method builds upon the previous ones, demonstrating increasing levels of AI sophistication while maintaining practical usability.