Media processing toolkit for presentation localization using Google Gemini AI.
- PDF Extraction: Convert PDF pages to images
- Script Generation: Generate voiceover scripts from slides using AI
- Image Translation: Translate text in images to any language
- Audio Generation: Generate voiceover audio from scripts using TTS
- PowerPoint Generation: Create PPTX from PDF or images with speaker notes
- Video Generation: Combine slides and audio into MP4 videos
- Video/Audio Annotation: Frame-accurate annotation tool with waveform visualization
- Web Editor: Streamlit-based slide editor for managing presentations
pip install montaigne
# Install with web editor support
pip install "montaigne[edit]"
# Install with annotation tool support
pip install "montaigne[annotate]"
# Install all optional dependencies
pip install "montaigne[all]"
# Or install with uv
uv pip install montaigne
# Or run without installing, using uvx
uvx --from montaigne essai setup
uvx --from montaigne essai script --input presentation.pdf

- Get a Gemini API key from Google AI Studio
- Create a `.env` file containing `GEMINI_API_KEY=your-api-key`
- Verify setup:
essai setup
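`essai pdf` renders each page to an image, and its `--dpi` option sets the resolution. PDF page sizes are given in points (1 pt = 1/72 inch), so the output size in pixels is the page size multiplied by dpi/72. The sketch below shows only that arithmetic for a hypothetical 720 × 405 pt (16:9) page; it is not taken from montaigne's code.

```python
def rendered_size(width_pt: float, height_pt: float, dpi: int) -> tuple[int, int]:
    """Pixel size of a PDF page rendered at `dpi` (1 pt = 1/72 inch)."""
    scale = dpi / 72  # pixels per point
    return round(width_pt * scale), round(height_pt * scale)


# Hypothetical 16:9 page of 720 x 405 pt (10 x 5.625 in)
for dpi in (72, 150, 200):
    w, h = rendered_size(720, 405, dpi)
    print(f"{dpi} dpi -> {w} x {h} px")
```

Pixel count grows with the square of the DPI, so doubling `--dpi` roughly quadruples the image data that later steps have to handle.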
essai pdf presentation.pdf
essai pdf presentation.pdf --dpi 200 --format jpg

essai script --input presentation.pdf
essai script --input slides_images/ --context "AI workshop"
essai script --input presentation.pdf --output custom_script.md
essai script --input presentation.pdf --model gemini-2.5-flash

Options:
- `--input`, `-i`: PDF file or folder of slide images
- `--output`, `-o`: Output markdown file path
- `--context`, `-c`: Additional context to guide script generation
- `--model`, `-m`: Gemini model to use (default: `gemini-3-pro-preview`)
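The options above map directly onto a standard command-line parser. This illustrative `argparse` sketch mirrors the documented flags and default model for `essai script`; it is not montaigne's actual parser, and making `--input` required is an assumption.

```python
import argparse

DEFAULT_SCRIPT_MODEL = "gemini-3-pro-preview"  # documented default for `essai script`


def build_script_parser() -> argparse.ArgumentParser:
    # Illustrative only: mirrors the documented flags, not montaigne's source.
    parser = argparse.ArgumentParser(prog="essai script")
    parser.add_argument("--input", "-i", required=True,
                        help="PDF file or folder of slide images")
    parser.add_argument("--output", "-o",
                        help="Output markdown file path")
    parser.add_argument("--context", "-c",
                        help="Additional context to guide script generation")
    parser.add_argument("--model", "-m", default=DEFAULT_SCRIPT_MODEL,
                        help="Gemini model to use")
    return parser


args = build_script_parser().parse_args(["-i", "presentation.pdf", "-c", "AI workshop"])
print(args.input, args.context, args.model)
```

Passing `-m` simply replaces the default, which is how the per-command `--model` override behaves from the user's point of view.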
essai audio --script voiceover.md
essai audio --script voiceover.md --voice Kore
essai audio --script voiceover.md --model gemini-2.5-flash-preview-tts

TTS Providers:
| Provider | Description | Installation |
|---|---|---|
| `gemini` | Google Gemini TTS API (default) | Included |
| `elevenlabs` | ElevenLabs TTS API | Included |
| `coqui` | Local Coqui XTTS-v2 (no API key) | `pip install "montaigne[coqui]"` |
Gemini voices: Puck, Charon, Kore, Fenrir, Aoede, Orus
Local TTS with Coqui:
# Install Coqui dependencies
pip install "montaigne[coqui]"
# Generate audio locally (no API key required)
essai audio --script voiceover.md --provider coqui
essai audio --script voiceover.md --provider coqui --voice male
essai audio --list-voices --provider coqui

Coqui voices: female, male, neutral
Note: The first run downloads the XTTS-v2 model (~1.5 GB) and requires accepting the Coqui Public Model License (CPML).
Options:
- `--script`, `-s`: Path to voiceover markdown script
- `--provider`, `-p`: TTS provider (`gemini`, `elevenlabs`, `coqui`)
- `--voice`, `-v`: TTS voice to use (default: `Orus` for Gemini, `female` for Coqui)
- `--model`, `-m`: Gemini TTS model (default: `gemini-2.5-pro-preview-tts`)
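Gemini's TTS models return raw audio rather than a finished file. Assuming the format used in Google's Gemini speech-generation examples (16-bit little-endian PCM, mono, 24 kHz), wrapping the bytes in a WAV container takes only the standard `wave` module. `pcm_to_wav` and `duration_seconds` are hypothetical helpers, not part of montaigne.

```python
import io
import struct
import wave

SAMPLE_RATE = 24_000  # Hz, per Google's Gemini TTS examples (assumed here)
SAMPLE_WIDTH = 2      # bytes per sample (16-bit)
CHANNELS = 1          # mono


def pcm_to_wav(pcm: bytes) -> bytes:
    """Wrap raw little-endian 16-bit PCM in a WAV container (hypothetical helper)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm)
    return buf.getvalue()


def duration_seconds(pcm: bytes) -> float:
    """Clip length in seconds: bytes / (rate * width * channels)."""
    return len(pcm) / (SAMPLE_RATE * SAMPLE_WIDTH * CHANNELS)


# Half a second of silence stands in for a real TTS response.
silence = struct.pack("<h", 0) * (SAMPLE_RATE // 2)
wav_bytes = pcm_to_wav(silence)
print(len(silence), duration_seconds(silence))
```

The duration formula is also a cheap way to compare generated audio against the `~N seconds` estimate in a script.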
essai translate --input slides/
essai translate --input image.png --lang Spanish
essai translate --input slides/ --model gemini-2.0-flash-exp

Options:
- `--input`, `-i`: Image file or folder of images
- `--lang`, `-l`: Target language (default: `French`)
- `--model`, `-m`: Gemini model (default: `gemini-3-pro-image-preview`)
essai ppt --input presentation.pdf
essai ppt --input slides/ --script voiceover.md
essai ppt --input presentation.pdf --keep-images

This creates a `.pptx` file with one slide per PDF page or image. If a voiceover script is provided, it is added as speaker notes.
essai video --pdf presentation.pdf
essai video --images slides/ --audio audio/

essai localize --pdf presentation.pdf --script voiceover.md --lang French

This will:
- Extract PDF pages to images
- Translate all images to the target language
- Generate audio for all slides
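`essai video` combines slide images and audio into MP4 files and requires ffmpeg. One common ffmpeg approach, shown here as an assumption rather than montaigne's implementation, is to loop each still image for the length of its audio clip and then join the segments with the concat demuxer. The sketch only builds the commands; all file names (`slide_1.png`, `slide_1.wav`, `segment_1.mp4`, `segments.txt`) are hypothetical. It also uses a natural sort so that `slide_10` comes after `slide_9`, not after `slide_1`.

```python
import re
from pathlib import Path


def natural_key(path: str) -> list:
    """Sort key that compares digit runs numerically: slide_2 < slide_10."""
    return [int(tok) if tok.isdigit() else tok.lower()
            for tok in re.split(r"(\d+)", Path(path).name)]


def segment_cmd(image: str, audio: str, out: str) -> list[str]:
    # Loop the still image and stop when the audio ends (-shortest).
    return [
        "ffmpeg", "-y",
        "-loop", "1", "-i", image,
        "-i", audio,
        "-c:v", "libx264", "-tune", "stillimage", "-pix_fmt", "yuv420p",
        "-c:a", "aac",
        "-shortest", out,
    ]


def concat_cmd(list_file: str, out: str) -> list[str]:
    # Stream-copy the segments listed in list_file into one MP4.
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
            "-i", list_file, "-c", "copy", out]


images = sorted(["slide_10.png", "slide_2.png", "slide_1.png"], key=natural_key)
audios = sorted(["slide_2.wav", "slide_10.wav", "slide_1.wav"], key=natural_key)

segments = []
# zip assumes exactly one audio clip per slide, with matching numbering.
for n, (img, wav) in enumerate(zip(images, audios), start=1):
    seg = f"segment_{n}.mp4"
    segments.append(seg)
    print(" ".join(segment_cmd(img, wav, seg)))

# Concat demuxer input: one "file '<path>'" line per segment.
list_text = "".join(f"file '{s}'\n" for s in segments)
print(list_text, end="")
print(" ".join(concat_cmd("segments.txt", "presentation.mp4")))
# With ffmpeg installed, run each command via subprocess.run(cmd, check=True).
```

Stream copy (`-c copy`) in the final step only works because every segment is encoded with identical settings; slides of different sizes would need re-encoding instead.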
Launch an interactive web UI for annotating videos or audio files with frame-accurate timestamps:
# Install annotation dependencies first
pip install "montaigne[annotate]"
# Launch annotation UI
essai annotate video.mp4
essai annotate audio.wav
essai annotate # Auto-detect media in current dir
essai annotate video.mp4 --network # Make accessible on local network
# Export annotations
essai annotate video.mp4 --export srt # Export to SRT (Premiere, DaVinci)
essai annotate video.mp4 --export vtt # Export to WebVTT (browsers)
essai annotate video.mp4 --export json # Export to JSON

Keyboard shortcuts:
| Key | Action |
|---|---|
| Space | Play/Pause |
| I | Set In point for range |
| O | Set Out point for range |
| `[` / `]` | Step one frame backward / forward |
| Ctrl+Enter | Submit annotation |
| Escape | Clear range / exit input |
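Stepping with `[` and `]` moves by exactly one frame, 1/fps seconds. At fractional rates such as 30000/1001 (about 29.97 fps), repeatedly adding a rounded float step can accumulate error, so it is safer to keep an integer frame index and derive times from it with exact fractions. The tool itself runs in the browser; this Python sketch only illustrates the arithmetic, assumes a constant frame rate, and uses hypothetical helper names.

```python
import math
from fractions import Fraction

NTSC_FPS = Fraction(30000, 1001)  # about 29.97 fps


def frame_to_time(frame: int, fps: Fraction) -> Fraction:
    """Start time (seconds) of a frame index at a constant frame rate."""
    return frame / fps


def time_to_frame(seconds: Fraction, fps: Fraction) -> int:
    """Index of the frame on screen at `seconds` (floor, not round)."""
    return math.floor(seconds * fps)


def step(frame: int, direction: int) -> int:
    """`[` -> direction=-1, `]` -> direction=+1; never before frame 0."""
    return max(0, frame + direction)


frame = time_to_frame(Fraction(10), NTSC_FPS)  # frame showing at t = 10 s
frame = step(frame, +1)
print(frame, float(frame_to_time(frame, NTSC_FPS)))
```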
Features:
- Frame-accurate timing using the `requestVideoFrameCallback` API
- Waveform visualization with click-to-seek
- Light/dark theme toggle
- Local-first SQLite storage (no network round-trips)
- Export to WebVTT, SRT, JSON formats
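SRT and WebVTT differ mainly in the header and the millisecond separator: SRT numbers each cue and writes `HH:MM:SS,mmm`, while WebVTT starts with a `WEBVTT` line and writes `HH:MM:SS.mmm`. A minimal exporter for (start, end, text) annotations might look like this; the function names are hypothetical and the output is not guaranteed to match the tool's byte for byte.

```python
def timestamp(seconds: float, sep: str) -> str:
    """Format seconds as HH:MM:SS<sep>mmm (sep is ',' for SRT, '.' for VTT)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{sep}{ms:03d}"


def to_srt(annotations: list[tuple[float, float, str]]) -> str:
    # SRT: numbered cues separated by blank lines, comma before milliseconds.
    cues = []
    for i, (start, end, text) in enumerate(annotations, start=1):
        cues.append(f"{i}\n{timestamp(start, ',')} --> {timestamp(end, ',')}\n{text}\n")
    return "\n".join(cues)


def to_vtt(annotations: list[tuple[float, float, str]]) -> str:
    # WebVTT: "WEBVTT" header, optional cue ids omitted, dot before milliseconds.
    cues = [f"{timestamp(s, '.')} --> {timestamp(e, '.')}\n{t}\n" for s, e, t in annotations]
    return "WEBVTT\n\n" + "\n".join(cues)


notes = [(1.5, 4.0, "Intro title card"), (3725.042, 3730.0, "Cut here")]
print(to_srt(notes))
print(to_vtt(notes))
```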
Launch a Streamlit-based web interface for managing slides and scripts:
# Install editor dependencies first
pip install "montaigne[edit]"
# Launch the editor
essai edit
essai edit --pdf presentation.pdf --script voiceover.md

Each AI command supports a `--model` / `-m` flag to override the default Gemini model:
| Command | Default Model | Purpose |
|---|---|---|
| `essai script` | `gemini-3-pro-preview` | Script generation |
| `essai audio` | `gemini-2.5-pro-preview-tts` | Text-to-speech |
| `essai translate` | `gemini-3-pro-image-preview` | Image translation |
List available models:
essai models

Voiceover scripts should follow this markdown format:
## SLIDE 1: Title
**[Duration: ~45 seconds]**
Your narration text for slide 1 goes here.
---
## SLIDE 2: Next Topic
**[Duration: ~60 seconds]**
Narration for slide 2.

See the `demo/hamlet/` folder for a complete example with:
- Sample PDF presentation
- Voiceover script
- Image asset
cd demo/hamlet
essai localize --lang French

Requirements:
- Python 3.10+
- Google Gemini API key
- ffmpeg (for video generation)
- Dependencies: `google-genai`, `python-dotenv`, `pymupdf`, `python-pptx`, `Pillow`

Optional dependencies:
- `edit`: `streamlit` (web editor interface)
- `annotate`: `flask` (video/audio annotation tool)
- `coqui`: `TTS`, `torch`, `torchaudio` (local TTS with Coqui XTTS-v2, no API key required)
- `cloud`: `fastapi`, `uvicorn`, `google-cloud-storage` (cloud API deployment)
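A quick preflight check for the requirements above can be written with the standard library: `sys.version_info` for the Python version, `shutil.which` for `ffmpeg`, and `importlib.util.find_spec` for the core packages. The import names are assumptions about the listed PyPI packages (`pymupdf` is importable as `pymupdf`, older releases only as `fitz`; `python-pptx` as `pptx`; `python-dotenv` as `dotenv`; `google-genai` as `google.genai`; `Pillow` as `PIL`). This is a standalone sketch, not what `essai setup` does internally.

```python
import importlib.util
import shutil
import sys

# PyPI package -> import name (assumed mapping for the listed dependencies).
CORE_MODULES = {
    "google-genai": "google.genai",
    "python-dotenv": "dotenv",
    "pymupdf": "pymupdf",
    "python-pptx": "pptx",
    "Pillow": "PIL",
}


def module_available(name: str) -> bool:
    """True if `name` can be imported, without actually importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:  # parent package (e.g. `google`) is missing
        return False


def preflight() -> dict[str, bool]:
    checks = {
        "python>=3.10": sys.version_info >= (3, 10),
        "ffmpeg on PATH": shutil.which("ffmpeg") is not None,
    }
    for package, module in CORE_MODULES.items():
        checks[package] = module_available(module)
    return checks


if __name__ == "__main__":
    for name, ok in preflight().items():
        print(f"[{'ok' if ok else 'missing'}] {name}")
```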