yanndebray/montaigne


Montaigne


Media processing toolkit for presentation localization using Google Gemini AI.

Features

  • PDF Extraction: Convert PDF pages to images
  • Script Generation: Generate voiceover scripts from slides using AI
  • Image Translation: Translate text in images to any language
  • Audio Generation: Generate voiceover audio from scripts using TTS
  • PowerPoint Generation: Create PPTX from PDF or images with speaker notes
  • Video Generation: Combine slides and audio into MP4 videos
  • Video/Audio Annotation: Frame-accurate annotation tool with waveform visualization
  • Web Editor: Streamlit-based slide editor for managing presentations

Installation

Using pip

pip install montaigne

With optional dependencies

# Install with web editor support
pip install "montaigne[edit]"

# Install with annotation tool support
pip install "montaigne[annotate]"

# Install all optional dependencies
pip install "montaigne[all]"

Using uv

uv pip install montaigne

Using uvx (no installation required)

uvx --from montaigne essai setup
uvx --from montaigne essai script --input presentation.pdf

Setup

  1. Get a Gemini API key from Google AI Studio
  2. Create a .env file:
    GEMINI_API_KEY=your-api-key
    
  3. Verify setup:
    essai setup
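
Before running essai setup, the key can also be checked from Python. The following is a minimal sketch that parses a .env file by hand with the standard library only. The read_env_file and has_gemini_key helpers are hypothetical and not part of Montaigne, which lists python-dotenv among its dependencies and may load the file differently.

```python
from pathlib import Path


def read_env_file(path):
    """Parse simple KEY=VALUE lines from a .env file (hypothetical helper)."""
    values = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def has_gemini_key(path=".env"):
    """Return True if the file exists and defines a non-empty GEMINI_API_KEY."""
    env_path = Path(path)
    if not env_path.is_file():
        return False
    return bool(read_env_file(env_path).get("GEMINI_API_KEY"))
```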

Usage

Extract PDF to Images

essai pdf presentation.pdf
essai pdf presentation.pdf --dpi 200 --format jpg
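
The --dpi value determines the pixel size of each exported image. PDF page sizes are expressed in points (72 per inch), so the scale factor is dpi / 72. A small sketch of that arithmetic, independent of how Montaigne renders pages internally (the rendered_size name is illustrative):

```python
POINTS_PER_INCH = 72  # PDF user-space units per inch


def rendered_size(width_pt, height_pt, dpi):
    """Return the (width, height) in pixels of a page rendered at `dpi`."""
    scale = dpi / POINTS_PER_INCH
    return round(width_pt * scale), round(height_pt * scale)


# US Letter is 612 x 792 points.
print(rendered_size(612, 792, 200))  # (1700, 2200)
```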

Generate Voiceover Script from Slides

essai script --input presentation.pdf
essai script --input slides_images/ --context "AI workshop"
essai script --input presentation.pdf --output custom_script.md
essai script --input presentation.pdf --model gemini-2.5-flash

Options:

  • --input, -i: PDF file or folder of slide images
  • --output, -o: Output markdown file path
  • --context, -c: Additional context to guide script generation
  • --model, -m: Gemini model to use (default: gemini-3-pro-preview)
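
To script several decks in one go, the command can be assembled programmatically. The sketch below only builds argument lists from the flags documented above. The build_script_command and batch_commands helpers, and the _script.md output naming, are illustrative assumptions rather than Montaigne behavior.

```python
from pathlib import Path


def build_script_command(input_path, output=None, context=None, model=None):
    """Build an `essai script` argument list from the documented flags."""
    cmd = ["essai", "script", "--input", str(input_path)]
    if output:
        cmd += ["--output", str(output)]
    if context:
        cmd += ["--context", context]
    if model:
        cmd += ["--model", model]
    return cmd


def batch_commands(pdf_paths, context=None):
    """One command per PDF, writing to <stem>_script.md (assumed naming)."""
    commands = []
    for pdf in map(Path, pdf_paths):
        output = pdf.with_name(f"{pdf.stem}_script.md")
        commands.append(build_script_command(pdf, output=output, context=context))
    return commands
```

Each list can then be passed to subprocess.run(cmd, check=True).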

Generate Audio from Script

essai audio --script voiceover.md
essai audio --script voiceover.md --voice Kore
essai audio --script voiceover.md --model gemini-2.5-flash-preview-tts

TTS Providers:

Provider    Description                        Installation
gemini      Google Gemini TTS API (default)    Included
elevenlabs  ElevenLabs TTS API                 Included
coqui       Local Coqui XTTS-v2 (no API key)   pip install "montaigne[coqui]"

Gemini voices: Puck, Charon, Kore, Fenrir, Aoede, Orus

Local TTS with Coqui:

# Install Coqui dependencies
pip install "montaigne[coqui]"

# Generate audio locally (no API key required)
essai audio --script voiceover.md --provider coqui
essai audio --script voiceover.md --provider coqui --voice male
essai audio --list-voices --provider coqui

Coqui voices: female, male, neutral

Note: The first run downloads the XTTS-v2 model (~1.5 GB) and requires accepting the Coqui Public Model License (CPML).

Options:

  • --script, -s: Path to voiceover markdown script
  • --provider, -p: TTS provider (gemini, elevenlabs, coqui)
  • --voice, -v: TTS voice to use (default: Orus for Gemini, female for Coqui)
  • --model, -m: Gemini TTS model (default: gemini-2.5-pro-preview-tts)
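
When the audio step is driven from a script, it helps to validate the provider name before invoking the CLI. A minimal sketch, assuming only the flags and provider names listed above; build_audio_command is a hypothetical helper, not part of Montaigne.

```python
PROVIDERS = ("gemini", "elevenlabs", "coqui")  # names from the provider table


def build_audio_command(script, provider=None, voice=None, model=None):
    """Build an `essai audio` argument list, rejecting unknown providers."""
    if provider is not None and provider not in PROVIDERS:
        raise ValueError(f"unknown provider {provider!r}; expected one of {PROVIDERS}")
    cmd = ["essai", "audio", "--script", str(script)]
    if provider:
        cmd += ["--provider", provider]
    if voice:
        cmd += ["--voice", voice]
    if model:
        cmd += ["--model", model]
    return cmd
```

Omitted arguments are left out of the command so that the CLI's own defaults apply.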

Translate Images

essai translate --input slides/
essai translate --input image.png --lang Spanish
essai translate --input slides/ --model gemini-2.0-flash-exp

Options:

  • --input, -i: Image file or folder of images
  • --lang, -l: Target language (default: French)
  • --model, -m: Gemini model (default: gemini-3-pro-image-preview)

Create PowerPoint from PDF or Images

essai ppt --input presentation.pdf
essai ppt --input slides/ --script voiceover.md
essai ppt --input presentation.pdf --keep-images

This creates a .pptx file with one slide per PDF page or image. If a voiceover script is provided, its narration is added to each slide as speaker notes.

Generate Video from Slides

essai video --pdf presentation.pdf
essai video --images slides/ --audio audio/
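
Video generation depends on ffmpeg. As a rough illustration of what pairing one slide with its narration involves, here is a sketch that builds a standard ffmpeg invocation for a still image plus an audio track. It is not taken from Montaigne's source: the slide_clip_command helper and the file names are hypothetical, and the arguments the tool actually passes to ffmpeg may differ.

```python
def slide_clip_command(image, audio, output):
    """Build an ffmpeg command that shows `image` for the length of `audio`."""
    return [
        "ffmpeg",
        "-loop", "1",            # repeat the still image as a video stream
        "-i", str(image),
        "-i", str(audio),
        "-c:v", "libx264",
        "-tune", "stillimage",
        "-pix_fmt", "yuv420p",   # widely compatible pixel format
        "-c:a", "aac",
        "-shortest",             # stop when the shortest input (the audio) ends
        str(output),
    ]


print(" ".join(slide_clip_command("slide_01.png", "slide_01.wav", "slide_01.mp4")))
```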

Full Localization Pipeline

essai localize --pdf presentation.pdf --script voiceover.md --lang French

This will:

  1. Extract PDF pages to images
  2. Translate all images to the target language
  3. Generate audio for all slides
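
The same three steps can be approximated with the individual commands. The sketch below only assembles the argument lists. The images_dir parameter is an assumption, since the directory that essai pdf writes to is not specified here, and essai localize may do more than this internally.

```python
def localization_steps(pdf, script, lang, images_dir):
    """Return the three commands that mirror the pipeline, in order."""
    return [
        # 1. Extract PDF pages to images.
        ["essai", "pdf", str(pdf)],
        # 2. Translate the extracted images (images_dir is an assumed location).
        ["essai", "translate", "--input", str(images_dir), "--lang", lang],
        # 3. Generate audio from the voiceover script.
        ["essai", "audio", "--script", str(script)],
    ]
```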

Video/Audio Annotation Tool

Launch an interactive web UI for annotating videos or audio files with frame-accurate timestamps:

# Install annotation dependencies first
pip install "montaigne[annotate]"

# Launch annotation UI
essai annotate video.mp4
essai annotate audio.wav
essai annotate                        # Auto-detect media in current dir
essai annotate video.mp4 --network    # Make accessible on local network

# Export annotations
essai annotate video.mp4 --export srt   # Export to SRT (Premiere, DaVinci)
essai annotate video.mp4 --export vtt   # Export to WebVTT (browsers)
essai annotate video.mp4 --export json  # Export to JSON

Keyboard shortcuts:

Key         Action
Space       Play/Pause
I           Set In point for range
O           Set Out point for range
[ / ]       Step frame backward/forward
Ctrl+Enter  Submit annotation
Escape      Clear range / exit input

Features:

  • Frame-accurate timing using requestVideoFrameCallback API
  • Waveform visualization with click-to-seek
  • Light/dark theme toggle
  • Local-first SQLite storage (no network round trips)
  • Export to WebVTT, SRT, JSON formats
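
SRT and WebVTT cues differ mainly in the timestamp separator: SRT writes HH:MM:SS,mmm while WebVTT writes HH:MM:SS.mmm. The sketch below converts a frame index to milliseconds and formats one cue in each style. The function names and the constant frame rate are assumptions for illustration, not Montaigne's exporter.

```python
def frame_to_ms(frame, fps):
    """Convert a frame index to milliseconds at a constant frame rate."""
    return round(frame * 1000 / fps)


def format_timestamp(ms, separator):
    """Format milliseconds as HH:MM:SS<separator>mmm."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}{separator}{millis:03d}"


def srt_cue(index, start_ms, end_ms, text):
    """One SRT cue: numeric index, comma-separated milliseconds."""
    start = format_timestamp(start_ms, ",")
    end = format_timestamp(end_ms, ",")
    return f"{index}\n{start} --> {end}\n{text}\n"


def vtt_cue(start_ms, end_ms, text):
    """One WebVTT cue: dot-separated milliseconds, no index required."""
    start = format_timestamp(start_ms, ".")
    end = format_timestamp(end_ms, ".")
    return f"{start} --> {end}\n{text}\n"
```

A complete WebVTT file additionally starts with a WEBVTT header line followed by a blank line.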

Web Editor

Launch a Streamlit-based web interface for managing slides and scripts:

# Install editor dependencies first
pip install "montaigne[edit]"

# Launch the editor
essai edit
essai edit --pdf presentation.pdf --script voiceover.md

Model Configuration

Each AI command supports a --model / -m flag to override the default Gemini model:

Command          Default Model               Purpose
essai script     gemini-3-pro-preview        Script generation
essai audio      gemini-2.5-pro-preview-tts  Text-to-speech
essai translate  gemini-3-pro-image-preview  Image translation

List available models:

essai models
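
For wrappers that need to know which model a command will use, the defaults in the table above can be mirrored in a small lookup. This is a sketch with a hypothetical resolve_model helper. The dictionary is copied from the table and would need updating if the defaults change.

```python
DEFAULT_MODELS = {
    "script": "gemini-3-pro-preview",
    "audio": "gemini-2.5-pro-preview-tts",
    "translate": "gemini-3-pro-image-preview",
}


def resolve_model(command, override=None):
    """Return the override if given, else the documented default."""
    if command not in DEFAULT_MODELS:
        raise KeyError(f"{command!r} has no documented default model")
    return override or DEFAULT_MODELS[command]
```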

Voiceover Script Format

Scripts should follow this markdown format:

## SLIDE 1: Title
**[Duration: ~45 seconds]**

Your narration text for slide 1 goes here.

---

## SLIDE 2: Next Topic
**[Duration: ~60 seconds]**

Narration for slide 2.
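
To check that a script matches this layout before generating audio, it can be parsed into per-slide records. The following is a minimal sketch of such a parser. It is not Montaigne's own implementation, and the regular expressions assume exactly the heading and duration patterns shown above.

```python
import re

HEADING = re.compile(r"^## SLIDE (\d+):\s*(.+)$")
DURATION = re.compile(r"^\*\*\[Duration: ~(\d+) seconds\]\*\*$")


def parse_script(text):
    """Split a voiceover script into a list of slide dictionaries."""
    slides = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        heading = HEADING.match(line)
        duration = DURATION.match(line)
        if heading:
            current = {
                "number": int(heading.group(1)),
                "title": heading.group(2),
                "duration": None,
                "narration": [],
            }
            slides.append(current)
        elif current is None or not line or line == "---":
            continue  # separators, blank lines, and text before the first slide
        elif duration:
            current["duration"] = int(duration.group(1))
        else:
            current["narration"].append(line)
    # Join the collected narration lines into one string per slide.
    for slide in slides:
        slide["narration"] = " ".join(slide["narration"])
    return slides
```

Summing the duration fields gives a rough estimate of the total running time.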

Demo

See the demo/hamlet/ folder for a complete example with:

  • Sample PDF presentation
  • Voiceover script
  • Image asset

cd demo/hamlet
essai localize --lang French

Requirements

  • Python 3.10+
  • Google Gemini API key
  • ffmpeg (for video generation)
  • Dependencies: google-genai, python-dotenv, pymupdf, python-pptx, Pillow

Optional Dependencies

  • edit: streamlit - Web editor interface
  • annotate: flask - Video/audio annotation tool
  • coqui: TTS, torch, torchaudio - Local TTS with Coqui XTTS-v2 (no API key required)
  • cloud: fastapi, uvicorn, google-cloud-storage - Cloud API deployment
