This web service converts video files (MP4) to SRT subtitles using the OpenAI Whisper API via AiHubMix, with options to translate the subtitles to Chinese or English and overlay them onto the original video.
- Web-based interface for easy file conversion
- Converts MP4 files to SRT subtitles with accurate timestamps
- Translates subtitles to Chinese or English using configurable AI models
- Overlays subtitles onto the original video
- Download and process videos from X (Twitter)
- Streamable video playback with links for both original and subtitled videos
- Configurable API settings with support for custom providers and models
- Character count adjustment for wide-screen videos
- Progress tracking with one decimal place precision
- Responsive web interface
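Accurate SRT timestamps depend on the `HH:MM:SS,mmm` format that the SRT convention uses. The following is a minimal sketch of rendering one cue; the function names `srt_timestamp` and `srt_cue` are hypothetical and are not taken from this project's code.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def srt_cue(index: int, start: float, end: float, text: str) -> str:
    """Render one numbered SRT cue: index, time range, then the text."""
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
```

For example, `srt_timestamp(3661.5)` yields `01:01:01,500`.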
- Python 3.6+
- FFmpeg
- ImageMagick (for subtitle overlay)
- flask for web interface
- moviepy for video/audio processing
- openai for API access
- Clone or download this repository
- Install dependencies:
pip install -r requirements.txt
- Configure your API settings as environment variables:
export TRANSCRIBE_API_KEY="your_transcription_api_key"
export TRANSLATE_API_KEY="your_translation_api_key"
# Or use the fallback variable:
# export AIHUBMIX_API_KEY="your_api_key_here"
- Run the application:
python app.py
- Open your browser to http://localhost:5000
The web interface provides two main ways to process videos:
- Upload MP4 or MP3 files directly from your computer
- Choose processing options:
- Translate to Chinese: Convert subtitles to Chinese
- Translate to English: Convert subtitles to English (mutually exclusive with Chinese translation)
- Overlay Subtitles on Video: Add subtitles directly to the video
- After processing, download links are provided for:
- Original SRT file
- Translated SRT file (if translation was selected)
- Subtitled video (if overlay was selected)
- Video preview links for both original and subtitled versions (opened in new tabs)
- Enter the URL of an X (Twitter) post with a video
- The service will download the video and show processing options
- Choose translation (Chinese or English) and/or subtitle overlay options
- After processing, download links and video preview links are provided
- After successful processing, you can preview both original and subtitled videos
- Click "▶️ Play Original Video in New Tab" to play the original video
- Click "▶️ Play Video with Subtitles in New Tab" to play the subtitled video
- Videos open in new browser tabs for convenient playback
- Configure your API keys in the docker-compose.yml file or create a .env file:
TRANSCRIBE_API_KEY=your_transcription_api_key
TRANSLATE_API_KEY=your_translation_api_key
- Build and run the container:
docker-compose up -d
- Access the service at http://localhost:5000
- Run the deployment script:
chmod +x deploy_ubuntu.sh
sudo ./deploy_ubuntu.sh
- Set the API keys:
sudo systemctl set-environment TRANSCRIBE_API_KEY='your_transcription_api_key'
sudo systemctl set-environment TRANSLATE_API_KEY='your_translation_api_key'
Or use the fallback:
sudo systemctl set-environment AIHUBMIX_API_KEY='your_api_key_here'
- Access the service at your server's IP address
The service supports configurable API settings for both the transcription and translation services, controlled by the following variables.
Transcription Settings:
- TRANSCRIBE_API_KEY: API key for transcription service (Whisper)
- TRANSCRIBE_BASE_URL: API endpoint URL for transcription service (default: https://aihubmix.com/v1)
- TRANSCRIBE_MODEL: Model to use for transcription (default: whisper-1)
Translation Settings:
- TRANSLATE_API_KEY: API key for translation service (Gemini or other LLM)
- TRANSLATE_BASE_URL: API endpoint URL for translation service (default: https://aihubmix.com/v1)
- TRANSLATE_MODEL: Model to use for translation (default: gemini-2.5-flash-lite)
Fallback Settings:
- AIHUBMIX_API_KEY: Fallback API key (if specific keys are not set)
- API_KEY: Final fallback API key
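The fallback order described above (service-specific key, then `AIHUBMIX_API_KEY`, then `API_KEY`) could be resolved roughly as follows. This is a sketch under that assumption, not the project's actual lookup code; `resolve_api_key` is a hypothetical helper, and the optional `env` argument exists only so the logic can be exercised without touching the real environment.

```python
import os


def resolve_api_key(specific_name, env=None):
    """Return the first non-empty key among the specific and fallback variables."""
    env = os.environ if env is None else env
    for name in (specific_name, "AIHUBMIX_API_KEY", "API_KEY"):
        value = env.get(name)
        if value:  # skip unset and empty values
            return value
    return None  # no key configured anywhere
```

Calling `resolve_api_key("TRANSCRIBE_API_KEY")` would then return the transcription key if it is set, and otherwise fall through to the shared keys.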
You can provide configuration in several ways:
- Environment variables (highest precedence)
- In systemd service file
- In docker-compose.yml file
- In a secure config file at config.env (for local development)
- In /etc/video-converter/config.env (system-wide config)
- In ~/.video-converter/config.env (user-specific config)
The configuration files should follow the format:
TRANSCRIBE_API_KEY=your_transcription_api_key
TRANSCRIBE_BASE_URL=https://your-transcription-service.com/v1
TRANSCRIBE_MODEL=whisper-1
TRANSLATE_API_KEY=your_translation_api_key
TRANSLATE_BASE_URL=https://your-translation-service.com/v1
TRANSLATE_MODEL=your-model-name
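To illustrate how a file in this format might be combined with environment variables, here is a minimal sketch. It assumes plain `KEY=VALUE` lines with optional `#` comments and lets environment variables win, as stated above; `parse_env_text`, `effective_settings`, and `KNOWN_KEYS` are hypothetical names, and the project's real loader may behave differently.

```python
import os

# Setting names documented in this README.
KNOWN_KEYS = (
    "TRANSCRIBE_API_KEY", "TRANSCRIBE_BASE_URL", "TRANSCRIBE_MODEL",
    "TRANSLATE_API_KEY", "TRANSLATE_BASE_URL", "TRANSLATE_MODEL",
)


def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


def effective_settings(file_text, env=None):
    """Merge file settings with the environment; the environment wins."""
    env = os.environ if env is None else env
    merged = parse_env_text(file_text)
    for key in KNOWN_KEYS:
        if env.get(key):
            merged[key] = env[key]
    return merged
```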
To protect your API keys, follow these security practices:
- After running the deployment script, create a secure secrets file:
sudo nano /etc/video-converter/secrets
- Add your API keys to the file:
TRANSCRIBE_API_KEY=your_transcription_api_key
TRANSLATE_API_KEY=your_translation_api_key
- Set appropriate permissions:
sudo chmod 640 /etc/video-converter/secrets
sudo chown root:video_converter /etc/video-converter/secrets
- Restart the service:
sudo systemctl restart video-converter
- Create a .env file in the project directory:
TRANSCRIBE_API_KEY=your_transcription_api_key
TRANSLATE_API_KEY=your_translation_api_key
- Make sure the .env file is not committed to version control by adding it to .gitignore:
.env
config.env
- Create a config.env file in the project directory:
TRANSCRIBE_API_KEY=your_transcription_api_key
TRANSLATE_API_KEY=your_translation_api_key
- Make sure this file is in your .gitignore to prevent committing it.
- GET /: Main web interface
- POST /upload: Upload and process video files
- GET /download/<filename>: Download processed files
- GET /upload_stream/<filename>: Stream video from upload directory (for original videos)
- GET /output_stream/<filename>: Stream video from output directory (for subtitled videos)
- GET /subtitles/<filename>: Serve SRT files as WebVTT for video players (if enabled)
- GET /download_progress/<url_hash>: Get download progress for X video downloads
- POST /download_x_video: Initiate download of video from X (Twitter) URL
- POST /process_x_video: Process a downloaded X video with optional translation
- POST /telegram_webhook: Telegram bot webhook (for receiving messages from Telegram)
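The subtitles endpoint serves SRT content as WebVTT. The two formats are close: WebVTT needs a `WEBVTT` header line and uses a dot instead of a comma before the milliseconds. The sketch below shows one way such a conversion can be done; `srt_to_webvtt` is a hypothetical helper and not necessarily how the endpoint is implemented.

```python
import re

# Matches an SRT timestamp such as 00:00:01,000
_SRT_TIME = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_webvtt(srt_text):
    """Convert SRT text to WebVTT: add the header, fix timestamp separators."""
    lines = []
    for line in srt_text.replace("\r\n", "\n").strip().split("\n"):
        if "-->" in line:
            # Only timing lines are rewritten; cue text is left untouched.
            line = _SRT_TIME.sub(r"\1.\2", line)
        lines.append(line)
    return "WEBVTT\n\n" + "\n".join(lines) + "\n"
```

The numeric cue indexes from the SRT file are kept, since WebVTT accepts them as cue identifiers.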
The service includes a Telegram bot integration that allows users to process videos directly through Telegram.
- Create a bot with BotFather on Telegram
- Get your bot token
- Set the bot token as an environment variable:
export TELEGRAM_BOT_TOKEN="your_bot_token_here"
- Set your public URL where the bot can receive webhooks:
export PUBLIC_URL="https://your-domain.com"
- Set the webhook by calling the set_webhook function in the app
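Registering a webhook with Telegram comes down to calling the Bot API's `setWebhook` method with the public URL of the webhook endpoint. As an illustration only, the sketch below builds that request URL from the two environment values above and the `/telegram_webhook` route; `build_set_webhook_url` is a hypothetical helper and is not the app's `set_webhook` function.

```python
from urllib.parse import urlencode


def build_set_webhook_url(bot_token, public_url, path="/telegram_webhook"):
    """Build the Telegram Bot API setWebhook URL for this service's webhook route."""
    webhook = public_url.rstrip("/") + path
    query = urlencode({"url": webhook})  # percent-encodes the webhook URL
    return f"https://api.telegram.org/bot{bot_token}/setWebhook?{query}"
```

Requesting the resulting URL (for example with `urllib.request.urlopen`) asks Telegram to deliver updates to `PUBLIC_URL/telegram_webhook`.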
Once configured, users can:
- Send the /start command to get started
- Send MP4 or MP3 files to generate subtitles
- Receive both original and Chinese translated SRT files
- Get help with the /help command
- For wide-screen videos (width > 1280px), the character count per line is doubled (from 16 to 32 characters)
- This improves subtitle readability on wide-screen videos by accommodating more text per line
- Non-wide-screen videos continue to use the standard 16 characters per line
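The width rule above can be expressed in a few lines. This is a sketch, assuming fixed-width slicing, which suits Chinese text that has no spaces between words; the function names are hypothetical and the project's actual line-breaking logic may differ.

```python
def chars_per_line(video_width, base=16, wide_threshold=1280):
    """Return the per-line character limit: doubled for wide-screen videos."""
    return base * 2 if video_width > wide_threshold else base


def wrap_subtitle(text, video_width):
    """Split subtitle text into lines no longer than the limit for this width."""
    limit = chars_per_line(video_width)
    return [text[i:i + limit] for i in range(0, len(text), limit)]
```

A 1920px-wide video gets 32 characters per line, while a video exactly 1280px wide still uses 16, because the threshold is strict.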
- All processing operations show progress with one decimal place precision (e.g., 45.7%)
- Progress is tracked both for local file uploads and X video downloads
- Progress updates are more granular for better user experience
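Formatting a progress value with one decimal place is a one-line operation in Python. The sketch below adds clamping and a guard for an unknown total; `format_progress` is a hypothetical helper, not the project's own function.

```python
def format_progress(done, total):
    """Return progress as a percentage string with one decimal place."""
    if total <= 0:
        return "0.0%"  # total unknown or empty: report no progress
    pct = min(max(done / total * 100, 0.0), 100.0)  # clamp to 0..100
    return f"{pct:.1f}%"
```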
- Support for both Chinese and English translations
- Translation options are mutually exclusive (select either Chinese or English)
- Customizable line break processing for Chinese text based on video width
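Because the two translation options are mutually exclusive, a server-side check is a reasonable complement to the UI restriction. The sketch below assumes the options arrive as two booleans and uses `"zh"` and `"en"` as illustrative language codes; both the function name and the codes are assumptions.

```python
def pick_target_language(translate_chinese, translate_english):
    """Return the target language code, or None when no translation is requested."""
    if translate_chinese and translate_english:
        raise ValueError("Choose either Chinese or English translation, not both")
    if translate_chinese:
        return "zh"
    if translate_english:
        return "en"
    return None
```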
- Download videos directly from X (Twitter) posts using the video URL
- Automatic video format detection and download
- Integrated processing options after download
- Progress tracking during download
- Both original and subtitled videos can be streamed directly in browser
- Separate streaming endpoints for upload and output directories
- Video links open in new tabs for convenient playback
- Process a specific MP4 file:
python mp4_to_mp3.py path/to/video.mp4
- Process all files in a directory:
python mp4_to_mp3.py path/to/directory
- Process an existing MP3 file:
python mp4_to_mp3.py path/to/audio.mp3
- Translate to Chinese: Add the --translate or -t flag, e.g., python mp4_to_mp3.py --translate path/to/video.mp4
- Overlay subtitles on video: Add the --overlay or -o flag, e.g., python mp4_to_mp3.py --overlay path/to/video.mp4
- Combine flags:
python mp4_to_mp3.py --translate --overlay path/to/video.mp4
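The command-line interface described above maps naturally onto Python's standard `argparse` module. The following is a sketch of how such a parser could be declared, using only the flags documented here; it is not copied from `mp4_to_mp3.py`, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Declare the documented CLI: one path plus two optional flags."""
    parser = argparse.ArgumentParser(
        prog="mp4_to_mp3.py",
        description="Generate SRT subtitles from a media file or directory.",
    )
    parser.add_argument("path", help="MP4/MP3 file or directory to process")
    parser.add_argument("-t", "--translate", action="store_true",
                        help="translate subtitles to Chinese")
    parser.add_argument("-o", "--overlay", action="store_true",
                        help="overlay subtitles on the video")
    return parser
```

With this layout, `build_parser().parse_args(["--translate", "--overlay", "path/to/video.mp4"])` sets both flags and stores the path in `args.path`.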
This project is open source and welcomes contributions from the community.
- Fork the repository and create your branch from main
- Add your features and ensure they work properly
- Update documentation as needed
- Submit a pull request with a clear description of your changes
This project is licensed under the MIT License - see the LICENSE file for details.
If you find this project helpful, consider starring the repository and contributing to its development!