Force Alignment with original text #1009

TechMaster · 2025-01-21T14:01:59Z

I generated a Vietnamese song with Suno, and I want to create a subtitle file that highlights each word as it is sung.
I use WhisperX to transcribe the song:

whisperx song.wav --language vi --model large-v2 --highlight_words True -f json

I have two questions:
1- The timestamps of full sentences are correct, but the timestamps of individual words are not. How can I fix this? Is there a command-line parameter I should add?

2- Some words in the transcribed JSON file are wrong. I have the original lyrics of the song. How can I pass the original lyrics to WhisperX to improve the accuracy of the JSON file?
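
For question 2, one route that might work is to skip the transcribed text and run only WhisperX's alignment step on the original lyrics, reusing the sentence-level timestamps that are already correct. The sketch below is an assumption-laden illustration, not a documented lyrics-input feature: `segments_from_lyrics` and `align_lyrics` are hypothetical helper names, and the code assumes one lyric line per transcribed segment, in the same order. The `whisperx.load_audio`, `whisperx.load_align_model`, and `whisperx.align` calls follow the Python usage shown in the WhisperX README.

```python
def segments_from_lyrics(transcript_segments, lyric_lines):
    """Pair each known lyric line with the start/end of a transcribed segment.

    Assumption: one non-empty lyric line per transcribed segment, same order.
    """
    lines = [ln.strip() for ln in lyric_lines if ln.strip()]
    if len(lines) != len(transcript_segments):
        raise ValueError(
            f"{len(lines)} lyric lines vs {len(transcript_segments)} segments; "
            "merge or split lines until the counts match"
        )
    # Keep the sentence-level timing, swap in the trusted lyric text.
    return [
        {"start": seg["start"], "end": seg["end"], "text": line}
        for seg, line in zip(transcript_segments, lines)
    ]


def align_lyrics(audio_path, segments, language="vi", device="cpu"):
    """Run only the WhisperX alignment step on segments holding the real lyrics."""
    import whisperx  # imported lazily so the helper above works without it

    audio = whisperx.load_audio(audio_path)
    align_model, metadata = whisperx.load_align_model(
        language_code=language, device=device
    )
    # Returns a dict whose segments carry per-word start/end times.
    return whisperx.align(segments, align_model, metadata, audio, device)
```

A possible usage, assuming the command above wrote `song.json` and the lyrics are in a hypothetical `lyrics.txt`: load the `"segments"` list from the JSON, pass it with the lyric lines to `segments_from_lyrics`, then call `align_lyrics("song.wav", new_segments)`. Whether this also improves the word timestamps from question 1 is untested here.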
