Timestamps drift with MP3 encoding but not with AAC. Why? #987

Open
samueldjack opened this issue Jan 9, 2025 · 0 comments

I'm puzzling over an issue where the accuracy of timestamps seems to vary depending on the encoding of the input file.

We have MP3 files encoded at 48 kbps. When I transcribe these with WhisperX 3.3.0 (aligning them with the whisperx.align method through the Python API), the segments are reasonably well aligned at the beginning but then drift, so that they are a second or so out by the end of the 25-minute audio.

However, if I take that same MP3 file and re-encode it to AAC at 63 kbps, the segments are aligned perfectly all the way through the file.

Can anybody provide any insight into why this might be happening? My understanding is that WhisperX uses FFmpeg to resample the audio files, so surely the original encoding shouldn't make any difference?

I have attached a pair of sample files (the M4A is re-encoded from the MP3 file) in case anybody is able to look at this.

DriftingTimestampsIssue_Trimmed.zip
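
For anyone who wants to check the sample pair, here is a minimal sketch of one possible diagnostic: decode both files to 16 kHz mono PCM with FFmpeg and compare the decoded lengths. It assumes `ffmpeg` is on the PATH and that WhisperX resamples to 16 kHz mono; the helper names and file names are hypothetical, and the FFmpeg flags are a plain decode command, not necessarily the exact one WhisperX runs. A difference in total length would only hint at a decoding or container timing mismatch and would not by itself explain the drift.

```python
import subprocess

# Assumption: WhisperX works on 16 kHz mono audio.
SAMPLE_RATE = 16000


def ffmpeg_decode_cmd(path, sample_rate=SAMPLE_RATE):
    """Build an ffmpeg command that writes mono 16-bit PCM to stdout."""
    return [
        "ffmpeg", "-nostdin", "-v", "error",
        "-i", path,
        "-f", "s16le", "-acodec", "pcm_s16le",
        "-ac", "1", "-ar", str(sample_rate),
        "-",
    ]


def decoded_duration(path, sample_rate=SAMPLE_RATE):
    """Decode the file with ffmpeg and return the PCM duration in seconds."""
    result = subprocess.run(
        ffmpeg_decode_cmd(path, sample_rate),
        capture_output=True,
        check=True,
    )
    # s16le mono: two bytes per sample.
    return len(result.stdout) / 2 / sample_rate


def drift_report(duration_a, duration_b):
    """Return the absolute and relative difference between two durations."""
    diff = duration_a - duration_b
    return {"diff_seconds": diff, "relative": diff / duration_b}
```

Hypothetical usage: `drift_report(decoded_duration("sample.mp3"), decoded_duration("sample.m4a"))`. For scale, a drift of about one second over 25 minutes (1500 seconds) corresponds to a relative difference of roughly 0.07%.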
