Timestamps drift with MP3 encoding but not with AAC. Why? #987

samueldjack · 2025-01-09T10:45:56Z

I'm puzzling over an issue where the accuracy of timestamps seems to vary depending on the encoding of the input file.

We have MP3 files encoded at 48 kbps. When I transcribe these with WhisperX 3.3.0 (aligning them with the whisperx.align method through the Python API), the segments are reasonably well aligned at the beginning but drift, so that they are a second or so out by the end of the 25-minute audio.

However, if I take that same MP3 file and re-encode it to AAC at 63 kbps, the segments are aligned perfectly all the way through the file.

Can anybody provide any insight into why this might be happening? My understanding is that WhisperX uses FFmpeg to resample the audio files, so surely the original encoding shouldn't make any difference?
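One way to check that assumption is to decode both files to raw PCM with the ffmpeg CLI and compare the decoded durations. The sketch below assumes the decode target is mono, 16-bit PCM at 16 kHz, which is my understanding of what WhisperX feeds to its models and not something verified here. The helper names are hypothetical, and `ffmpeg` must be on the PATH for `decoded_duration` to work. If the two durations differ by something close to the observed drift, the discrepancy is probably introduced at decode time and not by the alignment step.

```python
import subprocess

# Assumption: WhisperX-style loaders resample to 16 kHz mono 16-bit PCM.
SAMPLE_RATE = 16000
BYTES_PER_SAMPLE = 2  # signed 16-bit little-endian


def build_decode_cmd(path, sample_rate=SAMPLE_RATE):
    """Return ffmpeg arguments that decode `path` to raw mono PCM on stdout."""
    return [
        "ffmpeg", "-nostdin", "-v", "error",
        "-i", path,
        "-f", "s16le", "-acodec", "pcm_s16le",
        "-ac", "1", "-ar", str(sample_rate),
        "-",
    ]


def pcm_duration_seconds(num_bytes, sample_rate=SAMPLE_RATE):
    """Duration in seconds of a mono 16-bit PCM buffer of `num_bytes` bytes."""
    return num_bytes / BYTES_PER_SAMPLE / sample_rate


def decoded_duration(path, sample_rate=SAMPLE_RATE):
    """Decode `path` with ffmpeg and return the duration of the decoded audio."""
    result = subprocess.run(
        build_decode_cmd(path, sample_rate),
        capture_output=True,
        check=True,
    )
    return pcm_duration_seconds(len(result.stdout), sample_rate)
```

Calling `decoded_duration` on the MP3 and then on the M4A from the attachment and subtracting the two results gives the difference in decoded length in seconds.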

I have attached a pair of sample files (the M4A was re-encoded from the MP3 file) in case anybody is able to look at this.

DriftingTimestampsIssue_Trimmed.zip
