Diarization too slow #274
Comments
I'm having the same issue. From what I'm reading, the pyannote/speaker-diarization model is slow, but word-level segmentation may be slowing it down even more. I assume some factors impact this more than others (I think the number of speakers or the number of segments influences it the most, but that's just a guess). Looking at hardware usage during runtime, it looks like it's batching either one segment at a time or one word at a time (this would make sense, since we're chasing word-level timestamps with whisperx). The pyannote model reports a 2.5% real-time factor, which has definitely NOT been my experience, but may be the case if you run the raw audio through without segmentation. Maybe there's a way to count individual calls to the GPU to verify. I haven't found a workaround yet; let me know if you find something out. |
I have the same issue. |
That's very strange, it should not take that long; I would expect 5-10 minutes max. I suspect some bug here.
I would assume most of the time is spent in the clustering step, which can be recursive and can take long if it's not finding satisfactory cluster sizes.
Nah, the ASR and word-level segmentation are run independently of the diarization. The diarization is just running a standard pyannote pipeline, so word-level segmentation / whisperx batching shouldn't affect this. |
@m-bain I'm also having extremely slow diarization. Using CLI. Just now, to explore further, I also tried setting the For reference, in case it helps:
|
There is an issue regarding pyannote not using GPU, but it should not occur with whisperx. To read more on this, see pyannote/pyannote-audio#1354. |
I am also having an extremely long (i.e. overnight) diarization on the command line. The transcription runs, I get two failures in the align segment, then diarization starts and I get the following errors: Lightning automatically upgraded your loaded checkpoint from v1.5.4 to v2.0.2. To apply the upgrade to your files permanently, run and then I left it running overnight and it was still in the same state. |
Please try my suggestion in #399 and see if it helps you too. |
@davidas1 There is a speed improvement when switching from the raw audio file to the whisper-loaded audio, as you suggested. Thanks for that. How do I change the embedding model in code? |
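For anyone landing here, the change discussed above (passing the already-decoded audio to pyannote instead of a file path) might look roughly like the sketch below. It relies on pyannote pipelines accepting an in-memory `{"waveform", "sample_rate"}` mapping and on `whisperx.load_audio` returning a mono 16 kHz float32 NumPy array. The helper name `as_pyannote_input`, the file name, and `HF_TOKEN` are made up for illustration, and this is not necessarily the exact patch from #399.

```python
def as_pyannote_input(audio, sample_rate=16000):
    """Wrap a 1-D mono waveform as an in-memory input for a pyannote pipeline.

    `audio` is assumed to be a float32 NumPy array such as the one returned
    by whisperx.load_audio (mono, 16 kHz).
    """
    import torch  # imported lazily; only needed when the helper is called

    # pyannote expects a (channel, time) tensor, so add a channel axis.
    waveform = torch.from_numpy(audio[None, :])
    return {"waveform": waveform, "sample_rate": sample_rate}


# Hypothetical usage (needs whisperx, pyannote.audio and a Hugging Face token;
# Pipeline.to(device) assumes a pyannote.audio 3.x install):
#   audio = whisperx.load_audio("meeting.mp3")
#   pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization",
#                                       use_auth_token=HF_TOKEN)
#   pipeline.to(torch.device("cuda"))
#   diarization = pipeline(as_pyannote_input(audio))
```

The point of the dict input is that the audio is decoded once and reused, rather than being re-read from disk by the diarization pipeline.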
Changing the pyannote pipeline is a bit more involved - I'm using an offline pipeline as described in https://github.com/pyannote/pyannote-audio/blob/develop/tutorials/applying_a_pipeline.ipynb |
What??? That's crazy! Here are my timings for a 30-minute-long mp3: Could you please suggest something like a checklist for speeding things up? I also updated to get your recent patch and it did speed up my diarization considerably. |
I wrote that diarization takes 30sec, not the entire pipeline - before the change the diarization took almost 2 minutes. |
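Since it is easy to mix up whole-pipeline time with diarization time, it may help to time each stage separately when reporting numbers. A minimal sketch using only the standard library; the stage names are illustrative and the `time.sleep` calls are placeholders for the real transcribe, align and diarize calls:

```python
import time
from contextlib import contextmanager

timings = {}  # stage name -> elapsed seconds


@contextmanager
def timed(stage):
    """Record the wall-clock time spent inside the `with` block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = time.perf_counter() - start


with timed("transcribe"):
    time.sleep(0.01)  # placeholder for the transcription call

with timed("diarize"):
    time.sleep(0.02)  # placeholder for the diarization call

for stage, seconds in timings.items():
    print(f"{stage}: {seconds:.2f}s")
```

Note that GPU work can be asynchronous, so for GPU stages the measured time is only reliable if the stage returns its results (or the device is synchronized) before the block exits.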
Oooh, I see, that clears things up. I've got a 4090 though. |
I'm looking for some help or insight into why diarization is so slow for me. I have a recording that is 1 minute and 14 seconds with two native English speakers and diarization takes 11 minutes and 49 seconds (transcription took 6 seconds). I'm running on a Mac mini with an M2 chip and 8GB of RAM. I assume in this case it's running on CPU although I'm not sure with the Apple silicon. I'm basically using the default example on the README for transcribing and diarizing a file. With a longer file (27 minutes and 39 seconds), with multiple speakers, it takes 2 minutes and 47 seconds to transcribe, 1 minute and 6 seconds to align but 12 hours, 48 minutes to diarize! |
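To put those timings on the same scale as the 2.5% real-time factor quoted earlier in the thread, the real-time factor (processing time divided by audio duration) can be worked out directly. A small sketch; the helper names are made up for illustration and the numbers are the ones reported in the comment above:

```python
def to_seconds(hours=0, minutes=0, seconds=0):
    """Convert an h/m/s duration to seconds."""
    return hours * 3600 + minutes * 60 + seconds


def real_time_factor(processing_s, audio_s):
    """Processing time / audio duration; below 1.0 means faster than real time."""
    return processing_s / audio_s


# 1 min 14 s of audio, 11 min 49 s of diarization
short_clip = real_time_factor(to_seconds(minutes=11, seconds=49),
                              to_seconds(minutes=1, seconds=14))

# 27 min 39 s of audio, 12 h 48 min of diarization
long_file = real_time_factor(to_seconds(hours=12, minutes=48),
                             to_seconds(minutes=27, seconds=39))

print(f"short clip: {short_clip:.1f}x real time")
print(f"long file:  {long_file:.1f}x real time")
```

That works out to roughly 9.6x and 27.8x real time, against the 0.025x implied by the 2.5% figure, so these runs are hundreds of times slower than the reported number.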
Same here. I'm getting 2-3% GPU utilization and 0.9 GB of GPU memory in use. |
Same issue. Almost no GPU utilization and 1.5 hours of diarization per 60 minutes of audio. |
same here |
I also noticed that there seems to be some throttling affecting the GPU utilization on Windows 11. As soon as the terminal window is in the background, the GPU utilization drops dramatically |
@m-bain Diarization is a key step when multiple speakers are having a conversation. I've been exploring different ways to speed up the transcription and diarization pipeline. I can see lots of options for speeding up transcription, such as CTranslate2, batching, Flash Attention, Distil-Whisper and compute type (float32, float16), but very limited options for speeding up diarization. For a 20-minute audio file, with optimizations we are able to get transcriptions in around 35 seconds. Could you please share any direction we can follow to speed up the diarization process? |
1 hour 30 minutes of audio were processing for over 1 hour in the diarization... stage. I'm using an RTX 3090. I'm guessing --batch_size doesn't affect pyannote. A setting for pyannote's batch size would be very nice to have.