Realtime #170

Bellafc · 2024-03-12T03:53:31Z

Can you make the process (ASR + diarization) run in real time? Many thanks! 🙏

wmantly · 2025-10-14T17:15:46Z

wmantly
Oct 14, 2025

Well over a year old and not even a response. I suppose not.

MahmoudAshraf97 Oct 14, 2025
Maintainer

It's definitely doable, depending on how many resources and how much effort you are willing to pour into it. "Realtime" is a very vague word when it comes to speed, because it cannot be measured or defined as a metric; hence the lack of response.

wmantly · 2025-10-14T17:41:49Z

wmantly
Oct 14, 2025

Reasonable. Let's refine "real time" by the standards of a realistic use case: whisper-diarization is listening in on a multi-person audio call and generating a best-effort "who said what" as the audio streams in or chunks roll in. It's understandable that the STT won't be instant. The "real time" request is for whisper-diarization to take an audio stream, or fixed-size audio chunks fed in a loop, and output results as soon as it can.

It is very reasonable to treat the actual audio chunking, chunk queuing, and feeding as an external concern rather than whisper-diarization's responsibility. But the chunks should be treated as one ongoing session.
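The "ongoing session" idea can be sketched roughly as below. This is a minimal sketch, assuming a hypothetical per-chunk function `process_chunk` that runs ASR and diarization on one chunk; `Segment`, `StreamingSession`, and `stream` are illustrative names, not part of whisper-diarization's API. What it shows is that the running time offset and the accumulated transcript live in one session object instead of being reset for every chunk.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator, List


@dataclass
class Segment:
    start: float  # seconds from the start of the session, not the chunk
    end: float
    speaker: str
    text: str


# Hypothetical signature: (chunk samples, session offset in seconds) -> segments.
ChunkProcessor = Callable[[List[float], float], List[Segment]]


@dataclass
class StreamingSession:
    """Hypothetical session state carried across audio chunks."""
    sample_rate: int
    process_chunk: ChunkProcessor
    offset: float = 0.0  # seconds of audio consumed so far
    transcript: List[Segment] = field(default_factory=list)

    def feed(self, samples: List[float]) -> List[Segment]:
        # The processor receives the session offset so it can report
        # session-relative timestamps.
        new_segments = self.process_chunk(samples, self.offset)
        self.offset += len(samples) / self.sample_rate
        self.transcript.extend(new_segments)
        return new_segments


def stream(session: StreamingSession, chunks: Iterable[List[float]]) -> Iterator[Segment]:
    """Yield segments as soon as each chunk has been processed."""
    for samples in chunks:
        yield from session.feed(samples)
```

Chunking, queuing, and capture stay outside this object, as suggested above; the caller only needs to push chunks into `stream` and consume segments as they are yielded.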

Does this help?

MahmoudAshraf97 Oct 15, 2025
Maintainer

From the perspective of performance constraints, it's doable, since every component already has an RTF > 1x (i.e., runs faster than real time). What needs work are the individual components that are designed to work with the whole audio available, such as the STT and diarization models (#342 implements a model with streaming support). ASR and diarization are two independent components, and each of them can be adapted to streaming; you'll then have to modify the code that combines the results of both components to generate a diarized transcript.
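One common way to combine the two outputs, sketched under assumptions: the ASR step yields word-level timestamps, the diarization step yields speaker turns, and each word is labelled with the turn it overlaps most. This is a generic overlap-based approach, not necessarily how whisper-diarization's own mapping code works; `assign_speakers`, `group_by_speaker`, and the tuple layouts are hypothetical.

```python
from typing import List, Tuple

Word = Tuple[float, float, str]  # (start_s, end_s, text) from the ASR step
Turn = Tuple[float, float, str]  # (start_s, end_s, speaker) from the diarization step


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(words: List[Word], turns: List[Turn]) -> List[Tuple[str, str]]:
    """Label each word with the speaker whose turn overlaps it the most."""
    labelled = []
    for w_start, w_end, text in words:
        best_speaker, best_overlap = "UNKNOWN", 0.0
        for t_start, t_end, speaker in turns:
            ov = overlap(w_start, w_end, t_start, t_end)
            if ov > best_overlap:
                best_speaker, best_overlap = speaker, ov
        labelled.append((best_speaker, text))
    return labelled


def group_by_speaker(labelled: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Merge consecutive words from the same speaker into one utterance."""
    lines: List[Tuple[str, str]] = []
    for speaker, text in labelled:
        if lines and lines[-1][0] == speaker:
            lines[-1] = (speaker, lines[-1][1] + " " + text)
        else:
            lines.append((speaker, text))
    return lines
```

In a streaming setting, words near the end of the latest chunk may fall outside any finished speaker turn until later diarization output arrives, so their labels are best treated as provisional and recomputed on the next update.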

Whisper works best with 20-30 s segments, so for streaming it needs special fine-tuning, such as this, or a model with native streaming support.
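When neither fine-tuning nor a native streaming model is available, a simple workaround (not the approach suggested above) is a rolling buffer that hands the model overlapping fixed-length windows. A minimal sketch, assuming mono float samples; `WindowBuffer`, the 30 s window, and the 5 s overlap are illustrative choices. This trades latency for accuracy rather than providing true streaming.

```python
from typing import List, Tuple


class WindowBuffer:
    """Hypothetical helper: turns a live sample stream into overlapping fixed-length windows."""

    def __init__(self, sample_rate: int = 16000, window_s: float = 30.0, overlap_s: float = 5.0):
        if not 0 <= overlap_s < window_s:
            raise ValueError("overlap must be non-negative and shorter than the window")
        self.window = int(window_s * sample_rate)               # samples per window
        self.step = self.window - int(overlap_s * sample_rate)  # samples to advance per window
        if self.step <= 0:
            raise ValueError("window and overlap round to a non-positive step")
        self.buffer: List[float] = []
        self.start = 0  # session-level sample index of buffer[0]

    def push(self, samples: List[float]) -> List[Tuple[int, List[float]]]:
        """Append samples; return every complete window as (start_sample, samples)."""
        self.buffer.extend(samples)
        ready = []
        while len(self.buffer) >= self.window:
            ready.append((self.start, self.buffer[: self.window]))
            # Drop `step` samples, keeping the overlap as left context for the next window.
            del self.buffer[: self.step]
            self.start += self.step
        return ready
```

With these values a new window is ready only every 25 s of audio (30 s window minus 5 s overlap), so lower latency needs a shorter step at the cost of more compute. Words in the overlapping region appear in two windows and have to be deduplicated downstream (for example, by timestamp), and a trailing partial window has to be flushed when the stream ends.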
As for diarization, it needs to maintain session state so that incoming audio is cross-referenced with earlier audio; this is mandatory for it to work.
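That session state can be illustrated with simple online clustering: keep a running centroid embedding for every speaker seen so far, and compare each new segment's embedding against them. A sketch, assuming some speaker-embedding model produces fixed-length vectors; `OnlineSpeakerRegistry` and the 0.7 cosine threshold are hypothetical and would need tuning. This nearest-centroid scheme is a stand-in for illustration, not the method used by the model in #342.

```python
import math
from typing import List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class OnlineSpeakerRegistry:
    """Hypothetical per-session speaker memory for streaming diarization."""

    def __init__(self, threshold: float = 0.7):
        self.threshold = threshold  # illustrative; must be tuned per embedding model
        self.centroids: List[List[float]] = []
        self.counts: List[int] = []

    def assign(self, embedding: List[float]) -> str:
        # Find the most similar known speaker in this session.
        best, best_sim = -1, -1.0
        for i, centroid in enumerate(self.centroids):
            sim = cosine(embedding, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best >= 0 and best_sim >= self.threshold:
            # Known speaker: update the running mean so the profile improves over time.
            n = self.counts[best]
            self.centroids[best] = [(c * n + e) / (n + 1) for c, e in zip(self.centroids[best], embedding)]
            self.counts[best] = n + 1
        else:
            # No match above the threshold: register a new speaker.
            self.centroids.append(list(embedding))
            self.counts.append(1)
            best = len(self.centroids) - 1
        return f"SPEAKER_{best:02d}"
```

Because the registry persists across chunks, a speaker who returns after a long pause is matched to their earlier label instead of being given a fresh one, which is exactly the cross-referencing that per-chunk diarization lacks.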
