
examples : update vad support in stream example #3160


Draft · wants to merge 4 commits into master from stream-example-vad-update
Conversation

danbev (Collaborator) commented on May 15, 2025

This commit updates the stream example to use the VAD support in whisper instead of the simple_vad that it currently uses.

wip
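
For reference, a minimal sketch of what the switch looks like on the caller side. The field names below (`vad`, `vad_model_path`, `vad_params`) and `whisper_vad_default_params()` follow the built-in VAD API this PR series targets, but the exact names, defaults, and model path are assumptions and are not taken from this diff:

```cpp
#include "whisper.h"

// Sketch: enable whisper's built-in VAD instead of pre-filtering the audio
// with the energy-based VAD that the stream example currently uses.
static bool transcribe_with_builtin_vad(struct whisper_context * ctx,
                                        const float * samples, int n_samples) {
    whisper_full_params wparams = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);

    wparams.vad            = true;
    wparams.vad_model_path = "models/ggml-silero-vad.bin"; // hypothetical model path

    whisper_vad_params vparams = whisper_vad_default_params();
    vparams.threshold               = 0.5f; // speech probability threshold (assumed field)
    vparams.min_silence_duration_ms = 100;  // silence needed to split segments (assumed field)
    wparams.vad_params              = vparams;

    // whisper drops the non-speech regions itself before decoding
    return whisper_full(ctx, wparams, samples, n_samples) == 0;
}
```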

danbev force-pushed the stream-example-vad-update branch from 950d7c5 to b924706 on May 16, 2025 at 13:30
mdestagnol (Contributor) commented on May 16, 2025

Following up on my comment on another PR. Just a few thoughts as I was reading your great work on VAD.

Currently, VAD is used in the stream example as a way to only send to whisper the samples containing voice. This is an optimization in terms of compute, but I wonder if we miss out on accuracy benefits.

Similarly to how we have a sliding window when not using VAD in this stream example, we could leverage VAD to chunk the audio during silence rather than in the middle of a word, a sort of dynamic sliding window.

We could also keep part of the previous sample, similar to what we do when not using VAD. That would give better accuracy in case there is no good silence to chunk at.
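
To make the idea concrete, here is a small illustrative sketch (not part of this PR, all names hypothetical) of cutting the window at the most recent VAD-detected silence while keeping a short overlap of the previous audio for the next window:

```cpp
#include <cstddef>
#include <vector>

struct chunk_result {
    std::vector<float> to_transcribe; // samples up to the chosen silence boundary
    std::vector<float> carry_over;    // tail kept and prepended to the next window
};

// samples and is_speech must have the same length; is_speech holds per-sample VAD flags.
chunk_result chunk_at_silence(const std::vector<float> & samples,
                              const std::vector<bool>  & is_speech,
                              size_t min_silence_samples,
                              size_t overlap_samples) {
    // Walk backwards looking for a silence run long enough to cut at.
    size_t cut = samples.size(); // fall back to the full buffer if no silence is found
    size_t run = 0;
    for (size_t i = samples.size(); i-- > 0; ) {
        run = is_speech[i] ? 0 : run + 1;
        if (run >= min_silence_samples) {
            cut = i + run/2; // cut inside the silence, not in the middle of a word
            break;
        }
    }

    chunk_result res;
    res.to_transcribe.assign(samples.begin(), samples.begin() + cut);

    // Keep some trailing context, like the non-VAD sliding window does today.
    const size_t keep_from = cut > overlap_samples ? cut - overlap_samples : 0;
    res.carry_over.assign(samples.begin() + keep_from, samples.end());
    return res;
}
```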

Finally, is there a reason why we don't re-inject the previous tokens into the prompt when using VAD? -> never mind, this is already implemented in your PR :)
