
Conversation

@rjames-0 rjames-0 commented Oct 31, 2025

This PR implements an optimization that prevents growing overhead when running faster-whisper with large batches and token suppression enabled. Fixes the issue reported in #1566.

The suppress-token list is no longer sorted and deduplicated on every add call (which scaled badly when batching); instead, this is done once at apply time, just before launching the CUDA kernel.
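The idea can be sketched as follows. This is a minimal illustration, not the PR's actual code: the class and method names (`SuppressTokens`, `add`, `apply`) are hypothetical, chosen only to mirror the add/apply distinction described above.

```python
class SuppressTokens:
    """Sketch: add() only appends (cheap, O(k) per call); the sort and
    deduplication happen once in apply(), just before the logits are
    masked (standing in for the CUDA kernel launch)."""

    def __init__(self):
        self._tokens = []

    def add(self, tokens):
        # No per-call sort or dedup; with many add() calls per batch,
        # doing that work here is what scaled badly.
        self._tokens.extend(tokens)

    def apply(self, logits):
        # Deduplicate and sort exactly once, at apply time.
        for t in sorted(set(self._tokens)):
            logits[t] = float("-inf")
        return logits


# Usage: duplicates across add() calls are handled once at apply time.
suppress = SuppressTokens()
suppress.add([3, 1])
suppress.add([1, 2])
masked = suppress.apply([0.0, 0.0, 0.0, 0.0, 0.0])
```

The cost of sorting and deduplicating thus stays constant per apply, instead of being paid on every add call.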

Results from a local machine test with and without the optimization:

(benchmark image)

