Add F5 TTS pipeline #11958
Okay, I've got all the code that is needed into two files, and swapped in existing diffusers primitives in the easy-to-spot places. Next I'll work on integrating it into the diffusers class structure.
Attention!
It seems we can use the diffusers Attention class directly, but we need to add a new processor to support RoPE embeddings on selected heads, as in F5.
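To make "RoPE on selected heads" concrete, here is a minimal pure-Python sketch of the idea, not the processor from this PR. The names `rope_rotate` and `rope_on_selected_heads`, the interleaved pairing of channels, and the base of 10000 are all assumptions for illustration; a real processor would operate on batched tensors inside the attention call.

```python
import math
from typing import List, Sequence, Set


def rope_rotate(vec: Sequence[float], position: int, base: float = 10000.0) -> List[float]:
    """Rotate consecutive channel pairs of one head vector by position-dependent angles.

    Hypothetical helper: uses the interleaved (x0, x1), (x2, x3), ... pairing.
    """
    dim = len(vec)
    assert dim % 2 == 0, "head dimension must be even for RoPE"
    out = list(vec)
    for i in range(0, dim, 2):
        # Frequency for this pair: base^(-i/dim), with i the index of the pair's first channel.
        theta = position * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out[i] = x * c - y * s
        out[i + 1] = x * s + y * c
    return out


def rope_on_selected_heads(
    heads: Sequence[Sequence[float]], position: int, rotated_heads: Set[int]
) -> List[List[float]]:
    """Apply RoPE only to the heads listed in `rotated_heads`; copy the rest unchanged."""
    return [
        rope_rotate(h, position) if idx in rotated_heads else list(h)
        for idx, h in enumerate(heads)
    ]


# Two heads of dimension 4 for a single token at position 3; only head 0 is rotated.
heads = [[1.0, 0.0, 0.0, 1.0], [1.0, 0.0, 0.0, 1.0]]
rotated = rope_on_selected_heads(heads, position=3, rotated_heads={0})
```

The same selection would be applied to both queries and keys before the dot product, so the unrotated heads attend without positional information from RoPE.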
Tokenization
F5 uses a character-level tokenizer for the text, so we might want to write a simple tokenizer class for it. It may be fine to keep it as a simple function for now, since it's very straightforward.
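As a rough illustration of how small the "simple function" version could be, here is a sketch of a character-level tokenizer. The names `build_char_vocab` and `chars_to_ids`, the padding value of -1, and reserving index 0 for unknown characters are assumptions for this example, not the interface used in the PR.

```python
from typing import Dict, List


def build_char_vocab(chars: str) -> Dict[str, int]:
    """Map each distinct character to an index, starting at 1 (0 is kept for unknowns)."""
    vocab: Dict[str, int] = {}
    for ch in chars:
        if ch not in vocab:
            vocab[ch] = len(vocab) + 1
    return vocab


def chars_to_ids(
    texts: List[str], vocab: Dict[str, int], pad_id: int = -1, unk_id: int = 0
) -> List[List[int]]:
    """Convert a batch of strings to index lists, right-padded to the longest string."""
    max_len = max((len(t) for t in texts), default=0)
    batch = []
    for t in texts:
        ids = [vocab.get(ch, unk_id) for ch in t]
        batch.append(ids + [pad_id] * (max_len - len(ids)))
    return batch


vocab = build_char_vocab("abc ")
batch = chars_to_ids(["ab", "a cz"], vocab)
# "z" is not in the vocabulary, so it maps to unk_id; "ab" is padded with pad_id.
```

Wrapping this in a class later would mostly mean storing `vocab` and the special ids as attributes, so starting with a function does not lock anything in.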
Tests
The basic structure looks good now, so let's add some tests and then make it more diffusers-friendly! Adding tests would also force me to follow the structure more closely and help catch bugs.
Flow matching/Schedulers
We will also need to use one of the schedulers from diffusers. I think they use only the simple Euler method, but the sway sampling step needs to be accounted for somehow. It is just a change in the discretisation schedule, so it should be straightforward.
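To show why sway sampling only touches the discretisation, here is a small sketch that warps a uniform time grid and then runs a plain Euler loop over it. The warp `t + s * (cos(pi/2 * t) - 1 + t)` and the default `s = -1` are my reading of F5's sway sampling and should be checked against the reference code; `sway_timesteps` and `euler_integrate` are hypothetical names, and the loop is not a diffusers scheduler.

```python
import math
from typing import Callable, List


def sway_timesteps(num_steps: int, sway_coef: float = -1.0) -> List[float]:
    """Warp a uniform grid on [0, 1]; sway_coef = 0 leaves the grid uniform."""
    uniform = [i / num_steps for i in range(num_steps + 1)]
    return [t + sway_coef * (math.cos(math.pi / 2 * t) - 1 + t) for t in uniform]


def euler_integrate(
    velocity: Callable[[List[float], float], List[float]],
    x0: List[float],
    timesteps: List[float],
) -> List[float]:
    """Explicit Euler: x <- x + (t_next - t_cur) * v(x, t_cur) over the given grid."""
    x = list(x0)
    for t_cur, t_next in zip(timesteps[:-1], timesteps[1:]):
        dt = t_next - t_cur
        v = velocity(x, t_cur)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


ts = sway_timesteps(8)
# Toy constant velocity field standing in for the model's predicted flow.
result = euler_integrate(lambda x, t: [1.0 for _ in x], [0.0, 2.0], ts)
```

With a negative coefficient the early steps are shorter than in the uniform grid, while the endpoints stay at 0 and 1. The Euler update itself is unchanged, which suggests that passing custom timesteps to an existing Euler-style scheduler may be enough.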
Future work
What does this PR do?
Add F5 TTS #10043