Add F5 TTS pipeline #11958


Draft
wants to merge 6 commits into main

Conversation

ayushtues
Contributor

What does this PR do?

Add F5 TTS #10043

@ayushtues mentioned this pull request Jul 19, 2025
@ayushtues
Contributor Author

Okay, I've got all the needed code into two files and used existing diffusers primitives in a few easy-to-spot places. Next I'll work on integrating it into the diffusers class structure.

@ayushtues
Contributor Author

ayushtues commented Jul 21, 2025

Attention!

It seems we can use the diffusers Attention class directly, but we need to add a new attention processor to support RoPE embeddings on only a subset of heads, as in F5.
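A minimal sketch of the selective-head idea, kept NumPy-only so it stands alone (a real processor would subclass the diffusers attention-processor interface and operate on torch tensors; `rope_rotate` here uses the common "rotate-half" RoPE convention, and the function names are hypothetical):

```python
import numpy as np

def rope_rotate(x, base=10000.0):
    # x: (seq, dim) with even dim; rotate feature pairs by
    # position-dependent angles (rotate-half RoPE convention)
    seq, dim = x.shape
    half = dim // 2
    freqs = base ** (-np.arange(half) / half)     # (half,)
    angles = np.outer(np.arange(seq), freqs)      # (seq, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

def apply_rope_selective(q, rotated_heads):
    # q: (num_heads, seq, head_dim); apply RoPE only to the listed heads,
    # leaving the remaining heads untouched
    out = q.copy()
    for h in rotated_heads:
        out[h] = rope_rotate(q[h])
    return out
```

The same selective application would go in the processor's `__call__` for both query and key projections before the attention scores are computed.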

@ayushtues
Contributor Author

ayushtues commented Jul 26, 2025

Tokenization

F5 uses a character-level tokenizer for the text, so we might want to write a simple tokenizer class for it.

It might be fine to keep it as a simple function for now, since it's very straightforward.
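The "simple function" version could look like this (a sketch with hypothetical names; the real F5 vocabulary file and padding/OOV conventions may differ):

```python
def build_char_vocab(texts):
    # hypothetical helper: map every character seen in the corpus to an id;
    # id 0 is reserved here for padding / out-of-vocabulary characters
    chars = sorted({ch for text in texts for ch in text})
    return {ch: i + 1 for i, ch in enumerate(chars)}

def char_tokenize(text, vocab):
    # character-level tokenization: one id per character, 0 for OOV
    return [vocab.get(ch, 0) for ch in text]
```

Wrapping this in a class later (with `encode`/`decode` and a saved vocab file) would only be needed once the pipeline has to serialize the tokenizer alongside the model.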

@ayushtues
Contributor Author

ayushtues commented Jul 29, 2025

Tests

The basic structure looks good now. Let's add some tests and then make it more diffusers-friendly. Adding tests will also force me to follow the structure more strictly and ensure the code is not buggy.

@ayushtues
Contributor Author

ayushtues commented Jul 29, 2025

Flow matching/Schedulers

We will also need to use one of the schedulers from Diffusers. I think F5 uses the plain Euler method, but the sway sampling step needs to be accounted for somehow. Since that is just a change in the discretisation schedule, it should be straightforward.
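Sway sampling only warps where the Euler steps land on [0, 1]; using the warp from the F5-TTS paper, t ← t + s·(cos(πt/2) − 1 + t), a sketch of the schedule (function name is hypothetical) could be:

```python
import numpy as np

def sway_schedule(num_steps, s=-1.0):
    # uniform flow-matching times in [0, 1], warped by sway coefficient s;
    # s < 0 concentrates steps near t = 0 while keeping the endpoints fixed
    t = np.linspace(0.0, 1.0, num_steps + 1)
    return t + s * (np.cos(np.pi / 2.0 * t) - 1.0 + t)
```

An Euler flow-matching sampler would then step over consecutive pairs of this schedule instead of a uniform grid, which is why only the scheduler's timestep construction needs to change.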

@ayushtues
Contributor Author

Future work

  • Support streaming (already in the original F5 repo), although this is really chunk-based inference. The current model is non-causal, so only chunk-based streaming makes sense anyway
  • Triton server inference, again already in the F5 repo
