I have switched over to using the Hugging Face compatible model. Thank you guys for that. Excellent work!
I am trying to stream the generation to reduce latency in a demo I have written, using the Optional["BaseStreamer"] argument to generate(). Integration went smoothly, but the streamer receives tokens, not audio tensors, so when I naively tried to play the streamed data it (of course) didn't work.
Is there a recommended way to decode these tokens into audio, or documentation somewhere that I may have missed?
I have seen the csm-streaming repo, but I'm not sure I can translate that code (which uses this Python repo) to work with the new Hugging Face/transformers implementation.
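For context, here's roughly the shape of what I have. This is a minimal sketch, not my actual demo: the class name and the decode step are my own guesses, and I'm duck-typing the streamer since generate() only ever calls put() and end().

```python
class CodeCollectorStreamer:
    """Sketch of my streamer: collects the chunks that generate() pushes.

    The chunks it receives are codec token ids, NOT waveform samples,
    which is why feeding them straight to an audio player fails.
    """

    def __init__(self):
        self.chunks = []

    def put(self, value):
        # generate() calls this with each new batch of token ids.
        self.chunks.append(value)

    def end(self):
        # This is where I *assume* I'd need to run the audio codec's
        # decoder over the accumulated codes to get a playable waveform,
        # but that's exactly the step I'm unsure about.
        pass


streamer = CodeCollectorStreamer()
streamer.put([101, 102, 103])  # stand-in for a streamed chunk of token ids
streamer.end()
print(streamer.chunks)  # the raw codes, still awaiting a decode step
```

So the question boils down to: what is the intended way to turn those accumulated token ids into audio, ideally incrementally rather than all at once at end()?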