I have switched over to using the Hugging Face compatible model. Thank you guys for that. Excellent work!
I am trying to stream the generation to reduce latency in a demo I have written, using the Optional["BaseStreamer"] argument to generate(). Integration went smoothly, but the streamer receives tokens, not audio tensors, so when I naively tried to play the streamed data it (of course) didn't work.
Is there a recommended way to decode these tokens into audio, or documentation somewhere that I may have missed?
I have seen the csm-streaming repo, but I'm not sure I can translate that code (which uses this Python repo) to work with the new Hugging Face/transformers implementation.
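For context, here's roughly the shape of what I have. This is a minimal sketch, not my actual demo: the class name and the decode step are my own guesses, and I'm duck-typing the streamer since generate() only ever calls put() and end().

```python
class CodeCollectorStreamer:
    """Sketch of my streamer: collects the chunks that generate() pushes.

    The chunks it receives are codec token ids, NOT waveform samples,
    which is why feeding them straight to an audio player fails.
    """

    def __init__(self):
        self.chunks = []

    def put(self, value):
        # generate() calls this with each new batch of token ids.
        self.chunks.append(value)

    def end(self):
        # This is where I *assume* I'd need to run the audio codec's
        # decoder over the accumulated codes to get a playable waveform,
        # but that's exactly the step I'm unsure about.
        pass


streamer = CodeCollectorStreamer()
streamer.put([101, 102, 103])  # stand-in for a streamed chunk of token ids
streamer.end()
print(streamer.chunks)  # the raw codes, still awaiting a decode step
```

So the question boils down to: what is the intended way to turn those accumulated token ids into audio, ideally incrementally rather than all at once at end()?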