Reducing streaming costs over extended periods of time #20
Comments
I also have the same issue. I basically can't use this library (even though I love it!) until this is fixed, as the transcription costs make it unaffordable.
I would really appreciate it if this issue were solved!
I agree! This would be a fantastic addition. The cost grows quadratically the longer the recording continues.
How about this: use the browser's built-in speech recognition API to provide immediate feedback while recording. Then, after the recording is done, send the final audio file to the OpenAI Whisper API. Once you have the better transcript, transition from the browser ASR result to the final result in the UI.
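A rough sketch of that hybrid approach, not tied to this library's API: the browser's Web Speech API (where supported) supplies interim text, a MediaRecorder captures the full audio in parallel, and a single request to OpenAI's `/v1/audio/transcriptions` endpoint replaces the interim text once recording stops. The `showInterimText` / `showFinalText` callbacks and the API key placeholder are hypothetical stand-ins for whatever the app actually does.

```ts
// Hypothetical UI callbacks -- replace with whatever the app uses.
declare function showInterimText(text: string): void;
declare function showFinalText(text: string): void;

const OPENAI_API_KEY = '<your-api-key>'; // assumed to be available somehow

async function startHybridTranscription() {
  // 1. Live (rough) feedback from the browser's Web Speech API, where supported.
  const SpeechRecognitionImpl =
    (window as any).SpeechRecognition ?? (window as any).webkitSpeechRecognition;
  const recognition = new SpeechRecognitionImpl();
  recognition.continuous = true;
  recognition.interimResults = true;
  recognition.onresult = (event: any) => {
    let text = '';
    for (let i = 0; i < event.results.length; i++) {
      text += event.results[i][0].transcript;
    }
    showInterimText(text);
  };

  // 2. Record the full audio in parallel.
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const recorder = new MediaRecorder(stream);
  const chunks: Blob[] = [];
  recorder.ondataavailable = (e) => chunks.push(e.data);

  // 3. When recording stops, send the complete file to Whisper once and
  //    swap the interim text for the higher-quality transcript.
  recorder.onstop = async () => {
    recognition.stop();
    const file = new File(chunks, 'recording.webm', { type: 'audio/webm' });
    const form = new FormData();
    form.append('file', file);
    form.append('model', 'whisper-1');
    const res = await fetch('https://api.openai.com/v1/audio/transcriptions', {
      method: 'POST',
      headers: { Authorization: `Bearer ${OPENAI_API_KEY}` },
      body: form,
    });
    const { text } = await res.json();
    showFinalText(text);
  };

  recognition.start();
  recorder.start();
  return recorder; // caller invokes recorder.stop() when done
}
```

The trade-off is that the interim text is only as good as the browser's recognizer, but Whisper is billed once per recording instead of once per timeSlice.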
I was developing a very similar feature and hit a wall. I was trying to send delta data (each chunk) to the Whisper API and it didn't work, and when I tried to play a chunk locally, that didn't work either. That's how I found this repo.
Only the first blob can be played; all the rest are missing some metadata (the container header). The only way to make them playable is to add them together: if you have three of them, you have to join data0 + data1 + data2 into a new blob and create a file from that blob, just like the author does in the code.
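For reference, this is roughly why only the first chunk is playable: when recording with a timeSlice, MediaRecorder writes the container header into the first chunk only, so later chunks are continuation data that can't decode on their own. A minimal sketch (names are illustrative, not this library's API) of rebuilding a playable file from the chunks collected so far:

```ts
async function recordWithChunks() {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const recorder = new MediaRecorder(stream, { mimeType: 'audio/webm' });
  const chunks: Blob[] = [];

  recorder.ondataavailable = (event) => {
    chunks.push(event.data);
    // data0 + data1 + ... + dataN -> one self-contained blob; only the
    // first chunk carries the WebM header, so every chunk since the start
    // has to be included for the result to decode.
    const combined = new Blob(chunks, { type: 'audio/webm' });
    const file = new File([combined], 'audio-so-far.webm', { type: 'audio/webm' });
    // `file` can be played locally or uploaded -- but note this is exactly
    // the "resend everything from the beginning" pattern this issue is about.
  };

  recorder.start(1000); // emit a chunk roughly every second (the timeSlice)
  return recorder;
}
```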
Has anyone figured out a solution to this problem?
@shawhu I'm stuck with the exact same issue. Have you found a way to send the blobs separately?
Currently, the streaming feature works perfectly fine, but on every timeSlice it resends the entire audio stream from the beginning. The total amount of audio sent therefore grows quadratically with recording length, and the cost grows with it. For instance, recording around 15 minutes can cost up to $10 with a timeSlice of 1 second.
To avoid such high costs, I suggest a new feature that resends only the last n seconds of the audio stream. This would still provide some context while greatly reducing the amount of audio being sent and thus lowering the cost.
I believe this improvement would not only make the streaming feature more cost-effective but also improve its overall performance.
In the attached screenshot you can see the API usage from a 15-minute streaming transcription:
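One hedged way to approximate the "last n seconds only" idea without fighting the missing-header problem described above: restart the MediaRecorder every n seconds so each segment is a self-contained file, and transcribe each segment on its own. The function names, the 30-second segment length, and the plain fetch call below are assumptions for illustration, not this library's API; a true sliding window with overlapping context would need server-side slicing or re-encoding instead.

```ts
const SEGMENT_SECONDS = 30; // hypothetical segment length (the "n" in "last n seconds")

async function transcribeSegment(file: File): Promise<string> {
  const form = new FormData();
  form.append('file', file);
  form.append('model', 'whisper-1');
  const res = await fetch('https://api.openai.com/v1/audio/transcriptions', {
    method: 'POST',
    headers: { Authorization: 'Bearer <your-api-key>' }, // assumed key handling
    body: form,
  });
  return (await res.json()).text;
}

async function streamInSegments(onText: (text: string) => void) {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });

  const recordOneSegment = () => {
    // A fresh MediaRecorder per segment => every segment has its own header
    // and decodes on its own, so each request carries at most SEGMENT_SECONDS
    // of audio instead of the whole recording so far.
    const recorder = new MediaRecorder(stream, { mimeType: 'audio/webm' });
    const chunks: Blob[] = [];
    recorder.ondataavailable = (e) => chunks.push(e.data);
    recorder.onstop = async () => {
      const file = new File(chunks, `segment-${Date.now()}.webm`, { type: 'audio/webm' });
      onText(await transcribeSegment(file)); // append this segment's transcript
      recordOneSegment();                    // start the next segment right away
    };
    recorder.start();
    setTimeout(() => recorder.stop(), SEGMENT_SECONDS * 1000);
    // (a real implementation would also add a way to stop recording)
  };

  recordOneSegment();
}
```

With this sketch the cost grows linearly with recording length (each second of audio is billed once), at the price of possible word clipping at segment boundaries and no cross-segment context.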