
Reducing streaming costs over extended periods of time #20

Open
sholzmayer opened this issue Mar 26, 2023 · 7 comments

Comments

@sholzmayer

Currently, the streaming feature works fine, but on every timeSlice it resends the entire audio stream from the beginning. As a result, costs grow quadratically with recording length: recording for around 15 minutes can cost up to $10 with a timeSlice of 1 second.

To avoid such high costs, I suggest implementing a new feature that resends only the last n seconds of the audio stream. This would preserve some context while reducing the number of seconds sent and thus lowering the costs.
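The proposed sliding window could be sketched roughly as below. This is a hypothetical helper, not part of the library; the name `ChunkWindow` and the fixed-duration-per-chunk assumption are illustrative.

```javascript
// Hypothetical sketch: keep only the chunks that cover the last
// `windowSeconds` of audio, assuming each chunk spans a fixed
// timeSlice. Older chunks are dropped instead of being resent.
class ChunkWindow {
  constructor(windowSeconds, timeSliceSeconds) {
    this.maxChunks = Math.ceil(windowSeconds / timeSliceSeconds);
    this.chunks = [];
  }

  push(chunk) {
    this.chunks.push(chunk);
    // Evict chunks older than the window so the payload stays bounded
    while (this.chunks.length > this.maxChunks) {
      this.chunks.shift();
    }
  }

  // The payload to send to the API: only the last n seconds of audio
  payload() {
    return this.chunks.slice();
  }
}
```

One caveat: whether chunks after the first are independently decodable depends on the container format, so a real implementation would still need to ensure the window starts at a chunk with valid headers.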

I believe that this improvement would not only make the streaming feature more cost-effective but also enhance its overall performance.

In the attached screenshot you can see the API usage from a 15-minute streaming transcription:

[Screenshot 2023-03-26 at 19:49:47]

@MatthewMariner

I have the same issue. I basically can't use this library (even though I love it!) until this is fixed, because the transcription costs make it unaffordable.

@grtzsohalf

I would greatly appreciate it if this issue were solved!

@Aslanf8

Aslanf8 commented Apr 4, 2023

I agree! This would be a fantastic addition. The cost keeps climbing the longer the recording runs.

@kyb3r

kyb3r commented May 5, 2023

How about this: use the browser's speech recognition API to provide immediate transcription feedback. Then, after the recording is done, send the final audio file to the OpenAI Whisper API. Once you have the better transcript, transition from the browser ASR result to the final result in the UI.
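A minimal sketch of this two-stage idea follows. The browser wiring (`webkitSpeechRecognition`, the recorder, the upload) is shown only as comments, since it is environment-specific; the display logic is a plain function. All names here are illustrative, not library API.

```javascript
// Decide what the UI should show: prefer the final Whisper transcript
// once it exists, otherwise fall back to the live browser ASR text.
function displayedTranscript(state) {
  return state.finalTranscript !== null
    ? state.finalTranscript
    : state.interimTranscript;
}

// Browser wiring (illustrative sketch, not runnable here):
//
// const state = { interimTranscript: "", finalTranscript: null };
// const rec = new webkitSpeechRecognition();
// rec.interimResults = true;
// rec.onresult = (e) => {
//   state.interimTranscript = e.results[e.results.length - 1][0].transcript;
//   render(displayedTranscript(state));
// };
//
// When recording stops, POST the complete audio file once to
// https://api.openai.com/v1/audio/transcriptions (model "whisper-1"),
// then set state.finalTranscript from the response and re-render.
```

The key property is that Whisper is called exactly once per recording, so the cost is linear in audio length rather than growing with every timeSlice.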

@shawhu

shawhu commented Jun 6, 2023

I was developing a very similar feature and hit a wall. I was trying to send delta data (each chunk) to the Whisper API and it didn't work. I also tried to play a chunk locally, and that didn't work either. That's how I found this repo.
Basically, it seems that when you do this:

voiceRecorder.ondataavailable = ({ data }) => {
  // `data` holds only this chunk of the recording
};

only the first blob can be played. The rest are missing the container header metadata, so the only way to make them playable is to concatenate them: if you have three chunks, you have to combine data0 + data1 + data2 into a new blob and create a file from that blob, just as the author does in the code.
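The workaround described above could look roughly like this. The function name and the `audio/webm` MIME type are illustrative assumptions; the point is simply that joining the chunks in order preserves the header that only the first chunk carries.

```javascript
// Sketch of the workaround: later MediaRecorder chunks lack container
// headers, so concatenate all chunks from the start into one Blob
// before creating a playable or uploadable file.
function combineChunks(chunks, mimeType) {
  // new Blob([...]) joins the parts in order, keeping the header
  // bytes that only the first chunk contains.
  return new Blob(chunks, { type: mimeType });
}

// Usage in the recorder callback (illustrative):
// const chunks = [];
// voiceRecorder.ondataavailable = ({ data }) => {
//   chunks.push(data);
//   const playable = combineChunks(chunks, "audio/webm");
//   // e.g. new File([playable], "audio.webm", { type: "audio/webm" })
// };
```

Note that this is exactly why the per-chunk upload grows over time: each upload must start from chunk 0 to remain decodable.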

@taha-yva

taha-yva commented May 9, 2024

Has anyone figured out a solution to this problem?

@m-fraczek

@shawhu I'm stuck with the exact same issue. Have you found a way to send the blobs separately?
