Streaming API is very slow, is it a bug or a user error? #1066
Replies: 3 comments 1 reply
-
Thanks for asking your question. Please be sure to reply with as much detail as possible so the community can assist you efficiently.
-
Hey there! It looks like you haven't connected your GitHub account to your Deepgram account. You can do this at https://community.deepgram.com - being verified through this process will allow our team to help you in a much more streamlined fashion.
-
One more thing. I'm doing a load test, and the docs say the pre-recorded API allows up to 100 concurrent connections. With 50 threads I get a 100% success rate. However, once I go above 50 threads, I start getting 429 responses even though I'm not making more than 100 concurrent requests. How is the rate limit calculated?
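For reference, this is roughly how I cap in-flight requests on my side: a semaphore bounds concurrency and I track the peak observed value, so I can verify I never exceed the documented limit. The request body here is a stand-in sleep, not the actual HTTP call.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class LoadTest {
    static final AtomicInteger inFlight = new AtomicInteger();
    static final AtomicInteger peak = new AtomicInteger();

    // Runs `total` requests with at most `maxConcurrent` in flight at once;
    // returns the peak observed concurrency.
    static int run(int total, int maxConcurrent) throws InterruptedException {
        Semaphore gate = new Semaphore(maxConcurrent);
        ExecutorService pool = Executors.newFixedThreadPool(maxConcurrent);
        for (int i = 0; i < total; i++) {
            pool.submit(() -> {
                try {
                    gate.acquire();
                    int now = inFlight.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    Thread.sleep(10); // stand-in for the actual pre-recorded API call
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    inFlight.decrementAndGet();
                    gate.release();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return peak.get();
    }
}
```

With this in place, peak concurrency stays at or below the semaphore's permit count, so any 429s above 50 threads would not be explained by exceeding 100 concurrent requests.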
-
Hello!
We have a product that uses speech-to-text, and we're evaluating Deepgram's APIs as a potential primary STT provider.
We have our own Voice Activity Detection, and we buffer up audio before transcribing. The length of buffered audio is under 10 sec (most of it, I believe, is even under 5 sec).
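For concreteness, we derive the buffered length from the raw PCM byte count. The format constants here (16 kHz, 16-bit, mono linear PCM) are an assumption for illustration, not necessarily what every caller sends:

```java
public class AudioBuffer {
    static final int SAMPLE_RATE = 16_000;   // Hz (assumed format)
    static final int BYTES_PER_SAMPLE = 2;   // 16-bit linear PCM
    static final int CHANNELS = 1;           // mono

    // Duration in milliseconds of a raw PCM buffer of the given size.
    static long durationMs(int byteCount) {
        return byteCount * 1000L / (SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS);
    }
}
```

At that format, 160,000 bytes is exactly 5 seconds of audio, so staying under the 10 sec ceiling means keeping buffers under 320 KB.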
First, we tried the pre-recorded audio API, since it made more sense to us, and we got fairly good performance. However, this post https://github.com/orgs/deepgram/discussions/751 says
This pre-recorded endpoint is designed to be fast, but it is not expected to be real time or have a maximum latency below 20 seconds. If you need consistently low latency times, please utilize our Streaming speech to text services, which process audio in real time and may be better suited for your use case.
which suggests there are no performance guarantees, and that the streaming API would be a better choice.
So I tried the streaming API by sending our buffered audio over in 4KB chunks, then finalizing and closing the WebSocket connection. The results were disappointing.
While the first response comes back in about 300ms (as promised), the full transcription takes 10x longer than the pre-recorded API for the same audio file. Here is one example: the first response comes back in about 316ms, the second comes back in over 1.6 sec, and the time between the third and fourth responses is ~2.5 sec. The total time between all the data being sent and the final message being received is 5.2 sec, which seems too long for 5 sec of audio.
Also, the transcription is not accurate. The pre-recorded API returns the correct transcription in about 500ms, and that is one of its longer response times.
Note the first word: pay vs paint, and dol vs dull.
I've run 100 iterations of each version, and these are the stats:
While the WebSocket/streaming API is very consistent in its response times, it is much slower than the pre-recorded API.
I'm using Java, and the code is fairly straightforward:
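Here is a simplified sketch of what I'm doing (the listener and query parameters are trimmed; the endpoint URL and the Finalize/CloseStream message shapes follow the Deepgram streaming docs as I understand them):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.WebSocket;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StreamingClient {
    // Split the buffered audio into fixed-size chunks (4KB in my tests).
    static List<byte[]> chunk(byte[] audio, int size) {
        List<byte[]> out = new ArrayList<>();
        for (int off = 0; off < audio.length; off += size) {
            out.add(Arrays.copyOfRange(audio, off, Math.min(off + size, audio.length)));
        }
        return out;
    }

    // Send all chunks, then Finalize, then CloseStream, over one WebSocket.
    static void transcribe(byte[] audio, String apiKey, WebSocket.Listener listener) {
        WebSocket ws = HttpClient.newHttpClient().newWebSocketBuilder()
                .header("Authorization", "Token " + apiKey)
                .buildAsync(
                        URI.create("wss://api.deepgram.com/v1/listen?encoding=linear16&sample_rate=16000"),
                        listener)
                .join();
        for (byte[] c : chunk(audio, 4096)) {
            ws.sendBinary(ByteBuffer.wrap(c), true).join();
        }
        ws.sendText("{\"type\": \"Finalize\"}", true).join();
        ws.sendText("{\"type\": \"CloseStream\"}", true).join();
    }
}
```

The listener just records the arrival time of each transcript message, which is where the timings above come from.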
I tried increasing the chunk size from 4KB to 8KB, and I also tried removing the Finalize message, but neither made any difference.
Is this the expected performance, or am I doing something wrong and the streaming API is not supposed to be used this way?