Streaming - Automatic Language Detection (Speech-to-Text) #1039
-
Hello, I want to implement a speech-to-text feature to fill out a search bar (like in the ChatGPT app). We are currently using Deepgram with streaming, and it works fine for English. Can we enable automatic language detection? Reading here, it says it's only available for pre-recorded audio. Isn't that too slow for an app like mine? Looking forward to hearing from you.
Replies: 4 comments
-
Thanks for asking your question. Please be sure to reply with as much detail as possible so the community can assist you efficiently.
-
Hey there! It looks like you haven't connected your GitHub account to your Deepgram account. You can do this at https://community.deepgram.com. Being verified through this process will allow our team to help you in a much more streamlined fashion.
-
It looks like we're missing some important information to help debug your issue. Would you mind providing us with the following details in a reply?
-
Hi @peterkrueck, you're correct that our language detection is only supported for pre-recorded audio.
One option is to record the user's speech and then make a pre-recorded API request with language detection enabled.
Another option is to require the user to set their language, so you know from the start which language to use when transcribing their streaming audio.
It sounds like your audio inputs will be brief. Deepgram can transcribe audio much faster than real time. Say your user speaks for 10 seconds and you send that as a pre-recorded request: you'll receive a transcription back in a couple of seconds or less. If that's unacceptably long for your application, you'll need to manage the language detection in some other way so that you can make use of streaming transcription.
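To make the two options concrete, here is a rough Python sketch using only the standard library. It is not official client code. It assumes the pre-recorded endpoint is `https://api.deepgram.com/v1/listen`, that detection is switched on with a `detect_language=true` query parameter, that the detected language comes back as `results.channels[0].detected_language`, and that the streaming endpoint takes a `language` query parameter. Please check all of these against the current API reference. The helper names (`build_detection_request`, `parse_detection_response`, `transcribe_with_detection`, `streaming_url`) are made up for this example.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional, Tuple

# Assumed endpoints; verify against the current Deepgram API reference.
LISTEN_URL = "https://api.deepgram.com/v1/listen"
STREAM_URL = "wss://api.deepgram.com/v1/listen"


def build_detection_request(
    audio: bytes, api_key: str, mimetype: str = "audio/wav"
) -> urllib.request.Request:
    """Build a pre-recorded request with language detection enabled.

    The `detect_language` parameter name is an assumption.
    """
    query = urllib.parse.urlencode({"detect_language": "true"})
    return urllib.request.Request(
        f"{LISTEN_URL}?{query}",
        data=audio,
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": mimetype,
        },
        method="POST",
    )


def parse_detection_response(body: dict) -> Tuple[Optional[str], str]:
    """Return (detected_language, transcript) from the assumed response shape."""
    channel = body["results"]["channels"][0]
    transcript = channel["alternatives"][0]["transcript"]
    return channel.get("detected_language"), transcript


def transcribe_with_detection(audio: bytes, api_key: str) -> Tuple[Optional[str], str]:
    """Option 1: send the recorded clip and read back language and transcript."""
    request = build_detection_request(audio, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_detection_response(json.load(response))


def streaming_url(language: str) -> str:
    """Option 2: streaming URL for a language the user picked up front."""
    query = urllib.parse.urlencode({"language": language})
    return f"{STREAM_URL}?{query}"
```

For a search bar, you would record the short utterance, pass the bytes to `transcribe_with_detection`, and put the returned transcript into the field. If the user has already chosen a language, `streaming_url("de")` gives the WebSocket URL to open instead.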