Introducing Custom Speaker Identification and Labeling #534
Replies: 13 comments 2 replies
-
Thanks for asking your question about Deepgram! If you didn't already include it in your post, please be sure to add as much detail as possible so we can assist you efficiently, such as:
-
Hi @KentonParton, feedback received and I've logged it internally. We've heard interest in this type of feature. For transparency, we're putting more of our focus on cutting-edge ASR models, as well as audio intelligence features (such as summarization, sentiment analysis, intent recognition, topic detection), and text-to-speech. I hope that you'll continue to find value in Deepgram and enjoy our new features over time!
-
@jkroll-deepgram Just wanted to +1 this: outputting speaker embeddings ("fingerprints") and exposing an API for comparing and identifying speakers against past recordings would make diarization across recordings much more effective.
-
This would be a great feature to have! I would use it.
-
Definitely a +1. Thanks @KentonParton for logging this request. I presume the transcription engine computes some sort of quantifiable, probabilistic voice signature in addition to the spoken content, given that within a single transcription Deepgram can already identify the portions of speech that map to the same voice signature and mark them accordingly. I would expect a voice signature to be something like an embedding vector, with the ability to measure the distance between two such signatures and deduce that they are close enough to be marked as the same speaker. Since Deepgram operates at high scale, I wouldn't expect them to store names of people on a per-account basis and embed those names in the transcription response JSON. That might also have privacy implications that Deepgram would prefer not to get into. Instead, if Deepgram could just include the speaker's voice-signature embedding and its confidence score in the response JSON, we could map it to the appropriate names on the client side. Just sharing a few thoughts; let's discuss. @team-deepgram, @jkroll-deepgram, please, please, pretty please consider putting some resources into this request.
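To make the client-side idea concrete, here is a minimal sketch in Python. It assumes the response carried a per-speaker embedding vector, which per this thread is not something Deepgram exposes today. The vectors, the `profiles` dictionary, the `identify_speaker` helper, and the 0.8 threshold are all hypothetical and only illustrate distance-based matching.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_speaker(embedding, profiles, threshold=0.8):
    """Return the name of the closest enrolled profile, or None if none is close enough.

    `profiles` maps a name to a reference embedding. Both the embeddings and
    the threshold are assumptions for illustration, not Deepgram output.
    """
    best_name, best_score = None, threshold
    for name, reference in profiles.items():
        score = cosine_similarity(embedding, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


# Hypothetical enrolled speakers, kept entirely on the client side.
profiles = {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]}
print(identify_speaker([0.9, 0.1, 0.0], profiles))
print(identify_speaker([1.0, 1.0, 0.0], profiles))
```

Keeping the name lookup on the client means the service would never need to store anyone's identity, which matches the privacy point above.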
-
+1, this is really needed! Thanks
-
Hi all, (@venusbhatia @avibathula @pprobst @enrique-gm @bmaluijb @everuribe) I've implemented a speaker finger-printing API for a side-project of mine. This is the flow:
I generally create multiple fingerprints for one speaker from different videos to get the most reliable results. Is this what everyone is after? If so, I can make it available on my website for purchase. If your use case differs, let me know.
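One common way to combine several fingerprints for the same speaker is to average them into a single profile. The sketch below is only an illustration under the assumption that each fingerprint is a fixed-length numeric vector; it is not the implementation described in this comment, and `build_profile` is a hypothetical helper.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so that loud and quiet samples weigh the same."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def build_profile(fingerprints):
    """Average several fingerprints of one speaker into a single unit-length profile."""
    if not fingerprints:
        raise ValueError("at least one fingerprint is required")
    unit = [normalize(fp) for fp in fingerprints]
    dim = len(unit[0])
    if any(len(fp) != dim for fp in unit):
        raise ValueError("all fingerprints must have the same length")
    mean = [sum(fp[i] for fp in unit) / len(unit) for i in range(dim)]
    return normalize(mean)


# Two hypothetical fingerprints of the same speaker taken from different videos.
profile = build_profile([[2.0, 0.0], [0.0, 3.0]])
print(profile)
```

An alternative is to keep every fingerprint and score a new sample against each one, taking the best match; averaging is simply cheaper to store and compare.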
-
+1. Also, the current live diarization is unusable.
-
Hello, we need this.
-
Hi All,
Deepgram's speech-to-text service with diarization has been invaluable, with remarkable performance and accuracy. However, one feature that would greatly enhance its usefulness for us is the ability to recognize and label specific speakers.
Currently, your system assigns a generic "speaker" label followed by a number. This requires us to manually listen to the recordings, identify each speaker, and then update the speech-to-text output with the names of the speakers.
It would be highly beneficial if we could create a database of speaker profiles (speaker fingerprints). This way, when your system encounters a voice matching one of these profiles, it could automatically apply the custom label (name of speaker).
If Deepgram is open to exploring this enhancement, I'd love to collaborate.
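For reference, the relabeling step we currently do by hand can be sketched in Python as follows. This assumes the diarized words are available as dictionaries with `speaker` and `word` keys, loosely modeled on the diarized output; the `names` mapping and the `label_transcript` helper are hypothetical, and today the mapping has to be filled in manually for each recording.

```python
from itertools import groupby


def label_transcript(words, names):
    """Group consecutive words by speaker number and prefix each turn with a name.

    `words` is a list of dicts with "speaker" (int) and "word" (str) keys.
    `names` maps a speaker number to a custom label; unmapped speakers keep
    a generic "Speaker N" label.
    """
    lines = []
    for speaker, group in groupby(words, key=lambda w: w["speaker"]):
        label = names.get(speaker, f"Speaker {speaker}")
        text = " ".join(w["word"] for w in group)
        lines.append(f"{label}: {text}")
    return lines


# Hypothetical diarized words and a hand-made speaker mapping.
words = [
    {"speaker": 0, "word": "hello"},
    {"speaker": 0, "word": "there"},
    {"speaker": 1, "word": "hi"},
]
for line in label_transcript(words, {0: "Kenton"}):
    print(line)
```

With stored speaker profiles, the `names` mapping could be produced automatically instead of by listening to each recording.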