Introducing Custom Speaker Identification and Labeling #534

KentonParton · 2024-01-15T15:14:59Z

Hi All,

Deepgram's speech-to-text service with diarization has been invaluable thanks to its remarkable performance and accuracy. However, one feature that would greatly enhance its usefulness for us is the ability to recognize and label specific speakers.

Currently, your system assigns a generic "speaker" label followed by a number. This requires us to manually listen to the recordings, identify each speaker, and then update the speech-to-text output with the names of the speakers.
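Until automatic identification exists, the relabeling step can at least be scripted once the speaker-to-name mapping is known. A minimal Python sketch, assuming the word list from a diarized Deepgram pre-recorded response (`results.channels[0].alternatives[0].words`, where each word carries a `speaker` index and, with punctuation enabled, a `punctuated_word`); `label_utterances` and the name mapping are hypothetical:

```python
from itertools import groupby


def label_utterances(words, names):
    """Join consecutive words from the same diarized speaker into labeled lines.

    words: word list from a diarized Deepgram response
           (results.channels[0].alternatives[0].words).
    names: hypothetical mapping of speaker index -> person's name; speakers
           missing from it keep a generic "Speaker N" label.
    """
    lines = []
    for speaker, group in groupby(words, key=lambda w: w["speaker"]):
        # Prefer the punctuated form when punctuation was requested.
        text = " ".join(w.get("punctuated_word", w["word"]) for w in group)
        label = names.get(speaker, f"Speaker {speaker}")
        lines.append(f"{label}: {text}")
    return lines


words = [
    {"word": "hi", "punctuated_word": "Hi,", "speaker": 0},
    {"word": "there", "punctuated_word": "there.", "speaker": 0},
    {"word": "hello", "punctuated_word": "Hello!", "speaker": 1},
    {"word": "thanks", "punctuated_word": "Thanks.", "speaker": 2},
]
print(label_utterances(words, {0: "Alice", 1: "Bob"}))
# ['Alice: Hi, there.', 'Bob: Hello!', 'Speaker 2: Thanks.']
```

The hard part, knowing which number is which person, is still manual here; that is exactly the gap this request is about.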

It would be highly beneficial if we could create a database of speaker profiles (speaker fingerprints). This way, when your system encounters a voice matching one of these profiles, it could automatically apply the custom label (name of speaker).
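The matching step could look roughly like this: each diarized speaker in a recording gets an embedding (from some speaker-embedding model, an assumption, since Deepgram does not return one today), and that embedding is compared against the enrolled profiles. `build_name_map`, the profile layout, and the 0.8 threshold are all hypothetical:

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_name_map(speaker_embeddings, profiles, threshold=0.8):
    """Map diarized speaker indices to enrolled profile names.

    speaker_embeddings: {speaker_index: embedding} for one recording
                        (hypothetical; produced by your own embedding model).
    profiles: {name: [embedding, ...]}, the speaker-profile "database".
    A speaker is named only if its best similarity reaches `threshold`.
    """
    name_map = {}
    for speaker, emb in speaker_embeddings.items():
        best_name, best_score = None, threshold
        for name, stored in profiles.items():
            # A profile may hold several fingerprints; use the closest one.
            score = max((cosine(emb, s) for s in stored), default=-1.0)
            if score >= best_score:
                best_name, best_score = name, score
        if best_name is not None:
            name_map[speaker] = best_name
    return name_map


profiles = {
    "Alice": [[1.0, 0.0, 0.0]],
    "Bob": [[0.0, 1.0, 0.0], [0.1, 0.9, 0.0]],
}
speakers = {0: [0.9, 0.1, 0.0], 1: [0.0, 0.0, 1.0]}
print(build_name_map(speakers, profiles))  # {0: 'Alice'}
```

Speaker 1 matches no profile, so it is left out of the map and would keep its generic label.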

If Deepgram is open to exploring this enhancement, I'd love to collaborate.

team-deepgram · 2024-01-15T15:15:11Z

Maintainer

Thanks for asking your question about Deepgram! If you didn't already include it in your post, please be sure to add as much detail as possible so we can assist you efficiently, such as:

The request_id if you have a question about your requests or transcription responses.
The features you used or the full api.deepgram.com URL you sent your request to, including parameters.
Any code snippets you can share.

0 replies

jkroll-deepgram · 2024-01-16T22:39:20Z

Collaborator

Hi @KentonParton, feedback received and I've logged it internally. We've heard interest in this type of feature. For transparency, we're putting more of our focus on cutting-edge ASR models, as well as audio intelligence features (such as summarization, sentiment analysis, intent recognition, topic detection), and text-to-speech. I hope that you'll continue to find value in Deepgram and enjoy our new features over time!

1 reply

KentonParton Jan 17, 2024
Author

Thanks for the transparency @jkroll-deepgram!

everuribe · 2024-02-22T21:19:36Z

@jkroll-deepgram Just wanted to +1 this: outputting speaker embeddings / "fingerprints" and exposing an API for historical speaker comparison / identification would be highly beneficial for more effective diarization across recordings.

0 replies

bmaluijb · 2024-03-01T19:04:28Z

This would be a great feature to have! I would use it.

0 replies

enrique-gm · 2024-03-22T14:51:39Z

+1

0 replies

pprobst · 2024-04-24T19:04:53Z

+1

0 replies

avibathula · 2024-04-27T20:23:07Z

Definitely a +1. Thanks @KentonParton for logging this request.

I presume there is some sort of quantifiable/probabilistic voice-signature that the transcription engine is calculating besides the spoken content.

Within a single transcription, Deepgram is already able to identify the portions of spoken content that map to the same voice signature and label them accordingly. I would expect that voice signature to be something like an embedding vector, so that one could measure the distance between two signatures and decide whether they are close enough to be marked as the same speaker.
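One common way to make "close enough" concrete, assuming the signatures are fixed-length embedding vectors, is cosine similarity compared against a tuned threshold $\tau$ (a hypothetical parameter, not something Deepgram exposes):

$$\cos(\mathbf{a},\mathbf{b}) = \frac{\mathbf{a}\cdot\mathbf{b}}{\lVert\mathbf{a}\rVert\,\lVert\mathbf{b}\rVert}, \qquad \text{same speaker} \iff \cos(\mathbf{a},\mathbf{b}) \ge \tau$$

For example, with $\mathbf{a} = (3, 4)$ and $\mathbf{b} = (4, 3)$: $\mathbf{a}\cdot\mathbf{b} = 24$ and $\lVert\mathbf{a}\rVert = \lVert\mathbf{b}\rVert = 5$, so $\cos(\mathbf{a},\mathbf{b}) = 24/25 = 0.96$, which counts as a match for any $\tau \le 0.96$.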

Because Deepgram operates at high scale, I wouldn't expect them to store per-account speaker names and embed those names in the transcription response JSON. It might also have privacy implications that Deepgram would prefer to avoid.

Instead, if Deepgram could simply include each speaker's voice-signature embedding and its confidence in the response JSON, we could map those to the appropriate names on the client side.

Just sharing a few thoughts - let's discuss.

@team-deepgram, @jkroll-deepgram - please please pretty please consider putting some resources onto this request.

0 replies

venusbhatia · 2024-05-06T16:01:00Z

+1, this is really needed! Thanks

0 replies

KentonParton · 2024-06-18T21:37:01Z

Author

Hi all (@venusbhatia @avibathula @pprobst @enrique-gm @bmaluijb @everuribe), I've implemented a speaker-fingerprinting API for a side project of mine. This is the flow:

Start by generating a Deepgram diarized transcript.
Loop through the transcript to find one or more 5-second clips for each speaker where the Deepgram confidence is highest and above 90%.
I then generate the fingerprint, which is an embedding.
I search the vector store for a matching embedding; if there is none, I insert the embedding with the speaker's name.
If there are embeddings in the vector store with a confidence above 80%, I return them.
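The clip-selection step of that flow could be sketched as follows. It assumes the per-word confidence lives in a `speaker_confidence` field of the diarized words (pass `conf_key="confidence"` if word confidence was meant), that a clip must fall inside one uninterrupted speaker turn, and that "highest" means the best mean confidence per clip; `best_clips` and all of its parameters are hypothetical:

```python
from collections import defaultdict
from itertools import groupby


def best_clips(words, clip_len=5.0, min_conf=0.9, per_speaker=2,
               conf_key="speaker_confidence"):
    """Choose up to `per_speaker` non-overlapping clips per diarized speaker.

    A candidate clip starts at a word boundary, lies inside one uninterrupted
    speaker turn, and lasts `clip_len` seconds; its score is the mean
    `conf_key` value of the words that end inside it. Only clips scoring
    above `min_conf` are kept, highest score first.
    """
    candidates = defaultdict(list)  # speaker -> [(score, start, end)]
    for speaker, turn in groupby(words, key=lambda w: w["speaker"]):
        turn = list(turn)
        for i, first in enumerate(turn):
            start = first["start"]
            end = start + clip_len
            if turn[-1]["end"] < end:
                break  # the rest of this turn is shorter than one clip
            inside = [w for w in turn[i:] if w["end"] <= end]
            if not inside:
                continue  # a single word longer than the clip
            score = sum(w[conf_key] for w in inside) / len(inside)
            if score > min_conf:
                candidates[speaker].append((score, start, end))

    clips = {}
    for speaker, cands in candidates.items():
        chosen = []
        for score, start, end in sorted(cands, reverse=True):
            # Keep a clip only if it does not overlap one already chosen.
            if all(end <= s or start >= e for _, s, e in chosen):
                chosen.append((score, start, end))
            if len(chosen) == per_speaker:
                break
        clips[speaker] = [(s, e) for _, s, e in chosen]
    return clips


words = (
    [{"start": float(i), "end": float(i + 1), "speaker": 0,
      "speaker_confidence": 0.99 if i == 5 else 0.95} for i in range(6)]
    + [{"start": float(i), "end": float(i + 1), "speaker": 1,
        "speaker_confidence": 0.99} for i in (6, 7)]
)
print(best_clips(words))  # {0: [(1.0, 6.0)]}
```

Speaker 1 never talks for five uninterrupted seconds, so no clip is returned for them; in practice such speakers would need a shorter clip or another recording.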

I generally create multiple fingerprints for one speaker from different videos to get the most reliable results.
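A toy in-memory version of the vector-store lookup in that flow, allowing several fingerprints per speaker, might look like this. It assumes cosine similarity stands in for the "confidence" score and uses the 80% cut-off from the post; `SpeakerStore` and its methods are hypothetical, and a real deployment would use an actual vector database instead of brute-force search:

```python
import math


class SpeakerStore:
    """Brute-force stand-in for a vector store holding speaker fingerprints."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold
        self.entries = []  # (name, embedding); one name may appear many times

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def add(self, name, embedding):
        self.entries.append((name, list(embedding)))

    def search(self, embedding):
        """Return (name, best score) pairs above the threshold, best first."""
        best = {}
        for name, stored in self.entries:
            score = self._cosine(embedding, stored)
            if score > self.threshold and score > best.get(name, -1.0):
                best[name] = score
        return sorted(best.items(), key=lambda kv: kv[1], reverse=True)

    def match_or_enroll(self, embedding, name):
        """Return matches if any exist; otherwise store the embedding under `name`."""
        hits = self.search(embedding)
        if not hits:
            self.add(name, embedding)
        return hits


store = SpeakerStore()
store.add("Alice", [1.0, 0.0])
store.add("Alice", [0.8, 0.6])  # a second fingerprint from another video
print(store.match_or_enroll([0.6, 0.8], "Bob"))  # Alice, score ~0.96
print(store.match_or_enroll([0.0, 1.0], "Bob"))  # [] and Bob is enrolled
```

Keeping every fingerprint and reporting each name's best score is one way to benefit from enrolling a speaker from several videos: a new clip only needs to resemble one of them closely.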

Is this what everyone is after? If so, I can make it available on my website for purchase. If your use case differs, let me know.

1 reply

avibathula Jun 19, 2024

It has been on my back burner, and I was hoping to get to it next month.

I would urge you to consider open-sourcing the functionality as a gift back to the community (and to earn some street cred).

Pretty please ...

Source: https://gifdb.com/gif/pretty-please-puss-and-boots-eyes-9k98qaajfjbpt7ni.html

gergomiklos · 2024-07-24T21:17:29Z

+1. Also, the current live diarization is unusable.

0 replies

holtaila · 2024-07-24T22:05:32Z

0 replies

jpvajda · 2024-08-13T23:28:31Z

Maintainer

Related: https://github.com/orgs/deepgram/discussions/561

0 replies

fripokoff · 2024-11-13T00:13:40Z

Hello, we need this.

0 replies

Replies: 13 comments 2 replies

{{title}}

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

{{title}}

Select a reply

Introducing Custom Speaker Identification and Labeling #534

Replies: 13 comments · 2 replies

team-deepgram Jan 15, 2024 Maintainer

jkroll-deepgram Jan 16, 2024 Collaborator

KentonParton Jan 17, 2024 Author

KentonParton Jun 18, 2024 Author

jpvajda Aug 13, 2024 Maintainer

Replies: 13 comments 2 replies

team-deepgram
Jan 15, 2024
Maintainer

jkroll-deepgram
Jan 16, 2024
Collaborator

KentonParton Jan 17, 2024
Author

KentonParton
Jun 18, 2024
Author

jpvajda
Aug 13, 2024
Maintainer