Keep speaker text together #106

randwvb · 2023-03-23T17:41:04Z


I'm using the diarization tutorial, and I'm able to get output that mimics the tutorial's.

Example:

[Speaker:0] Hello, and thank you for calling premier phone service. Please be aware that this call may be recorded for quality and training purposes.
[Speaker:0] My name is Beth, and I will be assisting you today. How are you doing?
[Speaker:1] Not too bad. How are you today?
[Speaker:0] I'm doing well. Thank you. May I please have your name?
[Speaker:1] My name is Blake...

I'm new to jq (and to all of this), but how would I change jq -r '.results.utterances[] | "[Speaker:\(.speaker)] \(.transcript)"' so that consecutive lines from the same speaker are kept together, like this:

[Speaker:0] Hello, and thank you for calling premier phone service. Please be aware that this call may be recorded for quality and training purposes. My name is Beth, and I will be assisting you today. How are you doing?
[Speaker:1] Not too bad. How are you today?
[Speaker:0] I'm doing well. Thank you. May I please have your name?
[Speaker:1] My name is Blake...

Answered by SandraRodgers

Mar 27, 2023


SandraRodgers · 2023-03-27T21:10:14Z

Collaborator

Hi @randwvb,

There are probably different ways to do this. The current format builds one line per utterance. If you look at the response object, you can see an utterances array with one object per utterance:

  "utterances": [
     {
       "start": 10.345,
       "end": 13.785,
       "confidence": 0.89388895,
       "channel": 0,
       "transcript": "Well, can't remembered I logged in. So I'm already in the room.",
       "words": [ ...
       ],
       "speaker": 0,
       "id": "ff42479c-decb-49fd-b0d2-9a05d519cba8"
     },

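Each utterance is created based on natural pauses in the speaker's speech. Utterance boundaries are not determined by the speaker/diarization, so one speaker's turn can be split across several consecutive utterances, which is why the same speaker shows up on several lines in a row. To build the jq output in the format you want, some extra work has to be done on the response: consecutive utterances from the same speaker need to be merged.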
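If you are post-processing the JSON in code rather than with jq, the same merge can be done by grouping adjacent utterances by speaker. The Python sketch below is one way to do that. It assumes the response has already been parsed into a dict with the results.utterances layout from the question. The function name grouped_lines and the sample data (taken from the example transcript in the question) are hypothetical. itertools.groupby only groups adjacent items with equal keys, which matches "keep the text together until the speaker changes".

```python
import itertools


def grouped_lines(response: dict) -> list[str]:
    """Merge consecutive utterances from the same speaker into one line each."""
    utterances = response["results"]["utterances"]
    lines = []
    # groupby starts a new group every time the key (the speaker) changes,
    # so only *adjacent* utterances from the same speaker are merged
    for speaker, group in itertools.groupby(utterances, key=lambda u: u["speaker"]):
        text = " ".join(u["transcript"] for u in group)
        lines.append(f"[Speaker:{speaker}] {text}")
    return lines


# Hypothetical sample response, trimmed to the two fields the function reads
sample = {
    "results": {
        "utterances": [
            {"speaker": 0, "transcript": "Hello, and thank you for calling premier phone service."},
            {"speaker": 0, "transcript": "My name is Beth, and I will be assisting you today. How are you doing?"},
            {"speaker": 1, "transcript": "Not too bad. How are you today?"},
            {"speaker": 0, "transcript": "I'm doing well. Thank you. May I please have your name?"},
        ]
    }
}

for line in grouped_lines(sample):
    print(line)
```

Because each speaker turn becomes its own list entry and lines are printed separately, there are no stray spaces at the ends of lines.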

Here is one way to do it:

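jq -r '.results.utterances | ([foreach .[] as $x ({};
    # same speaker as the previous utterance: emit only the transcript
    if .prev == $x.speaker then .emit = $x.transcript
    # new speaker: remember it and start a new "[Speaker:N]" line
    else {prev: $x.speaker, emit: "\n[Speaker:\($x.speaker)] \($x.transcript)"}
    end;
    select(.emit).emit )]
  # join the pieces with spaces, then drop the newline before the first speaker
  | join(" ") | ltrimstr("\n"))'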

This code uses foreach to loop through each utterance. It checks whether the current item's ($x) speaker property is the same as the previous item's speaker property. If it is, it returns just the transcript. If the speaker is new, it returns the transcript with the "[Speaker:NUM]" string in front of it, plus a \n before that, because the output relies on that \n to break the lines. Finally, the array of strings is concatenated with join(" "), which puts a space between consecutive pieces. This creates one big string, but because of the embedded \n characters, each new speaker's text prints on a new line when this runs in the terminal.
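To make the foreach/join logic easier to follow, here is a Python mirror of the loop and the final join. It is only an illustration and assumes a parsed response dict with results.utterances. The function name foreach_style and the sample data are hypothetical, and prev plays the role of the .prev field in the jq state object.

```python
def foreach_style(response: dict) -> str:
    """Mirror of the jq foreach + join(" ") approach, for illustration only."""
    pieces = []
    prev = None  # like .prev in the jq state: no speaker seen yet
    for u in response["results"]["utterances"]:
        if u["speaker"] == prev:
            # same speaker as the previous utterance: emit just the transcript
            pieces.append(u["transcript"])
        else:
            # new speaker: remember it and emit a newline-prefixed speaker line
            prev = u["speaker"]
            pieces.append(f"\n[Speaker:{u['speaker']}] {u['transcript']}")
    # like jq's join(" "): one big string with a space between pieces
    return " ".join(pieces)


# Hypothetical sample response, trimmed to the two fields the function reads
sample = {
    "results": {
        "utterances": [
            {"speaker": 0, "transcript": "Hello, and thank you for calling premier phone service."},
            {"speaker": 0, "transcript": "My name is Beth, and I will be assisting you today. How are you doing?"},
            {"speaker": 1, "transcript": "Not too bad. How are you today?"},
            {"speaker": 0, "transcript": "I'm doing well. Thank you. May I please have your name?"},
        ]
    }
}

out = foreach_style(sample)
print(out.lstrip("\n"))  # drop the newline in front of the first speaker line
```

Because every piece is joined with a space, each line except the last ends with a trailing space. The combined string also starts with "\n", since the first piece is always a new-speaker piece, which is why the example strips it before printing.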

The best way to learn jq is to play around with it in the jq sandbox. That's how I figured out this answer. You might find a better way to do it! There is documentation at https://stedolan.github.io/jq/.

I hope this was helpful to you!

Deepgram

Keep speaker text together #106

{{title}}

Replies: 1 comment

{{title}}

Select a reply

Deepgram

Keep speaker text together #106

randwvb Mar 23, 2023

Replies: 1 comment

SandraRodgers Mar 27, 2023 Collaborator

randwvb
Mar 23, 2023

SandraRodgers
Mar 27, 2023
Collaborator