Refactor chunking logic for lower-latency realtime audio streaming #22
Open: mgupta-soundhound wants to merge 1 commit into master from mgupta/reduce_audio_packetization
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,10 +1,16 @@ | ||
| module github.com/soundhound/houndify-sdk-go | ||
|
|
||
| go 1.12 | ||
| go 1.26 | ||
|
|
||
| require ( | ||
| github.com/go-audio/wav v1.0.0 | ||
| github.com/google/go-cmp v0.3.0 // indirect | ||
| github.com/pkg/errors v0.8.1 | ||
| gotest.tools v2.2.0+incompatible | ||
| github.com/go-audio/wav v1.1.0 | ||
| github.com/stretchr/testify v1.11.1 | ||
| ) | ||
|
|
||
| require ( | ||
| github.com/davecgh/go-spew v1.1.1 // indirect | ||
| github.com/go-audio/audio v1.0.0 // indirect | ||
| github.com/go-audio/riff v1.0.0 // indirect | ||
| github.com/pmezard/go-difflib v1.0.0 // indirect | ||
| gopkg.in/yaml.v3 v3.0.1 // indirect | ||
| ) |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,12 +1,16 @@ | ||
| github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c= | ||
| github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38= | ||
| github.com/go-audio/audio v1.0.0 h1:zS9vebldgbQqktK4H0lUqWrG8P0NxCJVqcj7ZpNnwd4= | ||
| github.com/go-audio/audio v1.0.0/go.mod h1:6uAu0+H2lHkwdGsAY+j2wHPNPpPoeg5AaEFh9FlA+Zs= | ||
| github.com/go-audio/riff v1.0.0 h1:d8iCGbDvox9BfLagY94fBynxSPHO80LmZCaOsmKxokA= | ||
| github.com/go-audio/riff v1.0.0/go.mod h1:l3cQwc85y79NQFCRB7TiPoNiaijp6q8Z0Uv38rVG498= | ||
| github.com/go-audio/wav v1.0.0 h1:WdSGLhtyud6bof6XHL28xKeCQRzCV06pOFo3LZsFdyE= | ||
| github.com/go-audio/wav v1.0.0/go.mod h1:3yoReyQOsiARkvPl3ERCi8JFjihzG6WhjYpZCf5zAWE= | ||
| github.com/google/go-cmp v0.3.0 h1:crn/baboCvb5fXaQ0IJ1SGTsTVrWpDsCWC8EGETZijY= | ||
| github.com/google/go-cmp v0.3.0/go.mod h1:8QqcDgzrUqlUb/G2PQTWiueGozuR1884gddMywk6iLU= | ||
| github.com/pkg/errors v0.8.1 h1:iURUrRGxPUNPdy5/HRSm+Yj6okJ6UtLINN0Q9M4+h3I= | ||
| github.com/pkg/errors v0.8.1/go.mod h1:bwawxfHBFNV+L2hUp1rHADufV3IMtnDRdf1r5NINEl0= | ||
| gotest.tools v2.2.0+incompatible h1:VsBPFP1AI068pPrMxtb/S8Zkgf9xEmTLJjfM+P5UIEo= | ||
| gotest.tools v2.2.0+incompatible/go.mod h1:DsYFclhRJ6vuDpmuTbkuFWG+y2sxOXAzmJt81HFBacw= | ||
| github.com/go-audio/wav v1.1.0 h1:jQgLtbqBzY7G+BM8fXF7AHUk1uHUviWS4X39d5rsL2g= | ||
| github.com/go-audio/wav v1.1.0/go.mod h1:mpe9qfwbScEbkd8uybLuIpTgHyrISw/OTuvjUW2iGtE= | ||
| github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM= | ||
| github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4= | ||
| github.com/stretchr/testify v1.11.1 h1:7s2iGBzp5EwR7/aIZr8ao5+dra3wiQyKjjFuvgVKu7U= | ||
| github.com/stretchr/testify v1.11.1/go.mod h1:wZwfW3scLgRK+23gO65QZefKpKQRnfz6sD981Nm4B6U= | ||
| gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405 h1:yhCVgyC4o1eVCa2tZl7eS0r+SDo693bJlVdllGtEeKM= | ||
| gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0= | ||
| gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA= | ||
| gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM= |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,125 @@ | ||
| package houndify | ||
|
|
||
| import ( | ||
| "fmt" | ||
| "time" | ||
| ) | ||
|
|
||
| type LPCMStreamInfo struct { | ||
| numChans int | ||
| bitDepth int | ||
| sampleRate int | ||
| targetStreamingIntervalMs int | ||
|
|
||
| // Calculated output fields: | ||
| idealChunkSize int // raw, unaligned byte count for the target streaming interval. | ||
|
|
||
| // The client should stream one chunk of chunkSize bytes every streamingInterval. | ||
| chunkSize int // frame-aligned byte count that should be streamed each tick. | ||
| streamingInterval time.Duration // the duration represented by chunkSize | ||
| } | ||
|
|
||
| func (info *LPCMStreamInfo) NumChans() int { | ||
| return info.numChans | ||
| } | ||
|
|
||
| func (info *LPCMStreamInfo) BitDepth() int { | ||
| return info.bitDepth | ||
| } | ||
|
|
||
| func (info *LPCMStreamInfo) SampleRate() int { | ||
| return info.sampleRate | ||
| } | ||
|
|
||
| func (info *LPCMStreamInfo) TargetStreamingIntervalMs() int { | ||
| return info.targetStreamingIntervalMs | ||
| } | ||
|
|
||
| func (info *LPCMStreamInfo) IdealChunkSize() int { | ||
| return info.idealChunkSize | ||
| } | ||
|
|
||
| func (info *LPCMStreamInfo) ChunkSize() int { | ||
| return info.chunkSize | ||
| } | ||
|
|
||
| func (info *LPCMStreamInfo) StreamingInterval() time.Duration { | ||
| return info.streamingInterval | ||
| } | ||
|
|
||
| // GetLPCMStreamInfo computes the appropriate chunk size and streaming interval | ||
| // for streaming linear PCM audio data. | ||
| // | ||
| // It takes audio parameters (number of channels, bit depth, sample rate, and target | ||
| // streaming interval in milliseconds) and calculates the frame-aligned chunk size and the | ||
| // actual streaming interval implied by that aligned chunk size. Because alignment rounds | ||
| // down to whole frames, the resulting interval never exceeds the requested target and will | ||
| // often match it exactly. | ||
| // | ||
| // The function performs three steps: | ||
| // 1. Calculates the ideal number of bytes needed to represent the target streaming interval | ||
| // 2. Aligns the byte count to full audio frames to ensure valid audio boundaries | ||
| // 3. Derives the actual streaming interval from the aligned byte count | ||
| // | ||
| // Parameters: | ||
| // - numChans: Number of audio channels | ||
| // - bitDepth: Bit depth of each audio sample (e.g., 8, 16, 24, 32) | ||
| // - sampleRate: Sample rate in Hz (e.g., 16000, 44100) | ||
| // - targetStreamingIntervalMs: Target streaming interval in milliseconds | ||
| // | ||
| // Returns an LPCMStreamInfo containing both the source audio metadata and the | ||
| // calculated streaming values: | ||
| // - numChans: the number of audio channels in the source stream | ||
| // - bitDepth: the bits per sample for each channel | ||
| // - sampleRate: the sampling rate in Hz | ||
| // - targetStreamingIntervalMs: the requested streaming cadence in milliseconds | ||
| // - idealChunkSize: the exact byte count for the requested interval before frame alignment | ||
| // - chunkSize: the frame-aligned byte count to write for each chunk | ||
| // - streamingInterval: the duration represented by chunkSize | ||
| func GetLPCMStreamInfo( | ||
| numChans int, | ||
| bitDepth int, | ||
| sampleRate int, | ||
| targetStreamingIntervalMs int, | ||
| ) (*LPCMStreamInfo, error) { | ||
|
|
||
| if numChans < 1 { | ||
| return nil, | ||
| fmt.Errorf("invalid input: numChans must be >= 1, got %d", numChans) | ||
| } | ||
| if bitDepth < 8 || (bitDepth%8 != 0) { | ||
| return nil, | ||
| fmt.Errorf("invalid input: bitDepth must be >= 8 and multiple of 8, got %d", bitDepth) | ||
| } | ||
| if sampleRate < 8000 { | ||
| return nil, | ||
| fmt.Errorf("invalid input: sampleRate must be >= 8000, got %d", sampleRate) | ||
| } | ||
| if targetStreamingIntervalMs < 1 { | ||
| return nil, | ||
| fmt.Errorf("invalid input: targetStreamingIntervalMs must be >= 1, got %d", | ||
| targetStreamingIntervalMs) | ||
| } | ||
|
|
||
| bytesPerFrame := numChans * (bitDepth / 8) | ||
| bytesPerSecond := sampleRate * bytesPerFrame | ||
|
|
||
| // Step 1: ideal (non-aligned) byte size | ||
| idealChunkSize := (bytesPerSecond * targetStreamingIntervalMs) / 1000 | ||
|
|
||
| // Step 2: align to full frames | ||
| chunkSize := (idealChunkSize / bytesPerFrame) * bytesPerFrame | ||
|
|
||
| // Step 3: derive the actual streaming interval from bytes | ||
| streamingInterval := (time.Duration(chunkSize) * time.Second) / time.Duration(bytesPerSecond) | ||
|
|
||
| return &LPCMStreamInfo{ | ||
| numChans: numChans, | ||
| bitDepth: bitDepth, | ||
| sampleRate: sampleRate, | ||
| targetStreamingIntervalMs: targetStreamingIntervalMs, | ||
|
|
||
| idealChunkSize: idealChunkSize, | ||
| chunkSize: chunkSize, | ||
| streamingInterval: streamingInterval, | ||
| }, nil | ||
| } |
Should we move this file into the example folder to indicate that it is not part of the core SDK but a set of helper functions used by the client code? We could keep it here if we want to add a simpler API in the future that lets SDK users stream at a given interval.
I’m intentionally including this in the main SDK so that clients can directly configure the chunk size and streaming interval for LPCM audio. That makes it a first-class part of the SDK, available to anyone who needs it. If it lived under github.com/soundhound/houndify-sdk-go/example, clients would have to either copy the code each time or import an example package, which isn’t ideal for production use. As it stands, they can simply import github.com/soundhound/houndify-sdk-go and use it out of the box.