Refactor chunking logic for lower-latency realtime audio streaming #22

Open
mgupta-soundhound wants to merge 1 commit into master from mgupta/reduce_audio_packetization

mgupta-soundhound (Collaborator) commented Mar 20, 2026

Problem(s)

  • The realtime streaming example currently sends LPCM audio in fixed 1-second chunks. This does not reflect how a realtime client behaves and adds unnecessary latency before partial transcripts and SafeToStopAudio signals are received.
  • The chunking logic is embedded in the example and assumes WAV input, making it harder to reuse, validate, or modify for different LPCM formats.
  • Such large chunks also increase the chance that the client is still writing the final chunk after the server has determined it has enough audio, processed the query, and closed the connection. In that case the client can attempt to write to a closed socket and encounter a TCP reset.

Solution Summary

  • Improve realtime LPCM streaming packetization by moving chunk-size calculation into a reusable helper, tightening input validation, and adding unit test coverage for chunk sizing across common sample rates.

What changed

  • Reduce streamed audio packetization from 1 second to smaller realtime intervals for faster partial transcripts and SafeToStopAudio handling.
  • Add GetLPCMStreamInfo to centralize LPCM chunk-size and streaming-interval calculation.
  • Validate LPCM inputs more strictly (numChans, bitDepth, sampleRate, and targetStreamIntervalMs).
  • Refactor the example streamer to use the shared LPCM stream info helper.
  • Add isolated table-driven tests covering expected chunk sizes and streaming intervals for multiple sample rates and intervals, plus invalid-input cases.
  • Replace time.Sleep() with a time.Ticker in the audio streaming example to avoid timing drift. This change also helps prevent writing further chunks after SafeToStopAudio has been received.
  • Other upgrades:
    • Update the Go version to 1.26
    • Remove usage of deprecated io/ioutil
    • Remove usage of deprecated github.com/pkg/errors
    • Replace gotest.tools/assert with github.com/stretchr/testify/assert

Why

  • Keeps realtime streaming pacing consistent with frame-aligned LPCM audio.
  • Makes the chunking logic reusable and easier to reason about outside the example.
  • Adds regression coverage for sample rates where chunk math is easy to get wrong.

LPCM Streaming Calculations
The helper calculates streaming info in three steps:

  • Compute the ideal byte count for the requested streaming interval using sample rate, channel count, and bit depth.
  • Align that byte count to full LPCM audio frames so chunks never split samples.
  • Derive the actual streaming interval represented by the aligned chunk size, so pacing stays consistent with the bytes being sent.

This keeps the example’s realtime stream frame-aligned and minimizes drift between transmitted audio data and wall-clock timing.

Testing

  • Added lpcm_stream_info_test.go coverage for valid and invalid cases.
  • Manually tested the example.go code:
% ./example -voice ../test_audio/what_is_the_weather_like_in_toronto.wav -stream 
what
what
what is
what is the
what is the
what is the weather
what is the weather
what is the weather
what is the weather like
what is the weather like
what is the weather like in
what is the weather like in
what is the weather like in tur
what is the weather like in toronto
what is the weather like in toronto
what is the weather like in toronto
what is the weather like in toronto
what is the weather like in toronto
what is the weather like in toronto
Reached end of file
what is the weather like in toronto
what is the weather like in toronto
what is the weather like in toronto
The weather is 38 °F and raining in Toronto, Canada.

Follow-up(s)

  • As a follow-up, we should consider removing the current io.Pipe reader/writer pattern from the example. In Go, io.Pipe behaves like an unbuffered channel: each write blocks until a reader has consumed the data. The writer’s pacing is therefore coupled to the request body reader, which can distort the intended realtime streaming cadence. Replacing this pattern with a simple buffered approach would decouple writes from reads, so the streaming writer no longer blocks on the reader.

@mgupta-soundhound mgupta-soundhound changed the title Reduce chunk size from 1s to 20ms for audio streaming [Draft] Reduce chunk size from 1s to 20ms for audio streaming Mar 20, 2026
@mgupta-soundhound mgupta-soundhound force-pushed the mgupta/reduce_audio_packetization branch 2 times, most recently from c84e205 to c10c6cd Compare March 20, 2026 23:30
@mgupta-soundhound mgupta-soundhound changed the title [Draft] Reduce chunk size from 1s to 20ms for audio streaming Refactor chunking logic for lower-latency realtime audio streaming Mar 20, 2026
@mgupta-soundhound mgupta-soundhound force-pushed the mgupta/reduce_audio_packetization branch 5 times, most recently from e70d0dc to a92c6b4 Compare March 22, 2026 14:54
@mgupta-soundhound mgupta-soundhound self-assigned this Mar 22, 2026
@mgupta-soundhound mgupta-soundhound force-pushed the mgupta/reduce_audio_packetization branch 4 times, most recently from 86b1e3d to 8d6e5fe Compare March 22, 2026 16:13
@mgupta-soundhound mgupta-soundhound force-pushed the mgupta/reduce_audio_packetization branch from 8d6e5fe to 4620d42 Compare March 22, 2026 16:15
Comment thread lpcm_stream_info.go

Should we move this file into the example folder to indicate that it is not part of the core SDK, but a set of helper functions used in the client code?

We could keep it here if we want to add a simpler API in the future that lets SDK users stream at a given interval.

Reply from the PR author (Collaborator):

I’m intentionally including this in the main SDK so clients can directly configure streaming chunking and duration for LPCM audio formats. This makes it a first-class part of the SDK, available to anyone who needs it. If it lived under github.com/soundhound/houndify-sdk-go/example, clients would have to either copy the code each time or import an example package, which isn’t ideal for production use. As it stands, they can simply import github.com/soundhound/houndify-sdk-go and use it out of the box.

@zhili-soundhound

LGTM
