Significant audio delay between call to SDL_PutAudioStreamData and the actual output (Android)

I'm having some audio issues in my SDL3 project on Android (13) and Windows (11):

There's a significant delay between my call to `SDL_PutAudioStreamData` and what actually comes out of the speakers (sometimes up to ~500 ms). At my 48 kHz input rate that is about 24000 samples of delay, which doesn't look like a common buffer size.
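As a sanity check on that figure, here is a minimal sketch of the delay-to-samples arithmetic. The helper names `delayToFrames` and `delayToBytes` are made up for illustration and are not part of SDL; the byte count assumes interleaved PCM such as the mono S16 format used in the example further down.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of sample frames that fit in `delayMs`
// milliseconds at a sample rate of `freqHz`.
constexpr std::int64_t delayToFrames(std::int64_t delayMs, std::int64_t freqHz) {
    assert(delayMs >= 0 && freqHz > 0);
    return delayMs * freqHz / 1000;
}

// Hypothetical helper: the same amount of audio expressed in bytes,
// assuming interleaved PCM with `bytesPerSample` bytes per sample.
constexpr std::int64_t delayToBytes(std::int64_t delayMs, std::int64_t freqHz,
                                    int channels, int bytesPerSample) {
    assert(channels > 0 && bytesPerSample > 0);
    return delayToFrames(delayMs, freqHz) * channels * bytesPerSample;
}

// 500 ms at 48 kHz is 24000 frames, or 48000 bytes of mono 16-bit audio.
static_assert(delayToFrames(500, 48000) == 24000);
static_assert(delayToBytes(500, 48000, 1, 2) == 48000);
```

For comparison, a 1024-frame buffer at the same rate would only account for roughly 21 ms.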

I observed this by blinking an image on the rendering thread at practically the same time I call `SDL_PutAudioStreamData`. The rendering thread is vsynced and runs at 144 fps on my computer display (6.94 ms frame period) and 120 fps on my phone display (8.33 ms frame period), and the delay seems constant. When I delayed the blinking by 500 ms, it seemed more in sync with the audio.

The problem is more serious on Android than on Windows. On the other hand, I tried the same code (without the delay compensation on the rendering thread) on Linux (Pop!_OS 22.04) and the issue doesn't happen there, or at least the audio delay is too small for me to notice.

Calling `SDL_FlushAudioStream` after `SDL_PutAudioStreamData` didn't resolve the issue, and it is actually undesirable: I want sample-precise timing between the sounds I play (padding with silence when necessary), and with the auto-padding that `SDL_FlushAudioStream` performs:

> there may be audio gaps in the output
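The manual padding mentioned above boils down to making every sound occupy exactly one interval's worth of bytes in the stream. A rough sketch of that bookkeeping, assuming a fixed sample rate and frame size (`PaddingPlan` and `planInterval` are illustrative names, not SDL API):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical result type: how many bytes of sound and of silence to
// queue so that one beat fills exactly one interval.
struct PaddingPlan {
    std::int64_t soundBytes;
    std::int64_t silenceBytes;
};

// Work in whole frames first so the byte counts stay frame-aligned.
constexpr PaddingPlan planInterval(std::int64_t soundFrames, std::int64_t intervalNs,
                                   std::int64_t freqHz, int bytesPerFrame) {
    assert(soundFrames >= 0 && intervalNs >= 0 && freqHz > 0 && bytesPerFrame > 0);
    const std::int64_t intervalFrames = intervalNs * freqHz / 1'000'000'000;
    // Truncate the sound if the interval is shorter than the sound itself.
    const std::int64_t playedFrames = std::min(soundFrames, intervalFrames);
    return {playedFrames * bytesPerFrame,
            (intervalFrames - playedFrames) * bytesPerFrame};
}

// A 0.6 s sound (28800 frames at 48 kHz, mono S16) in a 1 s interval:
// 57600 bytes of sound followed by 38400 bytes of silence.
static_assert(planInterval(28800, 1'000'000'000, 48000, 2).soundBytes == 57600);
static_assert(planInterval(28800, 1'000'000'000, 48000, 2).silenceBytes == 38400);
```

With a 0.5 s or 0.1 s interval the sound is cut short and no silence is needed, which is what the example further down does.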

Is there a way to make the delay between feeding samples to `SDL_AUDIO_DEVICE_DEFAULT_PLAYBACK` and the actual output unnoticeable, either through `SDL_PutAudioStreamData` or by doing something different? (Maybe by setting the buffer size?)

If not, can I at least query what that delay is on the current system? (By querying the buffer size? Does it even matter with the AudioStream interface?)
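If a buffer size can be queried, converting it to time is simple arithmetic. The sketch below is only that conversion, under loud assumptions: the inputs would presumably come from something like `SDL_GetAudioStreamQueued` (bytes still queued in the stream) and the `sample_frames` out-parameter of `SDL_GetAudioDeviceFormat` (device buffer size), and the result is at best a lower bound, since it ignores any buffering done by the OS mixer or the hardware. The function names here are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: duration in microseconds of `frames` sample
// frames at `freqHz` (truncated toward zero).
constexpr std::int64_t framesToMicros(std::int64_t frames, std::int64_t freqHz) {
    assert(frames >= 0 && freqHz > 0);
    return frames * 1'000'000 / freqHz;
}

// Hypothetical lower-bound estimate: audio already queued in the stream
// plus one full device buffer. The stream and the device may run at
// different rates, so each part is converted with its own frequency.
constexpr std::int64_t estimateLatencyMicros(std::int64_t queuedBytes, int bytesPerFrame,
                                             std::int64_t streamFreqHz,
                                             std::int64_t deviceBufferFrames,
                                             std::int64_t deviceFreqHz) {
    assert(queuedBytes >= 0 && bytesPerFrame > 0);
    const std::int64_t queuedFrames = queuedBytes / bytesPerFrame;
    return framesToMicros(queuedFrames, streamFreqHz)
         + framesToMicros(deviceBufferFrames, deviceFreqHz);
}

// An empty stream in front of a 1024-frame device buffer at 48 kHz
// accounts for about 21 ms, far from the ~500 ms observed.
static_assert(estimateLatencyMicros(0, 2, 48000, 1024, 48000) == 21333);
```

If the measured delay is much larger than this estimate, the difference would have to come from somewhere the two inputs don't cover.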

I put together a minimal example that's also easy to verify. It's an application that plays a tic (orange) / tac (blue) sound while blinking the screen with the respective color. It has 3 modes, with a 1, 0.5 or 0.1 second period between sounds, which you can cycle through by clicking/touching the screen:

```c++
#include <SDL3/SDL.h>
#include <SDL3/SDL_main.h>

#include <algorithm>
#include <array>
#include <chrono>
#include <condition_variable>
#include <limits>
#include <mutex>
#include <thread>
#include <type_traits>
#include <utility>
#include <vector>

#include <cmath>
#include <cstdint>
#include <cstdio>
#include <cstring>

namespace {

enum class Platform : uint8_t {
    ANDROID_PLATFORM,
    LINUX_PLATFORM,
    WINDOWS_PLATFORM,
} constinit const currentPlatform
#if defined(__ANDROID__)
    = Platform::ANDROID_PLATFORM;
#elif defined(__linux__)
    = Platform::LINUX_PLATFORM;
#elif defined(_WIN32)
    = Platform::WINDOWS_PLATFORM;
#else
    #error "Platform not supported"
#endif

constexpr auto maxErrorMessageLen = 256;
template<typename... Args>
void messageError(const char* const fmt, Args&&... args) {
    std::array<char, maxErrorMessageLen> msg{};
    std::snprintf(msg.data(), msg.size() * sizeof(decltype(msg)::value_type), fmt, std::forward<Args>(args)...);
    SDL_ShowSimpleMessageBox(SDL_MESSAGEBOX_ERROR, "Error", msg.data(), nullptr);
}

} // namespace

int main(int /*argc*/, char* /*argv*/[]) {
    // Initialization
    if (!SDL_Init(SDL_INIT_AUDIO | SDL_INIT_VIDEO)) [[unlikely]] {
        messageError("SDL_Init failed (%s)", SDL_GetError());
        return 1;
    }

    int numDisplays = 0;

    SDL_DisplayID* const displaysPtr = SDL_GetDisplays(&numDisplays);
    if (displaysPtr == nullptr) [[unlikely]] {
        messageError("SDL_GetDisplays failed (%s)", SDL_GetError());
        return 1;
    }

    const SDL_DisplayMode* displayMode = SDL_GetCurrentDisplayMode(displaysPtr[0]);
    SDL_free(displaysPtr);
    if (displayMode == nullptr) [[unlikely]] {
        messageError("SDL_GetCurrentDisplayMode failed (%s)", SDL_GetError());
        return 1;
    }

    constexpr int desktopWindowWidth  = 960;
    constexpr int desktopWindowHeight = 540;

    SDL_Window*   window   = nullptr;
    SDL_Renderer* renderer = nullptr;
    if (!SDL_CreateWindowAndRenderer(
            "SDLAudioDelay",
            currentPlatform == Platform::ANDROID_PLATFORM ? displayMode->w : desktopWindowWidth,
            currentPlatform == Platform::ANDROID_PLATFORM ? displayMode->h : desktopWindowHeight,
            SDL_WINDOW_OPENGL,
            &window,
            &renderer)) [[unlikely]] {
        messageError("SDL_CreateWindowAndRenderer failed (%s)", SDL_GetError());
        return 1;
    }

    if (!SDL_SetRenderDrawBlendMode(renderer, SDL_BLENDMODE_BLEND)) [[unlikely]] {
        messageError("SDL_SetRenderDrawBlendMode failed (%s)", SDL_GetError());
        SDL_DestroyRenderer(renderer);
        SDL_DestroyWindow(window);
        return 1;
    }

    if (!SDL_SetRenderVSync(renderer, 1)) [[unlikely]] {
        messageError("SDL_SetRenderVSync failed (%s)", SDL_GetError());
        SDL_DestroyRenderer(renderer);
        SDL_DestroyWindow(window);
        return 1;
    }

    using namespace std::chrono_literals;

    constexpr SDL_AudioSpec spec = {.format = SDL_AUDIO_S16, .channels = 1, .freq = 48000};

    constexpr auto ticTacDurationSeconds = 0.6;
    constexpr auto ticTacNumSamples      = size_t(spec.freq * ticTacDurationSeconds);

    constexpr auto ticFreq = 1760.0;
    constexpr auto tacFreq = ticFreq / 2.0;

    std::vector<int16_t> ticSamples(ticTacNumSamples);
    std::vector<int16_t> tacSamples(ticTacNumSamples);

    const auto fillSamples = [](auto& samples, const auto freq) {
        using val_t = typename std::decay_t<decltype(samples)>::value_type;

        constexpr auto pi = 3.141592653589793238463;

        auto amplitude = 1.0;

        for (size_t i = 0; i < samples.size(); ++i) {
            samples[i] = val_t(amplitude * std::sin(2.0 * pi * freq * double(i) / double(spec.freq))
                               * std::numeric_limits<val_t>::max());

            amplitude *= 0.99;
        }
    };

    fillSamples(ticSamples, ticFreq);
    fillSamples(tacSamples, tacFreq);

    enum class Sound : uint8_t { TIC, TAC } currentPlayedSound = Sound::TAC;

    const std::vector<uint8_t> silenceSamples(1024);

    SDL_AudioStream* stream = SDL_OpenAudioDeviceStream(SDL_AUDIO_DEVICE_DEFAULT_PLAYBACK, &spec, nullptr, nullptr);
    if (stream == nullptr) [[unlikely]] {
        messageError("SDL_OpenAudioDeviceStream failed (%s)", SDL_GetError());
        SDL_DestroyRenderer(renderer);
        SDL_DestroyWindow(window);
        return 1;
    }

    if (!SDL_ResumeAudioStreamDevice(stream)) [[unlikely]] {
        messageError("SDL_ResumeAudioStreamDevice failed (%s)", SDL_GetError());
        SDL_DestroyAudioStream(stream);
        SDL_DestroyRenderer(renderer);
        SDL_DestroyWindow(window);
        return 1;
    }

    using clock_t = std::chrono::steady_clock;

    constexpr std::array timeIntervals = {
        1'000'000'000ns, // 1.0 second
        500'000'000ns,   // 0.5 second
        100'000'000ns,   // 0.1 second
    };
    uint8_t currentTimeIntervalIdx = 0;

    constexpr auto          waitError  = 2ms;
    constexpr auto          startDelay = 50ms;
    bool                    wait       = true;
    auto                    waitTp     = clock_t::time_point::max();
    std::mutex              waitMutex;
    std::condition_variable waitCv;

    auto lastTicTacTime = clock_t::time_point::min();

    // Run
    bool running = true;

    int ret = 0;

    std::thread periodicSoundThread{[&] {
        std::unique_lock lock(waitMutex);

        while (true) {
            waitCv.wait_until(lock, waitTp - waitError, [&wait] { return !wait; });
            wait = true;
            if (!running) [[unlikely]] {
                break;
            }
            while ((lastTicTacTime = clock_t::now()) < waitTp) [[likely]] {}
            const auto interval = timeIntervals[currentTimeIntervalIdx];

            // Feed samples to the playback device
            {
                currentPlayedSound = currentPlayedSound == Sound::TIC ? Sound::TAC : Sound::TIC;

                const auto [format, channels, freq] = spec;

                const int ticTacNumBytes   = int(ticTacNumSamples * SDL_AUDIO_BYTESIZE(format));
                const int intervalNumBytes = int(interval.count() * freq * channels / 1'000'000'000
                                                 * SDL_AUDIO_BYTESIZE(format));

                const auto numTicTacBytesToPlay = std::min(ticTacNumBytes, intervalNumBytes);

                const auto& samples = currentPlayedSound == Sound::TIC ? ticSamples : tacSamples;

                if (!SDL_PutAudioStreamData(stream, samples.data(), numTicTacBytesToPlay)) [[unlikely]] {
                    messageError("SDL_PutAudioStreamData failed (%s)", SDL_GetError());
                    running = false;
                    ret     = 1;
                    return;
                }

                // And pad with silence if necessary
                int numBytesInSilence = intervalNumBytes - ticTacNumBytes;
                while (numBytesInSilence > int(silenceSamples.size())) {
                    if (!SDL_PutAudioStreamData(stream, silenceSamples.data(), int(silenceSamples.size())))
                        [[unlikely]] {
                        messageError("SDL_PutAudioStreamData failed (%s)", SDL_GetError());
                        running = false;
                        ret     = 1;
                        return;
                    }
                    numBytesInSilence -= int(silenceSamples.size());
                }
                if (numBytesInSilence > 0) {
                    if (!SDL_PutAudioStreamData(stream, silenceSamples.data(), numBytesInSilence)) [[unlikely]] {
                        messageError("SDL_PutAudioStreamData failed (%s)", SDL_GetError());
                        running = false;
                        ret     = 1;
                        return;
                    }
                }
            }

            waitTp += interval;
        }
    }};

    const auto startPlaying = [&] {
        {
            std::lock_guard lock(waitMutex);
            waitTp = clock_t::now() + startDelay;
            wait   = false;
        }
        waitCv.notify_one();
    };

    // Just to make sure periodicSoundThread is ready
    std::this_thread::sleep_for(1s);

    startPlaying();

    while (true) {
        for (SDL_Event event; running && SDL_PollEvent(&event);) {
            switch (event.type) {
            case SDL_EVENT_QUIT: {
                running = false;
                break;
            }
            case SDL_EVENT_MOUSE_BUTTON_DOWN: {
                currentTimeIntervalIdx = (currentTimeIntervalIdx + 1) % timeIntervals.size();
                if (!SDL_ClearAudioStream(stream)) {
                    messageError("SDL_ClearAudioStream failed (%s)", SDL_GetError());
                    running = false;
                    ret     = 1;
                    continue;
                }
                startPlaying();
                break;
            }
            default: break;
            }
        }
        if (!running) {
            {
                std::lock_guard lock(waitMutex);
                wait = false;
            }
            waitCv.notify_one();

            break;
        }

        using duration_t = std::chrono::nanoseconds;

        constexpr duration_t blinkDecayTime = 100'000'000ns; // 100 ms

        const auto elapsedTimeSinceLastBeat = std::chrono::duration_cast<duration_t>(clock_t::now() - lastTicTacTime);

        double colorItensity = 0.0;
        if (elapsedTimeSinceLastBeat > 0ns && elapsedTimeSinceLastBeat < blinkDecayTime) {
            colorItensity = double(blinkDecayTime.count() - elapsedTimeSinceLastBeat.count()) / blinkDecayTime.count();
        }

        if (!SDL_SetRenderDrawColor(renderer,
                                    currentPlayedSound == Sound::TIC ? uint8_t(0XFF * colorItensity) : 0X00,
                                    uint8_t(0X7F * colorItensity),
                                    currentPlayedSound == Sound::TAC ? uint8_t(0XFF * colorItensity) : 0X00,
                                    0XFF)) [[unlikely]] {
            messageError("SDL_SetRenderDrawColor failed (%s)", SDL_GetError());
            running = false;
            ret     = 1;
            continue;
        }

        if (!SDL_RenderClear(renderer)) [[unlikely]] {
            messageError("SDL_RenderClear failed (%s)", SDL_GetError());
            running = false;
            ret     = 1;
            continue;
        }

        if (!SDL_RenderPresent(renderer)) [[unlikely]] {
            messageError("SDL_RenderPresent failed (%s)", SDL_GetError());
            running = false;
            ret     = 1;
            continue;
        }
    }

    // End
    periodicSoundThread.join();

    SDL_DestroyAudioStream(stream);
    SDL_DestroyRenderer(renderer);
    SDL_DestroyWindow(window);

    SDL_Quit();

    return ret;
}
```

Observe that when the delay gets near 500 ms, the tic and tac swap colors in the second interval mode (the tic misses the orange blink and plays on blue, and vice versa).
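One way to reason about the swap: if the sound is heard a whole number of beats late, it lands on the blink of a later beat, and an odd number of beats means the opposite color. A simplified model of that, assuming a constant delay and rounding to the nearest beat (the function names are illustrative only):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: how many beats late the sound is perceived,
// rounded to the nearest beat.
constexpr std::int64_t beatsLate(std::int64_t delayMs, std::int64_t intervalMs) {
    assert(delayMs >= 0 && intervalMs > 0);
    return (delayMs + intervalMs / 2) / intervalMs;
}

// Colors alternate every beat, so an odd shift pairs the tic with the
// tac's color and vice versa.
constexpr bool colorsAppearSwapped(std::int64_t delayMs, std::int64_t intervalMs) {
    return beatsLate(delayMs, intervalMs) % 2 != 0;
}

// A delay close to 500 ms in the 0.5 s mode shifts the sound by one beat.
static_assert(colorsAppearSwapped(480, 500));
// The same delay in the 1 s mode is still closest to its own beat.
static_assert(!colorsAppearSwapped(480, 1000));
```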

Sometimes the delay is not as large as 500 ms, but you can still verify it by closing your eyes and opening them when you hear the tic/tac sound (yes, my application requires that precision in audio/video synchronization). On Linux, where the delay is imperceptible, you can still easily see the color on the screen, while on Android/Windows you'll face a black screen most of the time.
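The black screen follows from the 100 ms blink decay in the example: once the sound arrives later than that, the blink has already faded. A small sketch of the brightness you would expect to see at the moment the sound is heard, mirroring the linear decay used in the render loop and assuming a constant delay (`blinkIntensityWhenHeard` is a made-up name):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: blink brightness (0.0 to 1.0) at the moment a
// sound delayed by `delayMs` is heard. Like the render loop in the
// example, brightness is non-zero only strictly between 0 and `decayMs`
// after the most recent blink.
constexpr double blinkIntensityWhenHeard(std::int64_t delayMs, std::int64_t intervalMs,
                                         std::int64_t decayMs = 100) {
    assert(delayMs >= 0 && intervalMs > 0 && decayMs > 0);
    // Time elapsed since the most recent blink when the sound is heard.
    const std::int64_t sinceBlink = delayMs % intervalMs;
    if (sinceBlink <= 0 || sinceBlink >= decayMs) {
        return 0.0;
    }
    return double(decayMs - sinceBlink) / double(decayMs);
}
```

Under this model a 20 ms delay in the 1 s mode still shows the color at 80% brightness, while a 300 ms delay shows a fully black screen.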

Significant audio delay between call to SDL_PutAudioStreamData and the actual output (Android) #12012
