Significant audio delay between call to SDL_PutAudioStreamData and the actual output (Android) #12012

Open
NicolasFirmo opened this issue Jan 18, 2025 · 4 comments

@NicolasFirmo
Contributor

I'm having some audio issues in my SDL3 project on Android (13) and Windows (11):

There's a significant delay between my call to SDL_PutAudioStreamData and what's actually output to the speakers (sometimes up to ~500 ms). Since I'm feeding 48 kHz input, that's about 24000 samples of delay, which doesn't look like a common buffer size.
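To make the arithmetic above explicit, here is a minimal sketch that converts between a delay and a number of sample frames. The helper names `delayToFrames` and `framesToDelay` are hypothetical and not part of SDL; they only restate the 500 ms at 48 kHz calculation.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper: number of whole sample frames that fit in `delay`
// at `sampleRateHz` (integer division truncates any partial frame).
inline std::int64_t delayToFrames(std::chrono::nanoseconds delay, int sampleRateHz) {
    assert(sampleRateHz > 0);
    return delay.count() * sampleRateHz / 1'000'000'000;
}

// Hypothetical helper: duration covered by `frames` sample frames at
// `sampleRateHz`, truncated to whole nanoseconds.
inline std::chrono::nanoseconds framesToDelay(std::int64_t frames, int sampleRateHz) {
    assert(sampleRateHz > 0);
    return std::chrono::nanoseconds(frames * 1'000'000'000 / sampleRateHz);
}
```

For comparison, a 1024-frame buffer at 48 kHz covers roughly 21 ms, so a 24000-frame delay would correspond to many such buffers rather than a single one.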

I observed this by blinking an image on the rendering thread at practically the same time I call SDL_PutAudioStreamData. The rendering thread is vsynced and runs at 144 fps on my computer display (6.94 ms frame period) and at 120 fps on my phone display (8.33 ms frame period), and the delay seems constant. When I then delayed the blinking by 500 ms, it seemed more in sync with the audio.
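The compensation described here (shifting the blink by a fixed amount) could be sketched as below. This is an illustration only: `blinkIntensity` and `assumedLatency` are hypothetical names, the latency value has to be supplied by the caller, and the fade is a simple linear decay.

```cpp
#include <cassert>
#include <chrono>

using steady = std::chrono::steady_clock;

// Hypothetical helper: linear fade from 1.0 down to 0.0 over `decay`.
// The fade starts when the sound is expected to become audible
// (queuedAt + assumedLatency), not when the samples were queued.
inline double blinkIntensity(steady::time_point now,
                             steady::time_point queuedAt,
                             std::chrono::nanoseconds assumedLatency,
                             std::chrono::nanoseconds decay) {
    assert(decay.count() > 0);
    const auto elapsed = now - (queuedAt + assumedLatency);
    if (elapsed < std::chrono::nanoseconds::zero() || elapsed >= decay) {
        return 0.0; // Before the expected onset, or after the fade has finished.
    }
    return double((decay - elapsed).count()) / double(decay.count());
}
```

With `assumedLatency` set to zero this reduces to a blink that starts at queue time; a non-zero value only hides the delay visually and does not shorten it.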

The problem is more serious on Android than on Windows. On the other hand, when I tried the same code (without the delay compensation on the rendering thread) on Linux (Pop!_OS 22.04), the issue didn't happen, or at least the audio delay was too small for me to notice.

Calling SDL_FlushAudioStream after SDL_PutAudioStreamData didn't resolve the issue. It is also undesirable, since I want to control the timing between the samples I play with sample precision (by padding with silence when necessary), and when SDL_FlushAudioStream applies its auto-padding:

there may be audio gaps in the output

Is there a way to make the delay between feeding samples to SDL_AUDIO_DEVICE_DEFAULT_PLAYBACK and the actual output unnoticeable, either through SDL_PutAudioStreamData or by doing something else (maybe setting the buffer size)?

If not, can I at least query what that delay is on the current system (by querying the buffer size, for example)? Does the buffer size even matter when using the AudioStream interface?
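As a rough sketch of what such a query could feed into: SDL3 has SDL_GetAudioStreamQueued (bytes still queued on a stream) and SDL_GetAudioDeviceFormat (which can report the device buffer size in sample frames). Assuming those two numbers are available, a lower bound on the delay can be estimated as below. `StreamFormat` and `estimateQueuedLatency` are hypothetical names, the code assumes the stream and the device run at the same sample rate, and it does not account for any buffering in the OS or hardware below SDL, which may be where the Android delay comes from.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical description of the format the samples are queued in.
struct StreamFormat {
    int bytesPerSample; // e.g. 2 for 16-bit samples
    int channels;
    int sampleRateHz;
};

// Hypothetical helper: time needed to play everything still queued plus one
// device buffer. This is only a lower bound on the real output latency.
inline std::chrono::nanoseconds estimateQueuedLatency(std::int64_t queuedBytes,
                                                      int deviceBufferFrames,
                                                      StreamFormat fmt) {
    assert(fmt.bytesPerSample > 0 && fmt.channels > 0 && fmt.sampleRateHz > 0);
    const std::int64_t bytesPerFrame = std::int64_t(fmt.bytesPerSample) * fmt.channels;
    const std::int64_t frames = queuedBytes / bytesPerFrame + deviceBufferFrames;
    return std::chrono::nanoseconds(frames * 1'000'000'000 / fmt.sampleRateHz);
}
```

Note that the example program queues a whole interval of audio at once, so the queued-bytes term is large right after each call and shrinks as the device consumes data.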

I put together a minimal example that is also easy to verify. It's an application that plays a tic (orange) / tac (blue) sound while blinking the screen with the respective color. It has 3 modes, with a 1, 0.5, or 0.1 second period between sounds, which you can switch between by clicking/touching the screen:

#include <SDL3/SDL.h>
#include <SDL3/SDL_main.h>

#include <array>
#include <chrono>
#include <condition_variable>
#include <limits>
#include <mutex>
#include <thread>
#include <type_traits>
#include <vector>

#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <utility>

namespace {

enum class Platform : uint8_t {
    ANDROID_PLATFORM,
    LINUX_PLATFORM,
    WINDOWS_PLATFORM,
} constinit const currentPlatform
#if defined(__ANDROID__)
    = Platform::ANDROID_PLATFORM;
#elif defined(__linux__)
    = Platform::LINUX_PLATFORM;
#elif defined(_WIN32)
    = Platform::WINDOWS_PLATFORM;
#else
    #error "Platform not supported"
#endif

constexpr auto maxErrorMessageLen = 256;
template<typename... Args>
void messageError(const char* const fmt, Args&&... args) {
    std::array<char, maxErrorMessageLen> msg{};
    std::snprintf(msg.data(), msg.size() * sizeof(decltype(msg)::value_type), fmt, std::forward<Args>(args)...);
    SDL_ShowSimpleMessageBox(SDL_MESSAGEBOX_ERROR, "Error", msg.data(), nullptr);
}

} // namespace

int main(int /*argc*/, char* /*argv*/[]) {
    // Initialization
    if (!SDL_Init(SDL_INIT_AUDIO | SDL_INIT_VIDEO)) [[unlikely]] {
        messageError("SDL_Init failed (%s)", SDL_GetError());
        return 1;
    }

    int numDisplays = 0;

    SDL_DisplayID* const displaysPtr = SDL_GetDisplays(&numDisplays);
    if (displaysPtr == nullptr) [[unlikely]] {
        messageError("SDL_GetDisplays failed (%s)", SDL_GetError());
        return 1;
    }

    const SDL_DisplayMode* displayMode = SDL_GetCurrentDisplayMode(displaysPtr[0]);
    SDL_free(displaysPtr);
    if (displayMode == nullptr) [[unlikely]] {
        messageError("SDL_GetCurrentDisplayMode failed (%s)", SDL_GetError());
        return 1;
    }

    constexpr int desktopWindowWidth  = 960;
    constexpr int desktopWindowHeight = 540;

    SDL_Window*   window   = nullptr;
    SDL_Renderer* renderer = nullptr;
    if (!SDL_CreateWindowAndRenderer(
            "SDLAudioDelay",
            currentPlatform == Platform::ANDROID_PLATFORM ? displayMode->w : desktopWindowWidth,
            currentPlatform == Platform::ANDROID_PLATFORM ? displayMode->h : desktopWindowHeight,
            SDL_WINDOW_OPENGL,
            &window,
            &renderer)) [[unlikely]] {
        messageError("SDL_CreateWindowAndRenderer failed (%s)", SDL_GetError());
        return 1;
    }

    if (!SDL_SetRenderDrawBlendMode(renderer, SDL_BLENDMODE_BLEND)) [[unlikely]] {
        messageError("SDL_SetRenderDrawBlendMode failed (%s)", SDL_GetError());
        SDL_DestroyRenderer(renderer);
        SDL_DestroyWindow(window);
        return 1;
    }

    if (!SDL_SetRenderVSync(renderer, 1)) [[unlikely]] {
        messageError("SDL_SetRenderVSync failed (%s)", SDL_GetError());
        SDL_DestroyRenderer(renderer);
        SDL_DestroyWindow(window);
        return 1;
    }

    using namespace std::chrono_literals;

    constexpr SDL_AudioSpec spec = {.format = SDL_AUDIO_S16, .channels = 1, .freq = 48000};

    constexpr auto ticTacDurationSeconds = 0.6;
    constexpr auto ticTacNumSamples      = size_t(spec.freq * ticTacDurationSeconds);

    constexpr auto ticFreq = 1760.0;
    constexpr auto tacFreq = ticFreq / 2.0;

    std::vector<int16_t> ticSamples(ticTacNumSamples);
    std::vector<int16_t> tacSamples(ticTacNumSamples);

    const auto fillSamples = [](auto& samples, const auto freq) {
        using val_t = typename std::decay_t<decltype(samples)>::value_type;

        constexpr auto pi = 3.141592653589793238463;

        auto amplitude = 1.0;

        for (size_t i = 0; i < samples.size(); ++i) {
            samples[i] = val_t(amplitude * std::sin(2.0 * pi * freq * double(i) / double(spec.freq))
                               * std::numeric_limits<val_t>::max());

            amplitude *= 0.99;
        }
    };

    fillSamples(ticSamples, ticFreq);
    fillSamples(tacSamples, tacFreq);

    enum class Sound : uint8_t { TIC, TAC } currentPlayedSound = Sound::TAC;

    const std::vector<uint8_t> silenceSamples(1024);

    SDL_AudioStream* stream = SDL_OpenAudioDeviceStream(SDL_AUDIO_DEVICE_DEFAULT_PLAYBACK, &spec, nullptr, nullptr);
    if (stream == nullptr) [[unlikely]] {
        messageError("SDL_OpenAudioDeviceStream failed (%s)", SDL_GetError());
        SDL_DestroyRenderer(renderer);
        SDL_DestroyWindow(window);
        return 1;
    }

    if (!SDL_ResumeAudioStreamDevice(stream)) [[unlikely]] {
        messageError("SDL_ResumeAudioStreamDevice failed (%s)", SDL_GetError());
        SDL_DestroyAudioStream(stream);
        SDL_DestroyRenderer(renderer);
        SDL_DestroyWindow(window);
        return 1;
    }

    using clock_t = std::chrono::steady_clock;

    constexpr std::array timeIntervals = {
        1'000'000'000ns, // 1.0 second
        500'000'000ns,   // 0.5 second
        100'000'000ns,   // 0.1 second
    };
    uint8_t currentTimeIntervalIdx = 0;

    constexpr auto          waitError  = 2ms;
    constexpr auto          startDelay = 50ms;
    bool                    wait       = true;
    auto                    waitTp     = clock_t::time_point::max();
    std::mutex              waitMutex;
    std::condition_variable waitCv;

    auto lastTicTacTime = clock_t::time_point::min();

    // Run
    bool running = true;

    int ret = 0;

    std::thread periodicSoundThread{[&] {
        std::unique_lock lock(waitMutex);

        while (true) {
            waitCv.wait_until(lock, waitTp - waitError, [&wait] { return !wait; });
            wait = true;
            if (!running) [[unlikely]] {
                break;
            }
            while ((lastTicTacTime = clock_t::now()) < waitTp) [[likely]] {}
            const auto interval = timeIntervals[currentTimeIntervalIdx];

            // Feed samples to the playback device
            {
                currentPlayedSound = currentPlayedSound == Sound::TIC ? Sound::TAC : Sound::TIC;

                const auto [format, channels, freq] = spec;

                const int ticTacNumBytes   = int(ticTacNumSamples * SDL_AUDIO_BYTESIZE(format));
                const int intervalNumBytes = int(interval.count() * freq * channels / 1'000'000'000
                                                 * SDL_AUDIO_BYTESIZE(format));

                const auto numTicTacBytesToPlay = std::min(ticTacNumBytes, intervalNumBytes);

                const auto& samples = currentPlayedSound == Sound::TIC ? ticSamples : tacSamples;

                if (!SDL_PutAudioStreamData(stream, samples.data(), numTicTacBytesToPlay)) [[unlikely]] {
                    messageError("SDL_PutAudioStreamData failed (%s)", SDL_GetError());
                    running = false;
                    ret     = 1;
                    return;
                }

                // And pad with silence if necessary
                int numBytesInSilence = intervalNumBytes - ticTacNumBytes;
                while (numBytesInSilence > int(silenceSamples.size())) {
                    if (!SDL_PutAudioStreamData(stream, silenceSamples.data(), int(silenceSamples.size())))
                        [[unlikely]] {
                        messageError("SDL_PutAudioStreamData failed (%s)", SDL_GetError());
                        running = false;
                        ret     = 1;
                        return;
                    }
                    numBytesInSilence -= int(silenceSamples.size());
                }
                if (numBytesInSilence > 0) {
                    if (!SDL_PutAudioStreamData(stream, silenceSamples.data(), numBytesInSilence)) [[unlikely]] {
                        messageError("SDL_PutAudioStreamData failed (%s)", SDL_GetError());
                        running = false;
                        ret     = 1;
                        return;
                    }
                }
            }

            waitTp += interval;
        }
    }};

    const auto startPlaying = [&] {
        {
            std::lock_guard lock(waitMutex);
            waitTp = clock_t::now() + startDelay;
            wait   = false;
        }
        waitCv.notify_one();
    };

    // Just to make sure periodicSoundThread is ready
    std::this_thread::sleep_for(1s);

    startPlaying();

    while (true) {
        for (SDL_Event event; running && SDL_PollEvent(&event);) {
            switch (event.type) {
            case SDL_EVENT_QUIT: {
                running = false;
                break;
            }
            case SDL_EVENT_MOUSE_BUTTON_DOWN: {
                currentTimeIntervalIdx = (currentTimeIntervalIdx + 1) % timeIntervals.size();
                if (!SDL_ClearAudioStream(stream)) {
                    messageError("SDL_ClearAudioStream failed (%s)", SDL_GetError());
                    running = false;
                    ret     = 1;
                    continue;
                }
                startPlaying();
                break;
            }
            default: break;
            }
        }
        if (!running) {
            {
                std::lock_guard lock(waitMutex);
                wait = false;
            }
            waitCv.notify_one();

            break;
        }

        using duration_t = std::chrono::nanoseconds;

        constexpr duration_t blinkDecayTime = 100'000'000ns; // 100 ms

        const auto elapsedTimeSinceLastBeat = std::chrono::duration_cast<duration_t>(clock_t::now() - lastTicTacTime);

        double colorItensity = 0.0;
        if (elapsedTimeSinceLastBeat > 0ns && elapsedTimeSinceLastBeat < blinkDecayTime) {
            colorItensity = double(blinkDecayTime.count() - elapsedTimeSinceLastBeat.count()) / blinkDecayTime.count();
        }

        if (!SDL_SetRenderDrawColor(renderer,
                                    currentPlayedSound == Sound::TIC ? uint8_t(0XFF * colorItensity) : 0X00,
                                    uint8_t(0X7F * colorItensity),
                                    currentPlayedSound == Sound::TAC ? uint8_t(0XFF * colorItensity) : 0X00,
                                    0XFF)) [[unlikely]] {
            messageError("SDL_SetRenderDrawColor failed (%s)", SDL_GetError());
            running = false;
            ret     = 1;
            continue;
        }

        if (!SDL_RenderClear(renderer)) [[unlikely]] {
            messageError("SDL_RenderClear failed (%s)", SDL_GetError());
            running = false;
            ret     = 1;
            continue;
        }

        if (!SDL_RenderPresent(renderer)) [[unlikely]] {
            messageError("SDL_RenderPresent failed (%s)", SDL_GetError());
            running = false;
            ret     = 1;
            continue;
        }
    }

    // End
    periodicSoundThread.join();

    SDL_DestroyAudioStream(stream);
    SDL_DestroyRenderer(renderer);
    SDL_DestroyWindow(window);

    SDL_Quit();

    return ret;
}

Observe that when the delay gets near 500 ms, the tic and tac swap colors in the second interval mode (tic misses the orange blink and plays on blue, and vice versa).
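The color swap follows from the delay being about one full period in the 0.5 second mode: the sound is heard one beat late, when the screen already shows the other color. A small sketch of that reasoning, with hypothetical helper names:

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: how many whole beats the audio lags behind the video,
// rounded to the nearest beat.
inline long long beatsOfLag(std::chrono::nanoseconds delay,
                            std::chrono::nanoseconds period) {
    assert(period.count() > 0 && delay.count() >= 0);
    return (delay.count() + period.count() / 2) / period.count();
}

// Tic and tac alternate every beat, so an odd number of beats of lag makes
// each sound coincide with the other sound's color.
inline bool colorsAppearSwapped(std::chrono::nanoseconds delay,
                                std::chrono::nanoseconds period) {
    return beatsOfLag(delay, period) % 2 != 0;
}
```

For example, a delay of 450 ms with a 500 ms period rounds to one beat of lag, so the colors appear swapped, while a 20 ms delay rounds to zero beats and they do not.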

Sometimes the delay is not as large as 500 ms, but you can still verify it by closing your eyes and opening them when you hear the tic/tac sound (yes, my application requires that level of precision in the synchronization between audio and video). On Linux, where the delay is imperceptible, you can still easily see the color on the screen, while on Android/Windows you'll face a black screen most of the time.

slouken added this to the 3.4.0 milestone on Jan 18, 2025
@slouken
Collaborator

slouken commented Jan 18, 2025

Interesting, I tried this on Windows and the sound and flash were almost perfectly synchronized.

@icculus
Collaborator

icculus commented Jan 18, 2025

Same:

trim.63AB7BA4-FC00-43CB-BBBC-6181EF5F0850.MOV

(But this is Windows in VirtualBox on Linux, fwiw.)

@icculus
Collaborator

icculus commented Jan 18, 2025

Calling SDL_FlushAudioStream after SDL_PutAudioStreamData didn't resolve the issue,

To be clear: this doesn't move data to the hardware, it just tells SDL not to hold data back for correct resampling, because there won't be more data coming to resample against. In terms of an ongoing stream of audio, you never call SDL_FlushAudioStream, you just keep adding more data as appropriate.
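The "keep adding more data" approach is what the example program does by padding each interval with explicit silence instead of flushing. The byte arithmetic behind that can be sketched as follows; `BeatLayout` and `layoutBeat` are hypothetical names, not SDL API, and the numbers in the usage note come from the example above (0.6 s sound, 16-bit mono, 48 kHz).

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical result type: bytes of sound, then bytes of silence, to queue
// for one beat so that the stream never runs dry between beats.
struct BeatLayout {
    std::int64_t soundBytes;
    std::int64_t silenceBytes;
};

// Hypothetical helper: split one interval into sound and padding silence.
// The sound is truncated when it is longer than the interval.
inline BeatLayout layoutBeat(std::chrono::nanoseconds interval,
                             std::int64_t soundFrames,
                             int bytesPerFrame,
                             int sampleRateHz) {
    assert(interval.count() >= 0 && soundFrames >= 0);
    assert(bytesPerFrame > 0 && sampleRateHz > 0);
    const std::int64_t intervalFrames = interval.count() * sampleRateHz / 1'000'000'000;
    const std::int64_t playedFrames = std::min(soundFrames, intervalFrames);
    return {playedFrames * bytesPerFrame, (intervalFrames - playedFrames) * bytesPerFrame};
}
```

With a 28800-frame sound and a 1 second interval this yields 57600 bytes of sound followed by 38400 bytes of silence; in the 0.5 and 0.1 second modes the sound is cut short and no silence is needed.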

@NicolasFirmo
Contributor Author

So the issue in my main application on Windows was caused by other parts of the code (that's why the delay was there but not so significant). Now that I've tested the minimal example on Windows, I get the expected results.

But the delay problem remains on my Android phone, both in my fixed main application and in the minimal example I gave.

WIN_20250117_23_25_01_Pro.mp4

NicolasFirmo changed the title from "Significant audio delay between call to SDL_PutAudioStreamData and the actual output (Windows and Android)" to "Significant audio delay between call to SDL_PutAudioStreamData and the actual output (Android)" on Jan 18, 2025