SDL3 GPU WebGPU Backend #12046

klukaszek · 2025-01-21T23:08:49Z

Description

Congrats on shipping SDL 3.2.0, and officially releasing SDL3!

Now that SDL3 has been released, I have decided to open a PR for my work on the WebGPU backend, as suggested by @flibitijibibo.

Below are a checklist of the API methods and a checklist of working examples (as of 2025-01-21).

Examples and more info can be found at: https://github.com/klukaszek/SDL3-WebGPU-Examples
(Based on https://github.com/TheSpydog/SDL_gpu_examples/)

A live demo can be found at: https://kylelukaszek.com/SDL3-WebGPU-Examples/.

My fork currently fails the Emscripten pipeline test for a reason I haven't taken the time to investigate yet, so that will probably have to be resolved before merging with main.

I'm probably gonna get to work on compute pipelines sometime soon if no one ends up working on that by the time I'm free again.

Shaders

The current implementation of the backend expects WGSL shaders, since I have only tested in browsers, and browser implementations of WebGPU don't support the SPIRV SType. Once native WGPU support becomes a priority, this issue can be tackled.

API Checklist

General

DestroyDevice
SupportsPresentMode
ClaimWindow
ReleaseWindow

Swapchains

SetSwapchainParameters
SupportsTextureFormat
SupportsSampleCount
SupportsSwapchainComposition

Command Buffers and Fences

Note: WebGPU has no exposed fence API.

Buffers

Textures

CreateTexture
ReleaseTexture
SetTextureName
UploadToTexture
DownloadFromTexture (needs to be tested)
CopyTextureToTexture (needs to be tested)
GenerateMipmaps
- Requires a custom compute pipeline implementation.
- https://eliemichel.github.io/LearnWebGPU/basic-compute/image-processing/mipmap-generation.html
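As a rough sketch of the bookkeeping a compute-based GenerateMipmaps needs (not code from this PR), the helpers below compute the length of a full mip chain, the size of each level, and a workgroup count per level. The helper names and the 8x8 workgroup size are assumptions for illustration; each level i would be produced by one dispatch that reads level i - 1.

```c
#include <stdint.h>

/* Hypothetical helpers for planning a compute-based mip chain. */

/* Full chain length: floor(log2(max(width, height))) + 1. */
static uint32_t MipLevelCount(uint32_t width, uint32_t height)
{
    uint32_t largest = width > height ? width : height;
    uint32_t levels = 1;
    while (largest > 1) {
        largest >>= 1;
        levels++;
    }
    return levels;
}

/* Size of one dimension at a given level, clamped to 1. */
static uint32_t MipDimension(uint32_t base, uint32_t level)
{
    if (level >= 32) {
        return 1; /* avoid an undefined shift */
    }
    uint32_t size = base >> level;
    return size > 0 ? size : 1;
}

/* Workgroups needed to cover `size` texels with groups of `groupSize`
   (8 is an assumed @workgroup_size, not taken from the PR). */
static uint32_t WorkgroupCount(uint32_t size, uint32_t groupSize)
{
    return (size + groupSize - 1) / groupSize;
}
```

For a 300x200 texture, MipLevelCount returns 9, and level 3 is 37x25, which needs 5x4 workgroups of 8x8.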

Samplers

CreateSampler
ReleaseSampler

Debugging

InsertDebugLabel
PushDebugGroup
PopDebugGroup

Graphics Pipelines

CreateGraphicsPipeline
BindGraphicsPipeline
ReleaseGraphicsPipeline

Compute Pipelines

CreateComputePipeline
BindComputePipeline
ReleaseComputePipeline

Shaders

CreateShader
ReleaseShader

Rendering

Copy Passes

BeginCopyPass
EndCopyPass

Compute Passes

Fragment Stage

BindFragmentSamplers
BindFragmentStorageTextures
BindFragmentStorageBuffers
PushFragmentUniformData
- Needs to be rewritten.

Vertex Stage

Rendering States

SetViewport
SetScissor
SetBlendConstants
SetStencilReference

Composition

Blit
- Mostly functional.
- Bug: Example "Blit2DArray.c" has a sampler issue where the RHS is not downsampled.
- Bug: Example "TriangleMSAA.c" does not cycle between different sample counts.

Example Checklist

Native WebGPU Support

I have not done any testing with native distributions of WebGPU (WGPU Native / Dawn), though I have implemented Elie Michel's surface selection logic from sdl3webgpu.c for whoever wants to give it a test.

Warning:
The preprocessor macros in WebGPU_INTERNAL_CreateSurface() don't seem to work properly, so I hard-coded a workaround since I'm only testing on the web for the time being.

Existing Issue(s)

#10768


…PU objects aren't being released via the bindings. Might be an actual bug with Emscripten's bindings specifically, need more info. Working on a solution for uniform functions in SDL3. WebGPU BindGroups make this specific approach tough to handle. Assume uniform struct is stored at group 0 binding 0, contents should be 1 buffer FOR NOW.


…a bunch of existing bugs with the backend. Still encountering a layerCount issue that I cannot verify. My debugger says the texture and texture view both have 4 layers, but the error says that the texture's array layer count is 1.


slouken · 2025-01-21T23:11:48Z

Congrats on the awesome progress!

kg · 2025-01-21T23:31:18Z

src/gpu/webgpu/SDL_gpu_webgpu.c

+        while (SDL_GetAtomicInt(&buffer->mappingComplete) != 1) {
+            if (SDL_GetTicks() - startTime > TIMEOUT) {
+                SDL_LogError(SDL_LOG_CATEGORY_GPU, "Failed to map buffer: timeout");
+                return NULL;
+            }
+
+            SDL_Delay(1);
+        }


This spin-wait is a huge red flag. Generally speaking browser async operations should not be implemented this way. I would be very concerned that this will break on certain targets since generally async stuff on the web is specified to not be observable until the event loop turns; if this happens to work it could break in the future and nobody would know what was going on.

At a minimum you should have a comment here that specifies why it's safe/appropriate to do this instead of doing something else (I don't know what else you'd do offhand), i.e. 'here's the part of the WebGPU spec that says this is legal and the spin should complete quickly' or 'i tested this on [...] and [...] and [...]'.

Thankfully this appears to only apply to readback, which limits its impact on the overall API; it might be that what you need to do is specify an async readback API extension to SDL_GPU and make that the only legal way to do readback on the WebGPU target.

Blocking the browser's main thread (for up to 1000ms in this case) is very bad. It causes all sorts of downstream problems.

I'll throw some comments in! I'll also have to add some preprocessor macros to ensure that SDL_Delay(1) calls are specific to Emscripten. This is done since browser backends for WebGPU don't give access to device ticking, so we have to yield back to the browser for a tiny amount of time for the backend to tick the device for us.

Here's a quote from Elie Michel:

"When our C++ code runs in a Web browser (after being compiled to WebAssembly through emscripten), there is no explicit way to tick/poll the WebGPU device. This is because the device is managed by the Web browser itself, which decides at what pace polling should happen. As a result:

The device never ticks in between two consecutive lines of our WebAssembly module, it can only tick when the execution flow leaves the module.

The device always ticks between two calls to our MainLoop() function, because if you remember the Emscripten section of the Opening a Window chapter, we leave the main loop management to the browser and only provide a callback to run at each frame.

Thanks to the second point, we do not need wgpuPollEvents to do anything when called at the beginning or end of our main loop (so we set yieldToWebBrowser to false).

However, if what we intend is really to wait until something happens (e.g., a callback gets invoked), the first point requires us to make sure we yield back the execution flow to the Web browser, so that it may tick its device from time to time. We do this thanks to emscripten_sleep function, at the cost of effectively sleeping during 100 ms (we’re in a case where we want to wait anyways).

Note that using emscripten_sleep requires the -SASYNCIFY link option to be passed to emscripten, like we added already."

specify an async readback API extension to SDL_GPU

We have an async readback API, it's the Download and QueryFence/WaitForFence functions. If the committee can't define their specification for this extremely common use case in a normal way like every single industry-standard API going back to D3D11 that is firmly their problem. I would rather force the webGPU backend to implement a hack to make it work our way than poison our API with something as stupid as an async buffer map call.

Okay, so because you're relying on asyncify being set (I missed this, sorry! my bad) the sleep is not a spinwait but is instead a yield-to-browser-event-loop. That's much better.

Just so I'm not only being grumpy in this thread, here's a quick sketch of how this could possibly work:

A "fence" in the webGPU backend could just be defined as a group of resources that are waiting on async map operations. Then implementing QueryFence would be as simple as checking buffer->mappingComplete for each of these resources. WaitForFence could be implemented with the spinwait. That might be enough for this to work.
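A minimal sketch of that idea, assuming a fence simply tracks the buffers whose map callbacks are still outstanding. MappableBuffer, MapFence, and the fixed capacity are hypothetical; only the mappingComplete flag mirrors the field in the quoted diff, and C11 atomics stand in for SDL_GetAtomicInt/SDL_SetAtomicInt. A WaitForFence built on this would loop on MapFence_Query and yield to the browser between checks.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the backend's buffer struct: only the flag
   that the map callback sets is modeled here. */
typedef struct MappableBuffer {
    atomic_int mappingComplete; /* set to 1 by the map callback */
} MappableBuffer;

#define MAX_PENDING_MAPS 16

/* Hypothetical fence: the set of buffers whose async maps must finish. */
typedef struct MapFence {
    MappableBuffer *pending[MAX_PENDING_MAPS];
    size_t count;
} MapFence;

/* Attach a buffer whose map has been requested but not yet completed. */
static bool MapFence_Track(MapFence *fence, MappableBuffer *buffer)
{
    if (fence->count >= MAX_PENDING_MAPS) {
        return false;
    }
    fence->pending[fence->count++] = buffer;
    return true;
}

/* QueryFence equivalent: signaled only once every tracked map completed. */
static bool MapFence_Query(const MapFence *fence)
{
    for (size_t i = 0; i < fence->count; i++) {
        if (atomic_load(&fence->pending[i]->mappingComplete) != 1) {
            return false;
        }
    }
    return true;
}

/* What the map callback would do once the browser finishes the map. */
static void OnBufferMapped(MappableBuffer *buffer)
{
    atomic_store(&buffer->mappingComplete, 1);
}
```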

kg · 2025-01-21T23:32:58Z

src/gpu/webgpu/SDL_gpu_webgpu.c

+    while (!renderer->device) {
+        SDL_Delay(1);
+    }


Is there a forward progress guarantee here? Please specify what provides the guarantee of forward progress. A naive reading of this suggests that it might never stop spinning since there's no timeout. It would be nice to at least see a timeout here and have it error out when the timeout expires.

It would be even better to not have this spin-wait. It's a red flag and doesn't seem like it should be necessary if everything is working correctly, it suggests that someone - not necessarily you, it could be the browser vendor or the user mode graphics driver - got something wrong.

Worst-case this spin wait could actually prevent forward progress if something important is waiting in the event loop queue.

Instead of checking the device pointer itself, I can add a bool that gets toggled by the RequestDeviceCallback.

If the callback receives any status other than success, we mark the request as failed, which also terminates the quoted infinite loop.

See: 11d8ef7
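A sketch of the flag-based wait described above, with the timeout kg asked for. The state struct, status enum, and function names are hypothetical (the real callback receives a WGPURequestDeviceStatus); the point is that the callback always sets `done`, so the loop ends on success, on failure, or on timeout. In the backend, `now_ms` and `yield_ms` would wrap SDL_GetTicks and SDL_Delay, where SDL_Delay only yields to the browser when built with -sASYNCIFY.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for WGPURequestDeviceStatus. */
typedef enum { DEVICE_REQUEST_SUCCESS, DEVICE_REQUEST_ERROR } DeviceRequestStatus;

typedef enum {
    DEVICE_WAIT_PENDING,
    DEVICE_WAIT_READY,
    DEVICE_WAIT_FAILED,
    DEVICE_WAIT_TIMED_OUT
} DeviceWaitResult;

typedef struct DeviceRequestState {
    atomic_bool done;      /* toggled by the callback, success or not */
    atomic_bool succeeded; /* written before `done` */
} DeviceRequestState;

/* Body of the request-device callback: always terminates the wait. */
static void OnDeviceRequested(DeviceRequestStatus status, void *userdata)
{
    DeviceRequestState *state = userdata;
    atomic_store(&state->succeeded, status == DEVICE_REQUEST_SUCCESS);
    atomic_store(&state->done, true);
}

/* One non-blocking check of the request. */
static DeviceWaitResult CheckDeviceRequest(DeviceRequestState *state,
                                           uint64_t elapsed_ms, uint64_t timeout_ms)
{
    if (atomic_load(&state->done)) {
        return atomic_load(&state->succeeded) ? DEVICE_WAIT_READY : DEVICE_WAIT_FAILED;
    }
    return elapsed_ms > timeout_ms ? DEVICE_WAIT_TIMED_OUT : DEVICE_WAIT_PENDING;
}

/* Bounded wait: yield_ms must hand control back to the event loop. */
static DeviceWaitResult WaitForDevice(DeviceRequestState *state, uint64_t timeout_ms,
                                      uint64_t (*now_ms)(void), void (*yield_ms)(uint32_t))
{
    uint64_t start = now_ms();
    DeviceWaitResult result;
    while ((result = CheckDeviceRequest(state, now_ms() - start, timeout_ms)) == DEVICE_WAIT_PENDING) {
        yield_ms(1);
    }
    return result;
}
```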

thatcosmonaut · 2025-01-21T23:40:59Z

Looks like there was a bad rebase because some of the enum entries in gpu.c have been randomly deleted, etc. The includes need to be cleaned up too.

klukaszek · 2025-01-22T00:13:55Z

Looks like there was a bad rebase because some of the enum entries in gpu.c have been randomly deleted, etc. The includes need to be cleaned up too.

I reckon it was in here: 850caed


thatcosmonaut

I've left comments on all the obvious stuff I noticed for now.

I'll also note here that cycling hasn't been implemented for any resources.

thatcosmonaut · 2025-01-22T00:36:42Z

src/gpu/SDL_gpu.c

+#ifdef __EMSCRIPTEN__
+    SDL_SetHint(SDL_HINT_GPU_DRIVER, "webgpu");
+#endif


This isn't right, we shouldn't be depending on emscripten since webgpu can also have native implementations.

thatcosmonaut · 2025-01-22T00:38:05Z

src/gpu/SDL_gpu.c

+            bool is_webgpu = SDL_strcasecmp(backend, "webgpu") == 0;
+
+            // WebGPU uses ~0u for default layer_or_depth_plane, however this causes issues with other backends
+            if (color_target_infos[i].layer_or_depth_plane == ~0u && !is_webgpu) {


We should be translating from SDL to WGPU, not the other way around. If the client passes in ~0u for the layer then that violates our spec.

Link: c1d8428
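One way to keep the translation on the backend side, sketched under assumptions: SDL always passes a valid layer_or_depth_plane, and the backend emits WebGPU's "undefined" depth slice (~0u) itself for non-3D targets, selecting array layers through a single-layer view instead. The enum, struct, and function names are hypothetical, and DEPTH_SLICE_UNDEFINED stands in for the sentinel that webgpu.h uses for depthSlice.

```c
#include <stdint.h>

/* Stand-in for the "undefined" depthSlice sentinel (~0u) used by webgpu.h. */
#define DEPTH_SLICE_UNDEFINED 0xFFFFFFFFu

/* Hypothetical mirror of the SDL texture types relevant here. */
typedef enum {
    TEXTURETYPE_2D,
    TEXTURETYPE_2D_ARRAY,
    TEXTURETYPE_3D,
    TEXTURETYPE_CUBE,
    TEXTURETYPE_CUBE_ARRAY
} TextureType;

typedef struct ColorTargetMapping {
    uint32_t depthSlice;      /* render pass color attachment depthSlice */
    uint32_t baseArrayLayer;  /* texture view base layer */
    uint32_t arrayLayerCount; /* always 1: color attachment views are single-layer */
} ColorTargetMapping;

/* SDL -> WebGPU: the sentinel is produced here, never accepted from the client. */
static ColorTargetMapping MapColorTargetLayer(TextureType type, uint32_t layer_or_depth_plane)
{
    ColorTargetMapping mapping = { DEPTH_SLICE_UNDEFINED, 0, 1 };
    if (type == TEXTURETYPE_3D) {
        mapping.depthSlice = layer_or_depth_plane; /* 3D: pick a depth plane */
    } else if (type != TEXTURETYPE_2D) {
        mapping.baseArrayLayer = layer_or_depth_plane; /* arrays and cubes: pick a layer via the view */
    }
    return mapping;
}
```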

thatcosmonaut · 2025-01-22T00:38:34Z

src/gpu/SDL_gpu.c

+            // Get hint to check for "webgpu"
+            const char *backend = SDL_GetHint(SDL_HINT_GPU_DRIVER);
+            bool is_webgpu = SDL_strcasecmp(backend, "webgpu") == 0;


We don't have to query hints to get the backend from gpu.c

thatcosmonaut · 2025-01-22T00:39:13Z

src/gpu/SDL_sysgpu.h

@@ -18,6 +18,7 @@
     misrepresented as being the original software.
  3. This notice may not be removed or altered from any source distribution.
 */
+#include "../SDL_internal.h"


Incorrect #include

thatcosmonaut · 2025-01-22T00:39:25Z

src/gpu/SDL_gpu.c

@@ -20,6 +20,7 @@
 */
 #include "SDL_internal.h"
 #include "SDL_sysgpu.h"
+#include <SDL3/SDL_gpu.h>


Incorrect #include

thatcosmonaut · 2025-01-22T00:53:31Z

src/gpu/webgpu/SDL_gpu_webgpu.c

+        .label = "SDL_GPU Command Encoder",
+    };
+
+    commandBuffer->commandEncoder = wgpuDeviceCreateCommandEncoder(renderer->device, &commandEncoderDesc);


It would be better to pool the command buffer structures than to create a new command encoder every frame.
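A minimal pooling sketch, assuming the pooled object is the backend's command buffer wrapper rather than the encoder itself: WebGPU command encoders become unusable after finish(), so a fresh encoder is still created on each acquire, but the wrapper allocation is reused. All names are hypothetical, and plain malloc/realloc stand in for SDL_malloc/SDL_realloc.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical wrapper; the real one would also hold bound pipelines,
   the current viewport/scissor, and so on. */
typedef struct PooledCommandBuffer {
    void *commandEncoder; /* stand-in for WGPUCommandEncoder, recreated per acquire */
    bool inUse;
} PooledCommandBuffer;

typedef struct CommandBufferPool {
    PooledCommandBuffer **available; /* released wrappers ready for reuse */
    size_t availableCount;
    size_t capacity;
} CommandBufferPool;

static PooledCommandBuffer *Pool_Acquire(CommandBufferPool *pool)
{
    PooledCommandBuffer *cb;
    if (pool->availableCount > 0) {
        cb = pool->available[--pool->availableCount]; /* reuse a released wrapper */
    } else {
        cb = calloc(1, sizeof(*cb)); /* allocate only when the pool is empty */
        if (!cb) {
            return NULL;
        }
    }
    cb->inUse = true;
    return cb;
}

static bool Pool_Release(CommandBufferPool *pool, PooledCommandBuffer *cb)
{
    if (pool->availableCount == pool->capacity) {
        size_t newCapacity = pool->capacity ? pool->capacity * 2 : 4;
        PooledCommandBuffer **grown = realloc(pool->available, newCapacity * sizeof(*grown));
        if (!grown) {
            return false;
        }
        pool->available = grown;
        pool->capacity = newCapacity;
    }
    cb->inUse = false;
    cb->commandEncoder = NULL; /* the encoder was consumed by finish/submit */
    pool->available[pool->availableCount++] = cb;
    return true;
}

/* Frees released wrappers only; wrappers still in use are the caller's. */
static void Pool_Destroy(CommandBufferPool *pool)
{
    for (size_t i = 0; i < pool->availableCount; i++) {
        free(pool->available[i]);
    }
    free(pool->available);
    pool->available = NULL;
    pool->availableCount = 0;
    pool->capacity = 0;
}
```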

thatcosmonaut · 2025-01-22T00:53:47Z

src/gpu/webgpu/SDL_gpu_webgpu.c

+    int width, height;
+    SDL_GetWindowSize(renderer->claimedWindows[0]->window, &width, &height);
+    commandBuffer->currentViewport = (WebGPUViewport){ 0, 0, width, height, 0.0, 1.0 };
+    commandBuffer->currentScissor = (WebGPURect){ 0, 0, width, height };


Why is this function touching windows? This should be done in BeginRenderPass.

Moved to BeginRenderPass. I'll link the commit once it's up.

Link: 8d601ec

thatcosmonaut · 2025-01-22T00:55:06Z

src/gpu/webgpu/SDL_gpu_webgpu.c

+{
+    // Just call Submit for WebGPU
+    WebGPU_Submit(commandBuffer);
+    // There are no fences in WebGPU, so we don't need to do anything here


Not having any kind of fence abstraction is going to break tons of applications.

It seems like there's some kind of pseudo-fence callback structure:
https://developer.mozilla.org/en-US/docs/Web/API/GPUQueue/onSubmittedWorkDone

Just adding stuff here as notes for myself when I return:

In the C API, the function is defined as: wgpuQueueOnSubmittedWorkDone(WGPUQueue queue, WGPUQueueWorkDoneCallback callback, void *userdata).

Alright, then this can probably be implemented by just having a Fence struct as the userdata and then marking it as finished in the callback.
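A sketch of that plan, with hypothetical names: the fence is passed as userdata, the work-done callback marks it signaled, and QueryFence becomes an atomic load. WorkDoneStatus stands in for WGPUQueueWorkDoneStatus (the real callback's parameter type), and the callback would be registered with wgpuQueueOnSubmittedWorkDone(queue, OnSubmittedWorkDone, fence) right after submitting. Signaling even on a non-success status is a design choice here so that waiters cannot hang; the failure is recorded separately.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for WGPUQueueWorkDoneStatus. */
typedef enum { WORK_DONE_SUCCESS, WORK_DONE_ERROR } WorkDoneStatus;

/* Hypothetical fence; in the backend these would come from a fence pool. */
typedef struct SubmitFence {
    atomic_bool signaled; /* set once the queue reports the work finished */
    atomic_bool failed;   /* set if the callback reported anything but success */
} SubmitFence;

/* Prepare a pooled fence for reuse before attaching it to a new submit. */
static void SubmitFence_Reset(SubmitFence *fence)
{
    atomic_store(&fence->failed, false);
    atomic_store(&fence->signaled, false);
}

/* Work-done callback: the fence arrives as userdata. */
static void OnSubmittedWorkDone(WorkDoneStatus status, void *userdata)
{
    SubmitFence *fence = userdata;
    atomic_store(&fence->failed, status != WORK_DONE_SUCCESS);
    atomic_store(&fence->signaled, true); /* signal even on failure so waits end */
}

/* QueryFence equivalent: a non-blocking check. */
static bool SubmitFence_Query(SubmitFence *fence)
{
    return atomic_load(&fence->signaled);
}
```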

thatcosmonaut · 2025-01-22T00:56:22Z

src/gpu/webgpu/SDL_gpu_webgpu.c

+// Slightly altered, though with permission by Elie Michel:
+// @ https://github.com/eliemichel/sdl3webgpu/blob/main/sdl3webgpu.c
+// https://github.com/libsdl-org/SDL/issues/10768#issuecomment-2499532299
+#if defined(SDL_PLATFORM_MACOS)


We shouldn't be touching platform code in the implementation like this. We'll probably need some kind of platform abstraction in SDL itself that can get a WGPU surface.

thatcosmonaut · 2025-01-22T00:59:19Z

src/gpu/webgpu/SDL_gpu_webgpu.c

+
+    bool cycleBindGroups;
+
+    WebGPUUniformBuffer vertexUniformBuffers[MAX_UNIFORM_BUFFERS_PER_STAGE];


The pipeline should not own these, uniform buffers should be pooled.


slouken · 2025-01-22T01:32:31Z

src/gpu/webgpu/SDL_gpu_webgpu.c

@@ -0,0 +1,4602 @@
+// File: /webgpu/SDL_gpu_webgpu.c


Please include the standard text from https://github.com/libsdl-org/SDL/blob/main/include/SDL3/SDL_copying.h and add any copyright attribution you'd like here.

Link: a385d47

thatcosmonaut · 2025-01-22T01:41:03Z

src/gpu/webgpu/SDL_gpu_webgpu.c

@@ -2090,6 +2106,11 @@ void WebGPU_BeginRenderPass(SDL_GPUCommandBuffer *commandBuffer,
        return;
    }

+    int width, height;
+    SDL_GetWindowSize(wgpu_cmd_buf->renderer->claimedWindows[0]->window, &width, &height);


This is still not right, the viewport and scissor should be set to the smallest size of bound render targets. Please reference how the other backends implemented this.

Ah I get it now! I'll return to this after some rest I think.

I read up on the Vulkan implementation and will follow that one tomorrow.

Link: e24094d

It's still not 1-to-1 with the Vulkan backend, but the viewport and scissor now use the smallest available size of all bound render targets.

It also now sets the other default states for the render pass.
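The default state can be sketched as a minimum over the bound targets, taken per dimension, with the full 0 to 1 depth range. The field order mirrors the WebGPUViewport and WebGPURect initializers from the earlier diff, but these local types and the function name are hypothetical, and the sizes passed in are assumed to already be reduced to the mip level being rendered (depth-stencil target included).

```c
#include <stddef.h>
#include <stdint.h>

/* Local mirrors of the viewport/scissor layouts shown in the diff. */
typedef struct { float x, y, w, h, minDepth, maxDepth; } Viewport;
typedef struct { uint32_t x, y, w, h; } Rect;

/* Hypothetical: size of one bound target at its rendered mip level. */
typedef struct { uint32_t width, height; } TargetSize;

static void DefaultViewportAndScissor(const TargetSize *targets, size_t count,
                                      Viewport *viewport, Rect *scissor)
{
    uint32_t w = UINT32_MAX, h = UINT32_MAX;
    for (size_t i = 0; i < count; i++) {
        /* Each dimension is minimized independently. */
        if (targets[i].width < w) {
            w = targets[i].width;
        }
        if (targets[i].height < h) {
            h = targets[i].height;
        }
    }
    if (count == 0) {
        w = 0;
        h = 0;
    }
    *viewport = (Viewport){ 0.0f, 0.0f, (float)w, (float)h, 0.0f, 1.0f };
    *scissor = (Rect){ 0, 0, w, h };
}
```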

slouken · 2025-01-22T01:46:15Z

src/gpu/webgpu/SDL_gpu_webgpu.c

+// Note: Compiling SDL GPU programs using emscripten will require -sUSE_WEBGPU=1 -sASYNCIFY=1
+
+#include "../SDL_sysgpu.h"
+#include "SDL_internal.h"


SDL_internal.h needs to be the first include in the file. I usually throw it right after the standard blurb at the top so I don't forget.

Link: d2fbc02


klukaszek added 29 commits January 21, 2025 17:39

Nuked my repo due to it being 600 commits ahead of upstream even though all of the commits were the one I just rebased... Fixed everything back up.

850caed

Got Point, Linear, and Aniso samplers working with Clamp and Repeat

35afb6e

Cleaned up logging in preparation for managing Uniform Buffers.

b5b0b4e

Improved logging for shader creation

Refactored buffer and texture data structures to be more simple as there is no reason for them to mimic the Vulkan implementation. Added GPU API checklist. Next will be vertex and fragment uniform buffers. Updated checklist

37d08d1

PushVertexUniformData() and PushFragmentUniformData() no longer cause crashes, but nothing renders properly. Need to investigate further.

34f0a47

Fixed SDLToWGPUTextureUsageFlags() to properly handle bit flags instead of individual enums.

c380e4d

Implemented simple miscellaneous functions to knock them off of the checklist.

7605ea7

Trying to get array layers working for 3D textures since WebGPU only allows views of 1 layer for color attachments...

9496c2e

Implemented a blitting pipeline for WebGPU since there is no blit functionality offered in WebGPU.

478b4cd

Refactored to use SDL_GPU_BlitCommon instead of writing my own WebGPU pipelines. Now we create internal SDL pipelines and everything is handled nicely. 3D texture example still works.

ee68d05

Made more progress on 2D texture array blitting. Now I need to investigate why the sampler isn't working in the Blit2DArray example.

abae635

Refactored several calls to SDL_malloc to SDL_stack_alloc if data is no longer needed outside of the frame. Minimizes heap resizing

9b8e259

Feat: Added stencil reference adjustment

cecebf8

Feat: Got BlitCube example working

fd085f6

Refactor: Eliminated many unnecessary dynamic memory allocations. Far more static allocations now. Static allocations only occur on named object creation, and when dealing with PipelineLayouts. Planning on refactoring PipelineLayouts later.

4fe10d0

Removed print statements that angered different test pipelines.

8a42bb4

Misc: Moved SDL3 WebGPU Checklist to src/gpu/webgpu/

51a5c38

Misc: Moved SDL3 WebGPU Checklist to src/gpu/webgpu/

cb91daf

Misc: Added SDL3 GPU ReadMe

140802a

Misc: Added SDL3 GPU ReadMe

bcdc2d5

Misc: Updated emscripten's version for the test pipeline.

9e6dc5a

Misc: Updated emscripten's version for the test pipeline.

5565a59

Misc: trying to get Emscripten test pipeline working.

f967732

Fix: Rebased with upstream and solved a recently implemented bug with the emscripten keyboard event handlers when no hint was set.

467c870

Feat: Made swapchain a static allocation.

c21747f

Refactor: Removed Emscripten specific Swapchain logic. Now I manually configure the surface. Elie Michel's surface configuration logic was added but the macros don't seem to want to work for me. I've added a temporary workaround since I am only testing Emscripten anyways.

58e6140

Feat: Implemented indirect drawing support for WebGPU.

d570ca5

slouken added this to the 3.4.0 milestone Jan 21, 2025

kg reviewed Jan 21, 2025

Refactor: Added explicit loop termination while waiting for the device callback.

11d8ef7

thatcosmonaut requested changes Jan 22, 2025

Refactor: Setting of viewport and scissor is done in WebGPU_BeginRenderPass

8d601ec

slouken reviewed Jan 22, 2025

Misc: Added standard text to top of file.

a385d47

thatcosmonaut reviewed Jan 22, 2025

slouken reviewed Jan 22, 2025

klukaszek added 9 commits January 21, 2025 21:00

Misc: Changed some imports.

c226ca5

Misc: Changed some more imports.

2b22917

Refactor: Changed default renderpass state initialization to better match Vulkan implementation.

e24094d

Refactor: Removed ~0u conversion as I had forgotten that I had resolved this problem already.

c1d8428

Refactor: Moved SDL_internal.h to the very top

d2fbc02

Refactor: Started implementing fencing properly.

09523e4

Refactor: Added fence pool and associated internal functions.

8e72939

Refactor: Added uniform buffer pool.

bc4a11c

Refactor: Implemented placeholder fencing logic. Has not actually been tested but it compiles.

0533e73
