cpp-gamedev · karnkaul · Mar 23, 2025
diff --git a/.github/workflows/deploy.yml b/.github/workflows/deploy.yml
@@ -0,0 +1,35 @@
+name: Deploy
+on:
+  push:
+    branches:
+      - main
+jobs:
+  deploy:
+    runs-on: ubuntu-latest
+    permissions:
+      contents: write  # To push a branch 
+      pages: write  # To push to a GitHub Pages site
+      id-token: write  # To verify the deployment originates from an appropriate source
+    steps:
+      - uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+      - name: init
+        run: |
+          url="https://github.com/rust-lang/mdBook/releases/download/v0.4.47/mdbook-v0.4.47-x86_64-unknown-linux-gnu.tar.gz"
+          mkdir mdbook
+          curl -sSL "$url" | tar -xz --directory=./mdbook
+          echo "$(pwd)/mdbook" >> "$GITHUB_PATH"
+      - name: build book
+        run: |
+          cd guide
+          mdbook build
+      - name: setup pages
+        uses: actions/configure-pages@v4
+      - name: upload artifact
+        uses: actions/upload-pages-artifact@v3
+        with:
+          path: 'book'
+      - name: Deploy to GitHub Pages
+        id: deployment
+        uses: actions/deploy-pages@v4
diff --git a/guide/src/SUMMARY.md b/guide/src/SUMMARY.md
@@ -16,3 +16,8 @@
   - [Vulkan Device](initialization/device.md)
   - [Scoped Waiter](initialization/scoped_waiter.md)
   - [Swapchain](initialization/swapchain.md)
+- [Rendering](rendering/README.md)
+  - [Swapchain Loop](rendering/swapchain_loop.md)
+  - [Render Sync](rendering/render_sync.md)
+  - [Swapchain Update](rendering/swapchain_update.md)
+  - [Dynamic Rendering](rendering/dynamic_rendering.md)
diff --git a/guide/src/rendering/README.md b/guide/src/rendering/README.md
@@ -0,0 +1,3 @@
+# Rendering
+
+This section implements Render Sync and the Swapchain loop, performs Swapchain image layout transitions, and introduces Dynamic Rendering.
diff --git a/guide/src/rendering/dynamic_rendering.md b/guide/src/rendering/dynamic_rendering.md
@@ -0,0 +1,172 @@
+# Dynamic Rendering
+
+Dynamic Rendering enables us to avoid using Render Passes, which are quite a bit more verbose (but also generally more performant on tiled GPUs). Here we tie together the Swapchain, Render Sync, and rendering.
+
+In the main loop, attempt to acquire a Swapchain image / Render Target:
+
+```cpp
+auto const framebuffer_size = glfw::framebuffer_size(m_window.get());
+// minimized? skip loop.
+if (framebuffer_size.x <= 0 || framebuffer_size.y <= 0) { continue; }
+// an eErrorOutOfDateKHR result is not guaranteed if the
+// framebuffer size does not match the Swapchain image size, check it
+// explicitly.
+auto fb_size_changed = framebuffer_size != m_swapchain->get_size();
+auto& render_sync = m_render_sync.at(m_frame_index);
+auto render_target = m_swapchain->acquire_next_image(*render_sync.draw);
+if (fb_size_changed || !render_target) {
+  m_swapchain->recreate(framebuffer_size);
+  continue;
+}
+```
+
+Wait for the associated fence and reset ('un'signal) it:
+
+```cpp
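+// the 3s literal requires std::chrono_literals to be in scope.
+using namespace std::chrono_literals;
+static constexpr auto fence_timeout_v =
+  static_cast<std::uint64_t>(std::chrono::nanoseconds{3s}.count());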
+auto result = m_device->waitForFences(*render_sync.drawn, vk::True,
+                    fence_timeout_v);
+if (result != vk::Result::eSuccess) {
+  throw std::runtime_error{"Failed to wait for Render Fence"};
+}
+// reset fence _after_ acquisition of image: if it fails, the
+// fence remains signaled.
+m_device->resetFences(*render_sync.drawn);
+```
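
The timeout passed to `waitForFences` is a nanosecond count, which is why the `3s` duration above goes through `std::chrono::nanoseconds`. A minimal sketch of that conversion as a reusable helper follows. `to_timeout_ns` is a hypothetical name, not part of the guide or of Vulkan-Hpp, and the snippet needs no Vulkan headers:

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper (not from the guide): converts a chrono duration to the
// unsigned nanosecond count that fence wait timeouts are expressed in.
// Negative durations are clamped to zero.
template <typename Rep, typename Period>
constexpr auto to_timeout_ns(std::chrono::duration<Rep, Period> const duration)
  -> std::uint64_t {
  auto const ns =
    std::chrono::duration_cast<std::chrono::nanoseconds>(duration);
  return ns.count() < 0 ? std::uint64_t{0}
                        : static_cast<std::uint64_t>(ns.count());
}

// 3 seconds, matching the Render Fence wait in this section.
inline constexpr auto fence_timeout_v = to_timeout_ns(std::chrono::seconds{3});
```

Keeping the conversion in one place avoids repeating the cast wherever a timeout is needed.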
+
+Since the fence has been reset, a queue submission must be made that signals it before continuing, otherwise the app will deadlock on the next wait (and eventually throw after 3s). We can now begin command buffer recording:
+
+```cpp
+auto command_buffer_bi = vk::CommandBufferBeginInfo{};
+// this flag means recorded commands will not be reused.
+command_buffer_bi.setFlags(
+  vk::CommandBufferUsageFlagBits::eOneTimeSubmit);
+render_sync.command_buffer.begin(command_buffer_bi);
+```
+
+We are not ready to actually render anything yet, but can clear the image to a particular color. First we need to transition the image for rendering, i.e. to the Attachment Optimal layout. Set up the image barrier and record it:
+
+```cpp
+auto dependency_info = vk::DependencyInfo{};
+auto barrier = m_swapchain->base_barrier();
+// Undefined => AttachmentOptimal
+// we don't need to block any operations before the barrier, since we
+// rely on the image acquired semaphore to block rendering.
+// any color attachment operations must happen after the barrier.
+barrier.setOldLayout(vk::ImageLayout::eUndefined)
+  .setNewLayout(vk::ImageLayout::eAttachmentOptimal)
+  .setSrcAccessMask(vk::AccessFlagBits2::eNone)
+  .setSrcStageMask(vk::PipelineStageFlagBits2::eTopOfPipe)
+  .setDstAccessMask(vk::AccessFlagBits2::eColorAttachmentWrite)
+  .setDstStageMask(
+    vk::PipelineStageFlagBits2::eColorAttachmentOutput);
+dependency_info.setImageMemoryBarriers(barrier);
+render_sync.command_buffer.pipelineBarrier2(dependency_info);
+```
+
+Create a Rendering Attachment Info using the acquired image as the color target. We use a red clear color, and make sure the Load Op clears the image and the Store Op stores the results (currently just the cleared image):
+
+```cpp
+auto attachment_info = vk::RenderingAttachmentInfo{};
+attachment_info.setImageView(render_target->image_view)
+  .setImageLayout(vk::ImageLayout::eAttachmentOptimal)
+  .setLoadOp(vk::AttachmentLoadOp::eClear)
+  .setStoreOp(vk::AttachmentStoreOp::eStore)
+  .setClearValue(vk::ClearColorValue{1.0f, 0.0f, 0.0f, 1.0f});
+```
+
+Set up a Rendering Info object with the color attachment and the entire image as the render area:
+
+```cpp
+auto rendering_info = vk::RenderingInfo{};
+auto const render_area =
+  vk::Rect2D{vk::Offset2D{}, render_target->extent};
+rendering_info.setRenderArea(render_area)
+  .setColorAttachments(attachment_info)
+  .setLayerCount(1);
+```
+
+Finally, execute a render:
+
+```cpp
+render_sync.command_buffer.beginRendering(rendering_info);
+// draw stuff here.
+render_sync.command_buffer.endRendering();
+```
+
+Transition the image for presentation:
+
+```cpp
+// AttachmentOptimal => PresentSrc
+// the barrier must wait for color attachment operations to complete.
+// we don't need any post-synchronization as the present Semaphore takes
+// care of that.
+barrier.setOldLayout(vk::ImageLayout::eAttachmentOptimal)
+  .setNewLayout(vk::ImageLayout::ePresentSrcKHR)
+  .setSrcAccessMask(vk::AccessFlagBits2::eColorAttachmentWrite)
+  .setSrcStageMask(vk::PipelineStageFlagBits2::eColorAttachmentOutput)
+  .setDstAccessMask(vk::AccessFlagBits2::eNone)
+  .setDstStageMask(vk::PipelineStageFlagBits2::eBottomOfPipe);
+dependency_info.setImageMemoryBarriers(barrier);
+render_sync.command_buffer.pipelineBarrier2(dependency_info);
+```
+
+End the command buffer and submit it:
+
+```cpp
+render_sync.command_buffer.end();
+
+auto submit_info = vk::SubmitInfo2{};
+auto const command_buffer_info =
+  vk::CommandBufferSubmitInfo{render_sync.command_buffer};
+auto wait_semaphore_info = vk::SemaphoreSubmitInfo{};
+wait_semaphore_info.setSemaphore(*render_sync.draw)
+  .setStageMask(vk::PipelineStageFlagBits2::eTopOfPipe);
+auto signal_semaphore_info = vk::SemaphoreSubmitInfo{};
+signal_semaphore_info.setSemaphore(*render_sync.present)
+  .setStageMask(vk::PipelineStageFlagBits2::eColorAttachmentOutput);
+submit_info.setCommandBufferInfos(command_buffer_info)
+  .setWaitSemaphoreInfos(wait_semaphore_info)
+  .setSignalSemaphoreInfos(signal_semaphore_info);
+m_queue.submit2(submit_info, *render_sync.drawn);
+```
+
+The `draw` Semaphore will be signaled by the Swapchain when the image is ready, which will trigger this command buffer's execution. On completion, the submission signals the `present` Semaphore and the `drawn` Fence, with the latter being waited on the next time this virtual frame is processed. Finally, we increment the frame index and pass the `present` Semaphore to the present operation, which waits on it:
+
+```cpp
+m_frame_index = (m_frame_index + 1) % m_render_sync.size();
+
+if (!m_swapchain->present(m_queue, *render_sync.present)) {
+  m_swapchain->recreate(framebuffer_size);
+  continue;
+}
+```
+
+> Wayland users: congratulations, you can finally see and interact with the window!
+
+![Cleared Image](./dynamic_rendering_red_clear.png)
+
+## RenderDoc on Wayland
+
+At the time of writing, RenderDoc doesn't support inspecting Wayland applications. Temporarily force X11 (XWayland) by calling `glfwInitHint()` before `glfwInit()`:
+
+```cpp
+glfwInitHint(GLFW_PLATFORM, GLFW_PLATFORM_X11);
+```
+
+Setting up a command line option to conditionally call this is a simple and flexible approach: just set that argument in RenderDoc itself and/or pass it whenever an X11 backend is desired:
+
+```cpp
+// main.cpp
+// skip the first argument.
+auto args = std::span{argv, static_cast<std::size_t>(argc)}.subspan(1);
+while (!args.empty()) {
+  auto const arg = std::string_view{args.front()};
+  if (arg == "-x" || arg == "--force-x11") {
+    glfwInitHint(GLFW_PLATFORM, GLFW_PLATFORM_X11);
+  }
+  args = args.subspan(1);
+}
+lvk::App{}.run();
+```
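
The argument check above can also be factored into a small function that is easy to test in isolation. This is only a sketch: `wants_x11` is a hypothetical name, and it uses plain pointers instead of `std::span` so that it also compiles as C++17:

```cpp
#include <string_view>

// Hypothetical helper (not from the guide): returns true if any argument
// after the program name requests the X11 backend.
constexpr auto wants_x11(int const argc, char const* const* argv) -> bool {
  // start at 1 to skip the program name.
  for (int i = 1; i < argc; ++i) {
    auto const arg = std::string_view{argv[i]};
    if (arg == "-x" || arg == "--force-x11") { return true; }
  }
  return false;
}
```

In `main`, the result would then guard the `glfwInitHint(GLFW_PLATFORM, GLFW_PLATFORM_X11)` call before `lvk::App{}.run()`.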
diff --git a/guide/src/rendering/dynamic_rendering_red_clear.png b/guide/src/rendering/dynamic_rendering_red_clear.png
diff --git a/guide/src/rendering/render_sync.md b/guide/src/rendering/render_sync.md
@@ -0,0 +1,75 @@
+# Render Sync
+
+Create a new header `resource_buffering.hpp`:
+
+```cpp
+#pragma once
+#include <array>
+#include <cstddef>
+
+// Number of virtual frames.
+inline constexpr std::size_t buffering_v{2};
+
+// Alias for N-buffered resources.
+template <typename Type>
+using Buffered = std::array<Type, buffering_v>;
+```
+
+Add a private `struct RenderSync` to `App`:
+
+```cpp
+	struct RenderSync {
+		// signaled when Swapchain image has been acquired.
+		vk::UniqueSemaphore draw{};
+		// signaled when image is ready to be presented.
+		vk::UniqueSemaphore present{};
+		// signaled with present Semaphore, waited on before next render.
+		vk::UniqueFence drawn{};
+		// used to record rendering commands.
+		vk::CommandBuffer command_buffer{};
+	};
+```
+
+Add the new members associated with the Swapchain loop:
+
+```cpp
+	// command pool for all render Command Buffers.
+	vk::UniqueCommandPool m_render_cmd_pool{};
+	// Sync and Command Buffer for virtual frames.
+	Buffered<RenderSync> m_render_sync{};
+	// Current virtual frame index.
+	std::size_t m_frame_index{};
+```
+
+Add, implement, and call the create function:
+
+```cpp
+void App::create_render_sync() {
+	// Command Buffers are 'allocated' from a Command Pool (which is 'created'
+	// like all other Vulkan objects so far). We can allocate all the buffers
+	// from a single pool here.
+	auto command_pool_ci = vk::CommandPoolCreateInfo{};
+	// this flag enables resetting the command buffer for re-recording (unlike a
+	// single-time submit scenario).
+	command_pool_ci.setFlags(vk::CommandPoolCreateFlagBits::eResetCommandBuffer)
+		.setQueueFamilyIndex(m_gpu.queue_family);
+	m_render_cmd_pool = m_device->createCommandPoolUnique(command_pool_ci);
+
+	auto command_buffer_ai = vk::CommandBufferAllocateInfo{};
+	command_buffer_ai.setCommandPool(*m_render_cmd_pool)
+		.setCommandBufferCount(static_cast<std::uint32_t>(buffering_v))
+		.setLevel(vk::CommandBufferLevel::ePrimary);
+	auto const command_buffers =
+		m_device->allocateCommandBuffers(command_buffer_ai);
+	assert(command_buffers.size() == m_render_sync.size());
+
+	// we create Render Fences as pre-signaled so that on the first render for
+	// each virtual frame we don't wait on their fences (since there's nothing
+	// to wait for yet).
+	static constexpr auto fence_create_info_v =
+		vk::FenceCreateInfo{vk::FenceCreateFlagBits::eSignaled};
+	for (auto [sync, command_buffer] :
+		 std::views::zip(m_render_sync, command_buffers)) {
+		sync.command_buffer = command_buffer;
+		sync.draw = m_device->createSemaphoreUnique({});
+		sync.present = m_device->createSemaphoreUnique({});
+		sync.drawn = m_device->createFenceUnique(fence_create_info_v);
+	}
+}
+```
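
`std::views::zip` requires C++23 library support. On a toolchain without it, the same pairing can be written as an index-based loop. The sketch below uses stand-in types (`FakeCommandBuffer` and `FakeRenderSync`, both hypothetical) so that it compiles without Vulkan. In `create_render_sync()` the loop body would also create the semaphores and the fence:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-ins for vk::CommandBuffer and RenderSync (illustration only).
struct FakeCommandBuffer {
  int id{};
};
struct FakeRenderSync {
  FakeCommandBuffer command_buffer{};
};

// Assigns one allocated command buffer to each virtual frame by index.
template <std::size_t Count>
void assign_command_buffers(
  std::array<FakeRenderSync, Count>& out_syncs,
  std::vector<FakeCommandBuffer> const& command_buffers) {
  // same precondition as the assert in create_render_sync().
  assert(command_buffers.size() == out_syncs.size());
  for (std::size_t i = 0; i < out_syncs.size(); ++i) {
    out_syncs[i].command_buffer = command_buffers[i];
  }
}
```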
diff --git a/guide/src/rendering/swapchain_loop.md b/guide/src/rendering/swapchain_loop.md
@@ -0,0 +1,27 @@
+# Swapchain Loop
+
+One part of rendering in the main loop is the Swapchain loop, which at a high level comprises these steps:
+
+1. Acquire a Swapchain Image (and its view)
+1. Render to the acquired Image
+1. Present the Image (this releases the image back to the Swapchain)
+
+![WSI Engine](./wsi_engine.png)
+
+There are a few nuances to deal with, for instance:
+
+1. Acquiring (and/or presenting) will sometimes fail (e.g. because the Swapchain is out of date), in which case the remaining steps need to be skipped
+1. The acquire command can return before the image is actually ready for use, so rendering needs to be synchronized to start only after the image is ready
+1. The images need appropriate Layout Transitions at each stage
+
+Additionally, the number of swapchain images can vary, whereas the engine should use a fixed number of _virtual frames_: 2 for double buffering, 3 for triple (more is usually overkill). It's also possible for the main loop to acquire the same image before a previous render command has finished (or even started), if the Swapchain is using Mailbox Present Mode. While FIFO will block until the oldest submitted image is available (also known as vsync), we should still synchronize and wait until the acquired image has finished rendering.
+
+## Virtual Frames
+
+All the dynamic resources used during the rendering of a frame comprise a virtual frame. The application has a fixed number of virtual frames which it cycles through, one per iteration of the main loop. Each virtual frame will be associated with a `vk::Fence` which will be waited on before rendering to it again. It will also have a pair of `vk::Semaphore`s to synchronize the acquire, render, and present calls on the GPU (we don't need to wait for them in the code). Lastly, there will be a Command Buffer per virtual frame, where all rendering commands for that frame (including layout transitions) will be recorded.
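
As a rough illustration of the cycling described above, the sketch below models virtual frames with a stand-in `FrameData` struct. The names `FrameData`, `next_frame_index`, and `simulate` are hypothetical. The real per-frame data is the fence, the semaphores, and the command buffer:

```cpp
#include <array>
#include <cstddef>

// Number of virtual frames: 2 for double buffering.
inline constexpr std::size_t buffering_v{2};

// Alias for N-buffered resources.
template <typename Type>
using Buffered = std::array<Type, buffering_v>;

// Stand-in for the per-frame sync objects and command buffer.
struct FrameData {
  int times_rendered{};
};

// Returns the virtual frame index that follows `current`, wrapping around.
constexpr auto next_frame_index(std::size_t const current) -> std::size_t {
  return (current + 1) % buffering_v;
}

// Simulates `count` loop iterations and reports how often each virtual
// frame was used.
constexpr auto simulate(std::size_t const count) -> Buffered<FrameData> {
  auto frames = Buffered<FrameData>{};
  auto frame_index = std::size_t{};
  for (std::size_t i = 0; i < count; ++i) {
    ++frames[frame_index].times_rendered;
    frame_index = next_frame_index(frame_index);
  }
  return frames;
}
```

With two virtual frames, five iterations use frame 0 three times and frame 1 twice, regardless of how many images the Swapchain itself holds.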
+
+## Image Layouts
+
+Vulkan Images have a property known as Image Layout. Most operations on images require them to be in certain specific layouts, requiring transitions before (and after). A layout transition conveniently also functions as a Pipeline Barrier (think memory barrier on the GPU), enabling us to synchronize operations before and after the transition.
+
+Vulkan Synchronization is arguably the most complicated aspect of the API; a good amount of research is recommended. Here is an [article explaining barriers](https://gpuopen.com/learn/vulkan-barriers-explained/).