Rendering #2

Merged
merged 7 commits into from
Mar 23, 2025
35 changes: 35 additions & 0 deletions .github/workflows/deploy.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
name: Deploy
on:
  push:
    branches:
      - main
jobs:
  deploy:
    runs-on: ubuntu-latest
    permissions:
      contents: write # To push a branch
      pages: write # To push to a GitHub Pages site
      id-token: write # To update the deployment status
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - name: init
        run: |
          url="https://github.com/rust-lang/mdBook/releases/download/v0.4.47/mdbook-v0.4.47-x86_64-unknown-linux-gnu.tar.gz"
          mkdir mdbook
          curl -sSL $url | tar -xz --directory=./mdbook
          echo `pwd`/mdbook >> $GITHUB_PATH
      - name: build book
        run: |
          cd guide
          mdbook build
      - name: setup pages
        uses: actions/configure-pages@v4
      - name: upload artifact
        uses: actions/upload-pages-artifact@v3
        with:
          path: 'book'
      - name: Deploy to GitHub Pages
        id: deployment
        uses: actions/deploy-pages@v4
5 changes: 5 additions & 0 deletions guide/src/SUMMARY.md
@@ -16,3 +16,8 @@
- [Vulkan Device](initialization/device.md)
- [Scoped Waiter](initialization/scoped_waiter.md)
- [Swapchain](initialization/swapchain.md)
- [Rendering](rendering/README.md)
- [Swapchain Loop](rendering/swapchain_loop.md)
- [Render Sync](rendering/render_sync.md)
- [Swapchain Update](rendering/swapchain_update.md)
- [Dynamic Rendering](rendering/dynamic_rendering.md)
3 changes: 3 additions & 0 deletions guide/src/rendering/README.md
@@ -0,0 +1,3 @@
# Rendering

This section implements Render Sync and the Swapchain loop, performs Swapchain image layout transitions, and introduces Dynamic Rendering.
172 changes: 172 additions & 0 deletions guide/src/rendering/dynamic_rendering.md
@@ -0,0 +1,172 @@
# Dynamic Rendering

Dynamic Rendering enables us to avoid using Render Passes, which are quite a bit more verbose (but also generally more performant on tiled GPUs). Here we tie together the Swapchain, Render Sync, and rendering.

In the main loop, attempt to acquire a Swapchain image / Render Target:

```cpp
auto const framebuffer_size = glfw::framebuffer_size(m_window.get());
// minimized? skip loop.
if (framebuffer_size.x <= 0 || framebuffer_size.y <= 0) { continue; }
// an eErrorOutOfDateKHR result is not guaranteed if the
// framebuffer size does not match the Swapchain image size, check it
// explicitly.
auto fb_size_changed = framebuffer_size != m_swapchain->get_size();
auto& render_sync = m_render_sync.at(m_frame_index);
auto render_target = m_swapchain->acquire_next_image(*render_sync.draw);
if (fb_size_changed || !render_target) {
  m_swapchain->recreate(framebuffer_size);
  continue;
}
```

Wait for the associated fence and reset ('un'signal) it:

```cpp
static constexpr auto fence_timeout_v =
    static_cast<std::uint64_t>(std::chrono::nanoseconds{3s}.count());
auto result = m_device->waitForFences(*render_sync.drawn, vk::True,
                                      fence_timeout_v);
if (result != vk::Result::eSuccess) {
  throw std::runtime_error{"Failed to wait for Render Fence"};
}
// reset fence _after_ acquisition of image: if it fails, the
// fence remains signaled.
m_device->resetFences(*render_sync.drawn);
```
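
The timeout passed to `waitForFences()` is a nanosecond count. As a minimal sketch of the conversion used above, the helper below (`to_timeout_ns` is a hypothetical name, not part of the guide's code) accepts any `std::chrono` duration. It assumes a non-negative duration; a negative one would wrap around when cast to an unsigned type.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper (not part of the guide): converts a chrono duration to
// the unsigned nanosecond count expected by fence wait functions.
// Assumes the duration is non-negative.
template <typename Rep, typename Period>
constexpr auto to_timeout_ns(std::chrono::duration<Rep, Period> const timeout)
    -> std::uint64_t {
  auto const ns = std::chrono::duration_cast<std::chrono::nanoseconds>(timeout);
  return static_cast<std::uint64_t>(ns.count());
}

// the 3s literal requires the chrono literals namespace.
using namespace std::chrono_literals;
static_assert(to_timeout_ns(3s) == 3'000'000'000ull);
```

With such a helper, the 3 second timeout could be written as `to_timeout_ns(3s)` instead of spelling out the cast.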

Since the fence has been reset, a queue submission must be made that signals it before continuing, otherwise the app will deadlock on the next wait (and eventually throw after 3s). We can now begin command buffer recording:

```cpp
auto command_buffer_bi = vk::CommandBufferBeginInfo{};
// this flag means recorded commands will not be reused.
command_buffer_bi.setFlags(
    vk::CommandBufferUsageFlagBits::eOneTimeSubmit);
render_sync.command_buffer.begin(command_buffer_bi);
```

We are not ready to actually render anything yet, but can clear the image to a particular color. First we need to transition the image for rendering, i.e. to the Attachment Optimal layout. Set up the image barrier and record it:

```cpp
auto dependency_info = vk::DependencyInfo{};
auto barrier = m_swapchain->base_barrier();
// Undefined => AttachmentOptimal
// we don't need to block any operations before the barrier, since we
// rely on the image acquired semaphore to block rendering.
// any color attachment operations must happen after the barrier.
barrier.setOldLayout(vk::ImageLayout::eUndefined)
    .setNewLayout(vk::ImageLayout::eAttachmentOptimal)
    .setSrcAccessMask(vk::AccessFlagBits2::eNone)
    .setSrcStageMask(vk::PipelineStageFlagBits2::eTopOfPipe)
    .setDstAccessMask(vk::AccessFlagBits2::eColorAttachmentWrite)
    .setDstStageMask(
        vk::PipelineStageFlagBits2::eColorAttachmentOutput);
dependency_info.setImageMemoryBarriers(barrier);
render_sync.command_buffer.pipelineBarrier2(dependency_info);
```

Create a Rendering Attachment Info using the acquired image as the color target. We use a red clear color, and make sure the Load Op clears the image and the Store Op stores the results (currently just the cleared image):

```cpp
auto attachment_info = vk::RenderingAttachmentInfo{};
attachment_info.setImageView(render_target->image_view)
    .setImageLayout(vk::ImageLayout::eAttachmentOptimal)
    .setLoadOp(vk::AttachmentLoadOp::eClear)
    .setStoreOp(vk::AttachmentStoreOp::eStore)
    .setClearValue(vk::ClearColorValue{1.0f, 0.0f, 0.0f, 1.0f});
```

Set up a Rendering Info object with the color attachment and the entire image as the render area:

```cpp
auto rendering_info = vk::RenderingInfo{};
auto const render_area =
    vk::Rect2D{vk::Offset2D{}, render_target->extent};
rendering_info.setRenderArea(render_area)
    .setColorAttachments(attachment_info)
    .setLayerCount(1);
```

Finally, execute a render:

```cpp
render_sync.command_buffer.beginRendering(rendering_info);
// draw stuff here.
render_sync.command_buffer.endRendering();
```

Transition the image for presentation:

```cpp
// AttachmentOptimal => PresentSrc
// the barrier must wait for color attachment operations to complete.
// we don't need any post-synchronization as the present Semaphore takes
// care of that.
barrier.setOldLayout(vk::ImageLayout::eAttachmentOptimal)
    .setNewLayout(vk::ImageLayout::ePresentSrcKHR)
    .setSrcAccessMask(vk::AccessFlagBits2::eColorAttachmentWrite)
    .setSrcStageMask(vk::PipelineStageFlagBits2::eColorAttachmentOutput)
    .setDstAccessMask(vk::AccessFlagBits2::eNone)
    .setDstStageMask(vk::PipelineStageFlagBits2::eBottomOfPipe);
dependency_info.setImageMemoryBarriers(barrier);
render_sync.command_buffer.pipelineBarrier2(dependency_info);
```

End the command buffer and submit it:

```cpp
render_sync.command_buffer.end();

auto submit_info = vk::SubmitInfo2{};
auto const command_buffer_info =
    vk::CommandBufferSubmitInfo{render_sync.command_buffer};
auto wait_semaphore_info = vk::SemaphoreSubmitInfo{};
wait_semaphore_info.setSemaphore(*render_sync.draw)
    .setStageMask(vk::PipelineStageFlagBits2::eTopOfPipe);
auto signal_semaphore_info = vk::SemaphoreSubmitInfo{};
signal_semaphore_info.setSemaphore(*render_sync.present)
    .setStageMask(vk::PipelineStageFlagBits2::eColorAttachmentOutput);
submit_info.setCommandBufferInfos(command_buffer_info)
    .setWaitSemaphoreInfos(wait_semaphore_info)
    .setSignalSemaphoreInfos(signal_semaphore_info);
m_queue.submit2(submit_info, *render_sync.drawn);
```

The `draw` Semaphore will be signaled by the Swapchain when the image is ready, which will trigger this command buffer's execution. On completion, the submission signals the `present` Semaphore and the `drawn` Fence; the Fence is waited on the next time this virtual frame is processed. Finally, increment the frame index and pass the `present` Semaphore for the subsequent present operation to wait on:

```cpp
m_frame_index = (m_frame_index + 1) % m_render_sync.size();

if (!m_swapchain->present(m_queue, *render_sync.present)) {
  m_swapchain->recreate(framebuffer_size);
  continue;
}
```

> Wayland users: congratulations, you can finally see and interact with the window!

![Cleared Image](./dynamic_rendering_red_clear.png)

## RenderDoc on Wayland

At the time of writing, RenderDoc doesn't support inspecting Wayland applications. Temporarily force X11 (XWayland) by calling `glfwInitHint()` before `glfwInit()`:

```cpp
glfwInitHint(GLFW_PLATFORM, GLFW_PLATFORM_X11);
```

Setting up a command line option to conditionally call this is a simple and flexible approach: just set that argument in RenderDoc itself and/or pass it whenever an X11 backend is desired:

```cpp
// main.cpp
// skip the first argument.
auto args = std::span{argv, static_cast<std::size_t>(argc)}.subspan(1);
while (!args.empty()) {
  auto const arg = std::string_view{args.front()};
  if (arg == "-x" || arg == "--force-x11") {
    glfwInitHint(GLFW_PLATFORM, GLFW_PLATFORM_X11);
  }
  args = args.subspan(1);
}
lvk::App{}.run();
```
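
As a hedged variation on the argument loop above, the check can be separated from the GLFW call so it is testable in isolation. `wants_x11` is a hypothetical helper name, not something the guide defines; it takes `argc` and `argv` directly and does not touch GLFW.

```cpp
#include <string_view>

// Hypothetical helper (not part of the guide): returns true if any argument
// after the program name requests the X11 backend.
inline auto wants_x11(int const argc, char const* const* argv) -> bool {
  // start at 1 to skip the program name.
  for (int i = 1; i < argc; ++i) {
    auto const arg = std::string_view{argv[i]};
    if (arg == "-x" || arg == "--force-x11") { return true; }
  }
  return false;
}
```

In `main()`, the result would then guard the hint: `if (wants_x11(argc, argv)) { glfwInitHint(GLFW_PLATFORM, GLFW_PLATFORM_X11); }`.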
75 changes: 75 additions & 0 deletions guide/src/rendering/render_sync.md
@@ -0,0 +1,75 @@
# Render Sync

Create a new header `resource_buffering.hpp`:

```cpp
// Number of virtual frames.
inline constexpr std::size_t buffering_v{2};

// Alias for N-buffered resources.
template <typename Type>
using Buffered = std::array<Type, buffering_v>;
```

Add a private `struct RenderSync` to `App`:

```cpp
struct RenderSync {
  // signaled when Swapchain image has been acquired.
  vk::UniqueSemaphore draw{};
  // signaled when image is ready to be presented.
  vk::UniqueSemaphore present{};
  // signaled with present Semaphore, waited on before next render.
  vk::UniqueFence drawn{};
  // used to record rendering commands.
  vk::CommandBuffer command_buffer{};
};
```

Add the new members associated with the Swapchain loop:

```cpp
// command pool for all render Command Buffers.
vk::UniqueCommandPool m_render_cmd_pool{};
// Sync and Command Buffer for virtual frames.
Buffered<RenderSync> m_render_sync{};
// Current virtual frame index.
std::size_t m_frame_index{};
```

Add, implement, and call the create function:

```cpp
void App::create_render_sync() {
  // Command Buffers are 'allocated' from a Command Pool (which is 'created'
  // like all other Vulkan objects so far). We can allocate all the buffers
  // from a single pool here.
  auto command_pool_ci = vk::CommandPoolCreateInfo{};
  // this flag enables resetting the command buffer for re-recording (unlike a
  // single-time submit scenario).
  command_pool_ci.setFlags(vk::CommandPoolCreateFlagBits::eResetCommandBuffer)
      .setQueueFamilyIndex(m_gpu.queue_family);
  m_render_cmd_pool = m_device->createCommandPoolUnique(command_pool_ci);

  auto command_buffer_ai = vk::CommandBufferAllocateInfo{};
  // allocate one Command Buffer per virtual frame (buffering_v is defined in
  // resource_buffering.hpp).
  command_buffer_ai.setCommandPool(*m_render_cmd_pool)
      .setCommandBufferCount(static_cast<std::uint32_t>(buffering_v))
      .setLevel(vk::CommandBufferLevel::ePrimary);
  auto const command_buffers =
      m_device->allocateCommandBuffers(command_buffer_ai);
  assert(command_buffers.size() == m_render_sync.size());

  // we create Render Fences as pre-signaled so that on the first render for
  // each virtual frame we don't wait on their fences (since there's nothing
  // to wait for yet).
  static constexpr auto fence_create_info_v =
      vk::FenceCreateInfo{vk::FenceCreateFlagBits::eSignaled};
  for (auto [sync, command_buffer] :
       std::views::zip(m_render_sync, command_buffers)) {
    sync.command_buffer = command_buffer;
    sync.draw = m_device->createSemaphoreUnique({});
    sync.present = m_device->createSemaphoreUnique({});
    sync.drawn = m_device->createFenceUnique(fence_create_info_v);
  }
}
```
```
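
`std::views::zip` is a C++23 facility. For toolchains that lack it, an index-based loop is one possible alternative. The sketch below uses mock stand-ins (`MockSync` and plain `int` handles are assumptions for illustration, not Vulkan or guide types) to show only the pairing logic.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Mock stand-in for RenderSync: an int plays the role of the command buffer.
struct MockSync {
  int command_buffer{};
};

// Hypothetical helper: pairs each sync object with the command buffer at the
// same index, without relying on std::views::zip.
template <std::size_t N>
void assign_command_buffers(std::array<MockSync, N>& syncs,
                            std::vector<int> const& command_buffers) {
  // same precondition as the assert in create_render_sync().
  assert(command_buffers.size() == syncs.size());
  for (std::size_t i = 0; i < syncs.size(); ++i) {
    syncs[i].command_buffer = command_buffers[i];
  }
}
```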
27 changes: 27 additions & 0 deletions guide/src/rendering/swapchain_loop.md
@@ -0,0 +1,27 @@
# Swapchain Loop

One part of rendering in the main loop is the Swapchain loop, which at a high level comprises these steps:

1. Acquire a Swapchain Image (and its view)
1. Render to the acquired Image
1. Present the Image (this releases the image back to the Swapchain)

![WSI Engine](./wsi_engine.png)

There are a few nuances to deal with, for instance:

1. Acquiring (and/or presenting) will sometimes fail (e.g. because the Swapchain is out of date), in which case the remaining steps need to be skipped
1. The acquire command can return before the image is actually ready for use, so rendering needs to be synchronized to start only after the image is ready
1. The images need appropriate Layout Transitions at each stage

Additionally, the number of swapchain images can vary, whereas the engine should use a fixed number of _virtual frames_: 2 for double buffering, 3 for triple (more is usually overkill). It's also possible for the main loop to acquire the same image before a previous render command has finished (or even started), if the Swapchain is using Mailbox Present Mode. While FIFO will block until the oldest submitted image is available (also known as vsync), we should still synchronize and wait until the acquired image has finished rendering.
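
The acquire, render, present sequence and the skip-on-failure rule can be sketched with mock types. Everything below (`MockImage`, `MockSwapchain`, `run_frame`) is a hypothetical stand-in for illustration only; it models control flow, not GPU synchronization or layout transitions.

```cpp
#include <optional>

// Mock stand-ins for illustration only; not Vulkan or guide types.
struct MockImage {
  int index{};
};

struct MockSwapchain {
  bool out_of_date{false};
  int recreate_count{0};
  int present_count{0};

  // Returns nothing if the Swapchain must be recreated first.
  auto acquire() -> std::optional<MockImage> {
    if (out_of_date) { return std::nullopt; }
    return MockImage{0};
  }

  // Returns false if the Swapchain must be recreated.
  auto present(MockImage const& /*image*/) -> bool {
    if (out_of_date) { return false; }
    ++present_count;
    return true;
  }

  void recreate() {
    out_of_date = false;
    ++recreate_count;
  }
};

// Runs one loop iteration; returns true if an image was presented.
inline auto run_frame(MockSwapchain& swapchain) -> bool {
  auto const image = swapchain.acquire();
  if (!image) {
    swapchain.recreate();
    return false; // skip the render and present steps.
  }
  // rendering commands would be recorded and submitted here.
  if (!swapchain.present(*image)) {
    swapchain.recreate();
    return false;
  }
  return true;
}
```

An out-of-date Swapchain costs one skipped iteration: the first `run_frame()` call recreates it and returns `false`, and the next call proceeds normally.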

## Virtual Frames

All the dynamic resources used during the rendering of a frame comprise a virtual frame. The application has a fixed number of virtual frames which it cycles through, advancing by one on each iteration of the render loop. Each frame will be associated with a `vk::Fence` which will be waited on before rendering to it again. It will also have a pair of `vk::Semaphore`s to synchronize the acquire, render, and present calls on the GPU (we don't need to wait for them in the code). Lastly, there will be a Command Buffer per virtual frame, where all rendering commands for that frame (including layout transitions) will be recorded.
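
Cycling through a fixed number of virtual frames amounts to a wrapping index into a fixed-size array. A minimal sketch, assuming a `Buffered` alias over `std::array` with `buffering_v` set to 2; `VirtualFrame` and `FrameCycler` are hypothetical names, and the `int` member merely stands in for the per-frame Fence, Semaphores, and Command Buffer.

```cpp
#include <array>
#include <cstddef>

// Number of virtual frames (2 for double buffering).
inline constexpr std::size_t buffering_v{2};

// Alias for N-buffered resources.
template <typename Type>
using Buffered = std::array<Type, buffering_v>;

// Stand-in for the per-frame sync objects and Command Buffer.
struct VirtualFrame {
  int times_used{};
};

// Hypothetical wrapper that hands out virtual frames in round-robin order.
class FrameCycler {
 public:
  // Returns the virtual frame for the current iteration.
  auto current() -> VirtualFrame& { return m_frames.at(m_index); }
  auto index() const -> std::size_t { return m_index; }
  // Advances to the next virtual frame, wrapping around at the end.
  void next() { m_index = (m_index + 1) % m_frames.size(); }

 private:
  Buffered<VirtualFrame> m_frames{};
  std::size_t m_index{};
};
```

With double buffering, the frame used in iteration N is used again in iteration N + 2, which is why its Fence must be waited on before its Command Buffer is re-recorded.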

## Image Layouts

Vulkan Images have a property known as Image Layout. Most operations on images require them to be in certain specific layouts, requiring transitions before (and after). A layout transition conveniently also functions as a Pipeline Barrier (think memory barrier on the GPU), enabling us to synchronize operations before and after the transition.
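
The bookkeeping behind layout transitions can be illustrated with a toy model. This is a deliberately simplified sketch, not Vulkan code: `Layout`, `TrackedImage`, and `transition` are hypothetical names, and the model encodes just two rules, namely that the old layout named in a transition must be either Undefined (contents discarded) or the image's current layout, and that the new layout cannot be Undefined.

```cpp
// Toy model for illustration only; real transitions are recorded as image
// memory barriers in a command buffer.
enum class Layout { Undefined, AttachmentOptimal, PresentSrc };

struct TrackedImage {
  Layout layout{Layout::Undefined};
};

// Returns true and updates the tracked layout if the transition is valid in
// this simplified model, otherwise leaves the image untouched.
inline auto transition(TrackedImage& image, Layout const old_layout,
                       Layout const new_layout) -> bool {
  // an image cannot be transitioned *to* Undefined.
  if (new_layout == Layout::Undefined) { return false; }
  // Undefined as the old layout is always allowed: contents are discarded.
  if (old_layout != Layout::Undefined && image.layout != old_layout) {
    return false;
  }
  image.layout = new_layout;
  return true;
}
```

A frame in this model goes Undefined to AttachmentOptimal before rendering, then AttachmentOptimal to PresentSrc before presentation.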

Vulkan Synchronization is arguably the most complicated aspect of the API; a good amount of research is recommended. Here is an [article explaining barriers](https://gpuopen.com/learn/vulkan-barriers-explained/).