Commit 811b904

VMA, Buffers, Images (#15)
* VMA
* WIP: Buffers
* Vertex and Index Buffers
* Fix warnings
* Refactor `vma::Buffer` API
* Add `CommandBlock`, use device VBO
* Refactor Buffer API
* WIP images
* Bugfixes etc
* Sampled image
* Cleanup
1 parent a8d6b8d commit 811b904

23 files changed: +1257 −20 lines

assets/shader.vert: −416 bytes (binary file not shown)

ext/CMakeLists.txt (+27)

```diff
@@ -27,6 +27,10 @@ target_compile_definitions(glm PUBLIC
 message(STATUS "[Vulkan-Headers]")
 add_subdirectory(src/Vulkan-Headers)
 
+# add VulkanMemoryAllocator to build tree
+message(STATUS "[VulkanMemoryAllocator]")
+add_subdirectory(src/VulkanMemoryAllocator)
+
 # setup Dear ImGui library
 message(STATUS "[Dear ImGui]")
 add_library(imgui)
@@ -55,6 +59,28 @@ target_sources(imgui PRIVATE
   src/imgui/backends/imgui_impl_vulkan.h
 )
 
+# setup vma library (source file with VMA interface)
+message(STATUS "[vma]")
+add_library(vma)
+add_library(vma::vma ALIAS vma)
+target_link_libraries(vma PUBLIC
+  Vulkan::Headers
+  GPUOpen::VulkanMemoryAllocator
+)
+target_include_directories(vma SYSTEM PUBLIC
+  src/VulkanMemoryAllocator/include
+)
+target_compile_definitions(vma PUBLIC
+  VMA_STATIC_VULKAN_FUNCTIONS=0
+  VMA_DYNAMIC_VULKAN_FUNCTIONS=1
+)
+target_sources(vma PRIVATE
+  vk_mem_alloc.cpp
+)
+
+# ignore compiler warnings
+target_compile_options(vma PRIVATE -w)
+
 # declare ext library target
 add_library(${PROJECT_NAME} INTERFACE)
 add_library(learn-vk::ext ALIAS ${PROJECT_NAME})
@@ -63,6 +89,7 @@ add_library(learn-vk::ext ALIAS ${PROJECT_NAME})
 target_link_libraries(${PROJECT_NAME} INTERFACE
   glm::glm
   imgui::imgui
+  vma::vma
 )
 
 # setup preprocessor defines
```
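With the `vma` target linked into the `ext` interface library, downstream targets get VMA (with its dynamic-function defines) transitively. A minimal sketch of a consuming target — `my_app` is illustrative, not part of this commit:

```cmake
add_executable(my_app main.cpp)
# glm, imgui, and vma (with VMA_DYNAMIC_VULKAN_FUNCTIONS=1) come in transitively.
target_link_libraries(my_app PRIVATE learn-vk::ext)
```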

ext/src.zip: 186 KB (binary file not shown)

ext/vk_mem_alloc.cpp (new file, +3)

```cpp
#define VMA_IMPLEMENTATION

#include <vk_mem_alloc.h>
```

guide/src/SUMMARY.md (+10)

```diff
@@ -33,3 +33,13 @@
 - [GLSL to SPIR-V](shader_objects/glsl_to_spir_v.md)
 - [Drawing a Triangle](shader_objects/drawing_triangle.md)
 - [Graphics Pipelines](shader_objects/pipelines.md)
+
+# Shader Resources
+
+- [Memory Allocation](memory/README.md)
+- [Vulkan Memory Allocator](memory/vma.md)
+- [Buffers](memory/buffers.md)
+- [Vertex Buffer](memory/vertex_buffer.md)
+- [Command Block](memory/command_block.md)
+- [Device Buffers](memory/device_buffers.md)
+- [Images](memory/images.md)
```

guide/src/memory/README.md (new file, +5)

# Memory Allocation

Vulkan is an explicit API: [allocating memory](https://docs.vulkan.org/guide/latest/memory_allocation.html) that the device can use is the application's responsibility. The specifics can get quite complicated, but as the spec itself recommends, we shall simply defer all of that to a library: [Vulkan Memory Allocator (VMA)](https://github.com/GPUOpen-LibrariesAndSDKs/VulkanMemoryAllocator).

Vulkan exposes two kinds of objects that use such allocated memory: Buffers and Images. VMA offers transparent support for both: we just have to allocate/free buffers and images through VMA instead of through the device directly. Unlike memory allocation / object construction on the CPU, creating buffers and images requires many more parameters (than, say, alignment and size). As you might have guessed, we shall constrain ourselves to the subset relevant for shader resources: vertex buffers, uniform/storage buffers, and texture images.

guide/src/memory/buffers.md (new file, +94)

# Buffers

First add the RAII wrapper components for VMA buffers:

```cpp
struct RawBuffer {
  [[nodiscard]] auto mapped_span() const -> std::span<std::byte> {
    return std::span{static_cast<std::byte*>(mapped), size};
  }

  auto operator==(RawBuffer const& rhs) const -> bool = default;

  VmaAllocator allocator{};
  VmaAllocation allocation{};
  vk::Buffer buffer{};
  vk::DeviceSize size{};
  void* mapped{};
};

struct BufferDeleter {
  void operator()(RawBuffer const& raw_buffer) const noexcept;
};

// ...
void BufferDeleter::operator()(RawBuffer const& raw_buffer) const noexcept {
  vmaDestroyBuffer(raw_buffer.allocator, raw_buffer.buffer,
                   raw_buffer.allocation);
}
```

Buffers can be backed by host (RAM) or device (VRAM) memory: the former is mappable and thus useful for data that changes every frame, while the latter is faster for the GPU to access but requires a more involved process to copy data into. Add the related types and a create function:

```cpp
struct BufferCreateInfo {
  VmaAllocator allocator;
  vk::BufferUsageFlags usage;
  std::uint32_t queue_family;
};

enum class BufferMemoryType : std::int8_t { Host, Device };

[[nodiscard]] auto create_buffer(BufferCreateInfo const& create_info,
                                 BufferMemoryType memory_type,
                                 vk::DeviceSize size) -> Buffer;

// ...
auto vma::create_buffer(BufferCreateInfo const& create_info,
                        BufferMemoryType const memory_type,
                        vk::DeviceSize const size) -> Buffer {
  if (size == 0) {
    std::println(stderr, "Buffer cannot be 0-sized");
    return {};
  }

  auto allocation_ci = VmaAllocationCreateInfo{};
  allocation_ci.flags =
      VMA_ALLOCATION_CREATE_HOST_ACCESS_SEQUENTIAL_WRITE_BIT;
  auto usage = create_info.usage;
  if (memory_type == BufferMemoryType::Device) {
    allocation_ci.usage = VMA_MEMORY_USAGE_AUTO_PREFER_DEVICE;
    // device buffers need to support TransferDst.
    usage |= vk::BufferUsageFlagBits::eTransferDst;
  } else {
    allocation_ci.usage = VMA_MEMORY_USAGE_AUTO_PREFER_HOST;
    // host buffers can provide mapped memory.
    allocation_ci.flags |= VMA_ALLOCATION_CREATE_MAPPED_BIT;
  }

  auto buffer_ci = vk::BufferCreateInfo{};
  buffer_ci.setQueueFamilyIndices(create_info.queue_family)
      .setSize(size)
      .setUsage(usage);
  auto vma_buffer_ci = static_cast<VkBufferCreateInfo>(buffer_ci);

  VmaAllocation allocation{};
  VkBuffer buffer{};
  auto allocation_info = VmaAllocationInfo{};
  auto const result =
      vmaCreateBuffer(create_info.allocator, &vma_buffer_ci, &allocation_ci,
                      &buffer, &allocation, &allocation_info);
  if (result != VK_SUCCESS) {
    std::println(stderr, "Failed to create VMA Buffer");
    return {};
  }

  return RawBuffer{
      .allocator = create_info.allocator,
      .allocation = allocation,
      .buffer = buffer,
      .size = size,
      .mapped = allocation_info.pMappedData,
  };
}
```

guide/src/memory/command_block.md (new file, +84)

# Command Block

Long-lived vertex buffers perform better when backed by Device memory, especially for 3D meshes. Data is transferred to device buffers in two steps:

1. Allocate a host buffer and copy the data to its mapped memory
2. Allocate a device buffer, record a Buffer Copy operation, and submit it

The second step requires a command buffer and queue submission (_and_ waiting for the submitted work to complete). Encapsulate this behavior into a class; it will also be used for creating images:

```cpp
class CommandBlock {
 public:
  explicit CommandBlock(vk::Device device, vk::Queue queue,
                        vk::CommandPool command_pool);

  [[nodiscard]] auto command_buffer() const -> vk::CommandBuffer {
    return *m_command_buffer;
  }

  void submit_and_wait();

 private:
  vk::Device m_device{};
  vk::Queue m_queue{};
  vk::UniqueCommandBuffer m_command_buffer{};
};
```

The constructor takes an existing command pool created for such ad-hoc allocations, and the queue for submission later. This way it can be passed around after creation and used by other code.

```cpp
CommandBlock::CommandBlock(vk::Device const device, vk::Queue const queue,
                           vk::CommandPool const command_pool)
    : m_device(device), m_queue(queue) {
  // allocate a UniqueCommandBuffer which will free the underlying command
  // buffer from its owning pool on destruction.
  auto allocate_info = vk::CommandBufferAllocateInfo{};
  allocate_info.setCommandPool(command_pool)
      .setCommandBufferCount(1)
      .setLevel(vk::CommandBufferLevel::ePrimary);
  // all the current VulkanHPP functions for UniqueCommandBuffer allocation
  // return vectors.
  auto command_buffers = m_device.allocateCommandBuffersUnique(allocate_info);
  m_command_buffer = std::move(command_buffers.front());

  // start recording commands before returning.
  auto begin_info = vk::CommandBufferBeginInfo{};
  begin_info.setFlags(vk::CommandBufferUsageFlagBits::eOneTimeSubmit);
  m_command_buffer->begin(begin_info);
}
```

`submit_and_wait()` resets the unique command buffer at the end, to free it back to its command pool:

```cpp
void CommandBlock::submit_and_wait() {
  if (!m_command_buffer) { return; }

  // end recording and submit.
  m_command_buffer->end();
  auto submit_info = vk::SubmitInfo2KHR{};
  auto const command_buffer_info =
      vk::CommandBufferSubmitInfo{*m_command_buffer};
  submit_info.setCommandBufferInfos(command_buffer_info);
  auto fence = m_device.createFenceUnique({});
  m_queue.submit2(submit_info, *fence);

  // wait for the submit fence to be signaled.
  static constexpr auto timeout_v =
      static_cast<std::uint64_t>(std::chrono::nanoseconds(30s).count());
  auto const result = m_device.waitForFences(*fence, vk::True, timeout_v);
  if (result != vk::Result::eSuccess) {
    std::println(stderr, "Failed to submit Command Buffer");
  }
  // free the command buffer.
  m_command_buffer.reset();
}
```
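The fence timeout above is a plain chrono-to-integer conversion: `waitForFences` takes a nanosecond count as `std::uint64_t`. In isolation (assuming `using namespace std::chrono_literals` is in scope, as the code above requires for `30s`):

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

using namespace std::chrono_literals;

// going through std::chrono keeps the unit of the timeout explicit,
// instead of hand-writing a raw nanosecond constant.
constexpr auto timeout_v =
    static_cast<std::uint64_t>(std::chrono::nanoseconds(30s).count());
static_assert(timeout_v == 30'000'000'000ULL);
```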
## Multithreading considerations

Instead of blocking the main thread on every Command Block's `submit_and_wait()`, you might be wondering whether command block usage could be multithreaded. The answer is yes, with some extra work: each thread requires its own command pool. Just using one owned (unique) pool per Command Block (with no need to free the buffer) is a good starting point. All queue operations need to be synchronized, i.e. wrapped in a critical section protected by a mutex; this includes Swapchain acquire/present calls as well as Queue submissions. A `class Queue` value type that stores a copy of the `vk::Queue` and a pointer/reference to its `std::mutex`, and wraps the submit call, can be passed to command blocks. Just this much will enable asynchronous asset loading and the like, since each loading thread uses its own command pool and every queue submission is a critical section. `VmaAllocator` is internally synchronized (this can be disabled at build time), so performing allocations through the same allocator on multiple threads is safe.

For multi-threaded rendering, use a Secondary command buffer per thread to record rendering commands, then accumulate and execute them in the main (Primary) command buffer currently in `RenderSync`. This is not particularly helpful unless you have thousands of expensive draw calls and dozens of render passes, as recording even a hundred draws will likely be faster on a single thread.

guide/src/memory/device_buffers.md (new file, +133)

# Device Buffers

This guide will only use device buffers for vertex buffers, with both vertex and index data strung together in a single VBO. The create function can thus take the data and perform the buffer copy operation before returning; in essence the returned value is a "GPU const" buffer. To allow passing separate spans for vertices and indices (instead of forcing the caller to allocate a contiguous bytestream and copy the data into it), the create function takes a slightly awkward span of spans:

```cpp
// disparate byte spans.
using ByteSpans = std::span<std::span<std::byte const> const>;

// returns a Device Buffer with each byte span sequentially written.
[[nodiscard]] auto create_device_buffer(BufferCreateInfo const& create_info,
                                        CommandBlock command_block,
                                        ByteSpans const& byte_spans) -> Buffer;
```

Implement `create_device_buffer()`:

```cpp
auto vma::create_device_buffer(BufferCreateInfo const& create_info,
                               CommandBlock command_block,
                               ByteSpans const& byte_spans) -> Buffer {
  auto const total_size = std::accumulate(
      byte_spans.begin(), byte_spans.end(), 0uz,
      [](std::size_t const n, std::span<std::byte const> bytes) {
        return n + bytes.size();
      });

  auto staging_ci = create_info;
  staging_ci.usage = vk::BufferUsageFlagBits::eTransferSrc;

  // create staging Host Buffer with TransferSrc usage.
  auto staging_buffer =
      create_buffer(staging_ci, BufferMemoryType::Host, total_size);
  // create the Device Buffer.
  auto ret = create_buffer(create_info, BufferMemoryType::Device, total_size);
  // can't do anything if either buffer creation failed.
  if (!staging_buffer.get().buffer || !ret.get().buffer) { return {}; }

  // copy byte spans into staging buffer.
  auto dst = staging_buffer.get().mapped_span();
  for (auto const bytes : byte_spans) {
    std::memcpy(dst.data(), bytes.data(), bytes.size());
    dst = dst.subspan(bytes.size());
  }

  // record buffer copy operation.
  auto buffer_copy = vk::BufferCopy2{};
  buffer_copy.setSize(total_size);
  auto copy_buffer_info = vk::CopyBufferInfo2{};
  copy_buffer_info.setSrcBuffer(staging_buffer.get().buffer)
      .setDstBuffer(ret.get().buffer)
      .setRegions(buffer_copy);
  command_block.command_buffer().copyBuffer2(copy_buffer_info);

  // submit and wait.
  // waiting here is necessary to keep the staging buffer alive while the GPU
  // accesses it through the recorded commands.
  // this is also why the function takes ownership of the passed CommandBlock
  // instead of just referencing it / taking a vk::CommandBuffer.
  command_block.submit_and_wait();

  return ret;
}
```
Add a command block pool to `App`, and a helper function to create command blocks:

```cpp
void App::create_cmd_block_pool() {
  auto command_pool_ci = vk::CommandPoolCreateInfo{};
  command_pool_ci
      .setQueueFamilyIndex(m_gpu.queue_family)
      // this flag indicates that the allocated Command Buffers will be
      // short-lived.
      .setFlags(vk::CommandPoolCreateFlagBits::eTransient);
  m_cmd_block_pool = m_device->createCommandPoolUnique(command_pool_ci);
}

auto App::create_command_block() const -> CommandBlock {
  return CommandBlock{*m_device, m_queue, *m_cmd_block_pool};
}
```

Update `create_vertex_buffer()` to create a quad with indices:

```cpp
template <typename T>
[[nodiscard]] constexpr auto to_byte_array(T const& t) {
  return std::bit_cast<std::array<std::byte, sizeof(T)>>(t);
}

// ...
void App::create_vertex_buffer() {
  // vertices of a quad.
  static constexpr auto vertices_v = std::array{
      Vertex{.position = {-0.5f, -0.5f}, .color = {1.0f, 0.0f, 0.0f}},
      Vertex{.position = {0.5f, -0.5f}, .color = {0.0f, 1.0f, 0.0f}},
      Vertex{.position = {0.5f, 0.5f}, .color = {0.0f, 0.0f, 1.0f}},
      Vertex{.position = {-0.5f, 0.5f}, .color = {1.0f, 1.0f, 0.0f}},
  };
  static constexpr auto indices_v = std::array{
      0u, 1u, 2u, 2u, 3u, 0u,
  };
  static constexpr auto vertices_bytes_v = to_byte_array(vertices_v);
  static constexpr auto indices_bytes_v = to_byte_array(indices_v);
  static constexpr auto total_bytes_v =
      std::array<std::span<std::byte const>, 2>{
          vertices_bytes_v,
          indices_bytes_v,
      };
  // we want to write total_bytes_v to a Device VertexBuffer | IndexBuffer.
  auto const buffer_ci = vma::BufferCreateInfo{
      .allocator = m_allocator.get(),
      .usage = vk::BufferUsageFlagBits::eVertexBuffer |
               vk::BufferUsageFlagBits::eIndexBuffer,
      .queue_family = m_gpu.queue_family,
  };
  m_vbo = vma::create_device_buffer(buffer_ci, create_command_block(),
                                    total_bytes_v);
}
```
Update `draw()`:

```cpp
void App::draw(vk::CommandBuffer const command_buffer) const {
  m_shader->bind(command_buffer, m_framebuffer_size);
  // single VBO at binding 0 at no offset.
  command_buffer.bindVertexBuffers(0, m_vbo.get().buffer, vk::DeviceSize{});
  // u32 indices after offset of 4 vertices.
  command_buffer.bindIndexBuffer(m_vbo.get().buffer, 4 * sizeof(Vertex),
                                 vk::IndexType::eUint32);
  // m_vbo has 6 indices.
  command_buffer.drawIndexed(6, 1, 0, 0, 0);
}
```

![VBO Quad](./vbo_quad.png)
