Commit 63250fd

Wording and content tweaks (#42)
1 parent 743138c commit 63250fd

1 file changed, +39 -27 lines changed

blog/2024-11-21-optimizing-matrix-mul/index.md

@@ -83,7 +83,7 @@ To handle communication between our code on the CPU and GPU, we'll use
 implements the WebGPU API. On the web, it works directly with the browser's WebGPU
 implementation. On native platforms, it translates API calls to the platform's GPU API
 (Vulkan, DirectX, or Metal). This lets us run the same code on a wide range of
-platforms, including Windows, Linux, macOS, iOS[^1], Android, and the web[^2].
+platforms, including Windows, Linux, macOS[^1], iOS[^2], Android, and the web[^3].

 By using Rust GPU and `wgpu`, we have a clean, portable setup with everything written in
 Rust.
@@ -147,9 +147,9 @@ There are a couple of things to note about the Rust implementation:
 4. The inner loop (`for i in 0..dimensions.k`) uses Rust's `for` syntax with a range.
 This is a higher-level abstraction compared to manually iterating with an index in
 other shader languages like WGSL, GLSL, or HLSL.
-5. Read-only inputs are immutable references (`&Dimensions` / `&[f32]`) and writeable outputs are
-mutable references (`&mut [f32]`). This feels very familiar to anyone used to writing
-Rust.
+5. Read-only inputs are immutable references (`&Dimensions` / `&[f32]`) and writable
+outputs are mutable references (`&mut [f32]`). This feels very familiar to anyone
+used to writing Rust.

 #### What's with all the `usize`?

@@ -181,7 +181,7 @@ Each workgroup, since it's only one thread (`#[spirv(compute(threads(1)))]`), pr
 one `result[i, j]`.

 To calculate the full matrix, we need to launch as many entries as there are in the
-matrix. Here we specify that (`Uvec3::new(m * n, 1, 1`) on the CPU:
+`m * n` matrix. Here we specify that (`UVec3::new(m * n, 1, 1)`) on the CPU:

 import { RustNaiveWorkgroupCount } from './snippets/naive.tsx';

@@ -308,6 +308,14 @@ complete runnable code can be [found on
 GitHub](https://github.com/Rust-GPU/rust-gpu.github.io/tree/main/blog/2024-11-21-optimizing-matrix-mul/code)
 and you can run the benchmarks yourself with `cargo bench`.

+:::tip
+
+You can also check out real-world projects using Rust GPU such as
+[`autograph`](https://github.com/charles-r-earp/autograph) and
+[`renderling`](https://renderling.xyz/).
+
+:::
+
 ## Reflections on porting to Rust GPU

 Porting to Rust GPU went quickly, as the kernels Zach used were fairly simple. Most of
@@ -320,9 +328,11 @@ is not _great_ as it is still blog post code!

 My background is not in GPU programming, but I do have Rust experience. I joined the
 Rust GPU project because I tried to use standard GPU languages and knew there must be a
-better way. Writing these GPU kernels felt like writing any other Rust code (other than
-debugging, more on that later) which is a huge win to me. Not just the language itself,
-but the entire development experience.
+better way.
+
+Writing these GPU kernels felt like writing any other Rust code (other than debugging,
+more on that later) which is a huge win to me. Not just the language itself, but the
+entire development experience.

 ## Rust-specific party tricks

@@ -372,10 +382,10 @@ bug I couldn't figure out. GPU debugging tools are limited and `printf`-style de
 often isn't available. But what if we could run the GPU kernel _on the CPU_, where we
 have access to tools like standard debuggers and good ol' `printf`/`println`?

-With Rust GPU, this was straightforward. By using `cfg()` directives I made the
-GPU-specific annotations (`#[spirv(...)]`) disappear when compiling for the CPU. The
-result? The kernel became a regular Rust function. On the GPU, it behaves like a shader.
-On the CPU, it's just a function you can call directly.
+With Rust GPU, this was straightforward. By using standard Rust `cfg()` directives I
+made the GPU-specific annotations (`#[spirv(...)]`) disappear when compiling for the
+CPU. The result? The kernel became a regular Rust function. On the GPU, it behaves like
+a shader. On the CPU, it's just a function you can call directly.

 Here's what it looks like in practice using the 2D tiling kernel from before:
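
The `cfg()` approach described in this hunk can be sketched as follows. This is a minimal illustration, not the post's 2D tiling kernel: it assumes the GPU build is detected with `target_arch = "spirv"`, uses a plain `[u32; 3]` as a stand-in for the invocation id type, and omits the buffer binding attributes a real Rust GPU kernel would need. The `Dimensions` layout and the `matmul` name are hypothetical.

```rust
/// Matrix sizes (assumed layout): `a` is m x k, `b` is k x n,
/// `result` is m x n, all stored row-major.
pub struct Dimensions {
    pub m: u32,
    pub k: u32,
    pub n: u32,
}

// When compiling for the GPU (assumed here to mean `target_arch = "spirv"`),
// `cfg_attr` expands to the `spirv(...)` attributes. On the CPU the condition
// is false, the attributes disappear, and this is a regular Rust function.
#[cfg_attr(target_arch = "spirv", spirv(compute(threads(1))))]
pub fn matmul(
    #[cfg_attr(target_arch = "spirv", spirv(global_invocation_id))] global_id: [u32; 3],
    dimensions: &Dimensions,
    a: &[f32],
    b: &[f32],
    result: &mut [f32],
) {
    let index = global_id[0] as usize;
    let (m, k, n) = (
        dimensions.m as usize,
        dimensions.k as usize,
        dimensions.n as usize,
    );

    // Ignore invocations that fall outside the result matrix.
    if index >= m * n {
        return;
    }

    // Each invocation computes exactly one entry of the result.
    let row = index / n;
    let col = index % n;
    let mut sum = 0.0;
    for i in 0..k {
        sum += a[row * k + i] * b[i * n + col];
    }
    result[index] = sum;
}
```

On the CPU, calling `matmul([i, 0, 0], &dims, &a, &b, &mut result)` for each `i` in `0..m * n` fills in the result one entry at a time, and each call can be stepped through in an ordinary debugger.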

@@ -404,7 +414,7 @@ Testing the kernel in isolation is useful, but it does not reflect how the GPU e
 it with multiple invocations across workgroups and dispatches. To test the kernel
 end-to-end, I needed a test harness that simulated this behavior on the CPU.

-Building the harness was straightforward due to the borrow checker. By enforcing the
+Building the harness was straightforward due to Rust. By enforcing the
 same invariants as the GPU I could validate the kernel under the same conditions the GPU
 would run it:
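
A CPU harness of the kind this hunk describes might look roughly like the sketch below. It is not the post's harness: `dispatch_on_cpu`, `naive_kernel`, and `run_matmul` are hypothetical names, one thread per workgroup is assumed (as in the naive kernel), and invocations run sequentially, so it does not model real GPU parallelism or memory behavior.

```rust
/// Hypothetical CPU stand-in for a GPU dispatch: calls `kernel` once per
/// workgroup id. Assumes one thread per workgroup.
pub fn dispatch_on_cpu(workgroups: [u32; 3], mut kernel: impl FnMut([u32; 3])) {
    for z in 0..workgroups[2] {
        for y in 0..workgroups[1] {
            for x in 0..workgroups[0] {
                kernel([x, y, z]);
            }
        }
    }
}

/// Simplified naive kernel: one invocation computes one result entry.
/// Inputs are shared references, the output is a mutable reference.
fn naive_kernel(id: [u32; 3], k: usize, n: usize, a: &[f32], b: &[f32], result: &mut [f32]) {
    let index = id[0] as usize;
    if index >= result.len() {
        return;
    }
    let (row, col) = (index / n, index % n);
    result[index] = (0..k).map(|i| a[row * k + i] * b[i * n + col]).sum();
}

/// Runs the kernel over `m * n` workgroups, mirroring a dispatch of
/// `(m * n, 1, 1)` on the CPU side.
pub fn run_matmul(m: usize, k: usize, n: usize, a: &[f32], b: &[f32]) -> Vec<f32> {
    let mut result = vec![0.0; m * n];
    dispatch_on_cpu([(m * n) as u32, 1, 1], |id| {
        naive_kernel(id, k, n, a, b, &mut result);
    });
    result
}
```

Because the closure holds the only mutable borrow of `result` while the inputs stay shared, the compiler enforces the same read-only versus writable split the kernel signature expresses.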

@@ -450,7 +460,7 @@ other Rust project.

 This required no new tools or workflows. The tools I already knew worked seamlessly.
 More importantly, this approach benefits anyone working on the project. Any Rust
-engineer can run these benchmarks with no additional setup--`cargo bench` is a standard
+engineer can run these benchmarks with no additional setup: `cargo bench` is a standard
 part of the Rust ecosystem.

 ### Lint
@@ -517,9 +527,9 @@ and `f64` without duplicating code, all while maintaining type safety and perfor
 ### Error handling with `Result`

 Rust GPU also supports error handling using `Result`. Encoding errors in the type system
-makes it clear where things can go wrong and forces developers to handle those cases.
-This is particularly useful for validating kernel inputs or handling the many edge cases
-in GPU logic.
+makes it clear where things can go wrong and forces you to handle those cases. This is
+particularly useful for validating kernel inputs or handling the many edge cases in GPU
+logic.

 ### Iterators
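
To make the `Result` paragraph in the hunk above concrete, here is a minimal sketch of input validation in plain Rust. It has not been compiled with Rust GPU, and `InputError`, `check_len`, and `validate` are hypothetical names rather than anything from the post.

```rust
/// Hypothetical error type for matrix multiplication inputs.
#[derive(Debug, PartialEq)]
pub enum InputError {
    /// One of m, k, or n is zero.
    EmptyDimension,
    /// A buffer's length does not match the declared dimensions.
    WrongLength {
        buffer: &'static str,
        expected: usize,
        actual: usize,
    },
}

/// Returns an error naming the buffer if its length is not `expected`.
fn check_len(buffer: &'static str, expected: usize, actual: usize) -> Result<(), InputError> {
    if expected == actual {
        Ok(())
    } else {
        Err(InputError::WrongLength { buffer, expected, actual })
    }
}

/// Checks that `a` (m x k), `b` (k x n), and `result` (m x n) agree with
/// the declared dimensions before any kernel work is done.
pub fn validate(
    m: usize,
    k: usize,
    n: usize,
    a: &[f32],
    b: &[f32],
    result: &[f32],
) -> Result<(), InputError> {
    if m == 0 || k == 0 || n == 0 {
        return Err(InputError::EmptyDimension);
    }
    // `?` returns early on the first mismatch, so callers must handle it.
    check_len("a", m * k, a.len())?;
    check_len("b", k * n, b.len())?;
    check_len("result", m * n, result.len())
}
```

A caller has to acknowledge the `Result` (for example with `match` or `?`), which is the "forces you to handle those cases" property the paragraph refers to.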

@@ -535,12 +545,13 @@ future.

 ### Conditional compilation

-This kernel doesn't use conditional compilation, but it's a key feature of Rust that
-works with Rust GPU. With `#[cfg(...)]`, you can adapt kernels to different hardware or
-configurations without duplicating code. GPU languages like WGSL or GLSL offer
-preprocessor directives, but these tools lack standardization across projects. Rust GPU
-leverages the existing Cargo ecosystem, so conditional compilation follows the same
-standards all Rust developers already know.
+While I briefly touched on it a couple of times, this kernel doesn't really show the
+full power of conditional compilation. With `#[cfg(...)]` and [cargo
+"features"](https://doc.rust-lang.org/cargo/reference/features.html), you can adapt
+kernels to different hardware or configurations without duplicating code. GPU languages
+like WGSL or GLSL offer preprocessor directives, but these tools lack standardization
+across projects. Rust GPU leverages the existing Cargo ecosystem, so conditional
+compilation follows the same standards all Rust developers already know.

 ## Come join us!
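
As a small illustration of the `#[cfg(...)]` plus Cargo features combination mentioned in the conditional compilation paragraph above, the sketch below selects a tile size at compile time. The feature name `large-tiles`, the constant `TILE_SIZE`, and the helper `tiles_needed` are all hypothetical; the feature would have to be declared under `[features]` in the crate's `Cargo.toml`.

```rust
// Picked when the crate is built with `--features large-tiles`
// (a hypothetical feature name used only for this sketch).
#[cfg(feature = "large-tiles")]
pub const TILE_SIZE: usize = 16;

// Default configuration when the feature is not enabled.
#[cfg(not(feature = "large-tiles"))]
pub const TILE_SIZE: usize = 8;

/// Number of tiles needed to cover `len` elements, rounding up.
/// The same code serves both configurations; only the constant changes.
pub fn tiles_needed(len: usize) -> usize {
    (len + TILE_SIZE - 1) / TILE_SIZE
}
```

Only one of the two constants exists in any given build, so there is no runtime branch and no duplicated kernel code.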

@@ -551,7 +562,8 @@ or get involved, check out the [`rust-gpu` repo on
 GitHub](https://github.com/rust-gpu/rust-gpu).
 <br/>

-[^1]: Via [MoltenVK](https://github.com/KhronosGroup/MoltenVK)
-[^2]:
-Technically `wgpu` translates SPIR-V to GLSL or WGSL via
-[naga](https://github.com/gfx-rs/wgpu/tree/trunk/naga)
+[^1]: Technically `wgpu` uses [MoltenVK](https://github.com/KhronosGroup/MoltenVK) or translates to Metal on macOS
+[^2]: Technically `wgpu` uses [MoltenVK](https://github.com/KhronosGroup/MoltenVK) or translates to Metal on iOS
+[^3]:
+Technically `wgpu` translates SPIR-V to GLSL (WebGL) or WGSL (WebGPU) via
+[naga](https://github.com/gfx-rs/wgpu/tree/trunk/naga) on the web
