Commit 59d176c

chore: update readme
1 parent be37567 commit 59d176c

1 file changed

+8
-0
lines changed

samples/introduction/README.md

Lines changed: 8 additions & 0 deletions
@@ -6,3 +6,11 @@ This example demonstrates two key capabilities of CUDA events: measuring GPU exe
 1. Events are recorded at specific points within a CUDA stream to mark the beginning and end of GPU operations.
 2. Because CUDA stream operations execute asynchronously, the CPU remains free to perform other work while the GPU processes tasks (including memory transfers between host and device).
 3. The CPU can query these events to check whether the GPU has finished its work, allowing for coordination between the two processors without blocking the CPU.
+
+## [matrixMul](https://github.com/Rust-GPU/rust-cuda/samples/introduction/matmul)
+This example demonstrates a sample kernel implementation of matrix multiplication.
+
+1. The matrices are first created on the host side and then copied to the device.
+2. A block-specific region of shared memory is allocated on the device side, so that the summation can be done very quickly.
+3. The result is copied back to the host, where the accumulated error can be observed.
+4. Extra: The error that accumulates during the summation process is reduced (in the kernel itself) using the [Kahan summation algorithm](https://en.wikipedia.org/wiki/Kahan_summation_algorithm).
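The compensated summation mentioned in item 4 can be illustrated on the CPU. The following is a minimal sketch in plain Rust, not the sample's GPU kernel: the names `naive_sum`, `kahan_sum`, and `matmul_kahan` are hypothetical, the matrices are assumed to be square and stored row-major, and no shared memory or device code is involved. It only shows how a running compensation term recovers low-order bits that a plain `f32` accumulation would drop.

```rust
/// Plain left-to-right accumulation, shown only for comparison.
fn naive_sum(terms: impl Iterator<Item = f32>) -> f32 {
    let mut sum = 0.0f32;
    for term in terms {
        sum += term;
    }
    sum
}

/// Kahan (compensated) summation: `comp` tracks the rounding error
/// lost by each addition and feeds it back into the next term.
fn kahan_sum(terms: impl Iterator<Item = f32>) -> f32 {
    let mut sum = 0.0f32;
    let mut comp = 0.0f32;
    for term in terms {
        let y = term - comp; // apply the correction from the previous step
        let t = sum + y; // low-order bits of `y` may be lost here
        comp = (t - sum) - y; // recover what was lost (negated)
        sum = t;
    }
    sum
}

/// Multiply two row-major `n x n` matrices, accumulating each output
/// element with Kahan summation. Hypothetical helper for illustration.
fn matmul_kahan(a: &[f32], b: &[f32], n: usize) -> Vec<f32> {
    assert_eq!(a.len(), n * n);
    assert_eq!(b.len(), n * n);
    let mut out = vec![0.0f32; n * n];
    for i in 0..n {
        for j in 0..n {
            // Dot product of row `i` of `a` with column `j` of `b`.
            out[i * n + j] = kahan_sum((0..n).map(|k| a[i * n + k] * b[k * n + j]));
        }
    }
    out
}
```

As a usage example, summing `1.0` followed by a million terms of `1.0e-8` with `naive_sum` never moves past `1.0`, because each small term is below half a unit in the last place of the running total, while `kahan_sum` lands close to the true value of about `1.01`.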
