samples/introduction/README.md
This example demonstrates two key capabilities of CUDA events: measuring GPU execution time and coordinating work between the CPU and the GPU.
1. Events are recorded at specific points within a CUDA stream to mark the beginning and end of GPU operations.
2. Because CUDA stream operations execute asynchronously, the CPU remains free to perform other work while the GPU processes tasks (including memory transfers between host and device).
3. The CPU can query these events to check whether the GPU has finished its work, allowing for coordination between the two processors without blocking the CPU.
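The event workflow above can be sketched as follows. This is a minimal illustration, not the sample's actual code: the kernel, problem size, and launch configuration are made up for the sketch.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative busy-work kernel (not from the sample).
__global__ void busyKernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = data[i];
        for (int k = 0; k < 1000; ++k) v = v * 1.000001f + 0.0001f;
        data[i] = v;
    }
}

int main() {
    const int n = 1 << 20;
    float *d_data;
    cudaMalloc(&d_data, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // 1. Record events in the stream around the GPU work.
    cudaEventRecord(start, 0);
    busyKernel<<<(n + 255) / 256, 256>>>(d_data, n);
    cudaEventRecord(stop, 0);

    // 2. The launch returns immediately; the CPU is free to do
    //    other work here while the GPU runs.

    // 3. Poll the event without blocking the CPU.
    while (cudaEventQuery(stop) == cudaErrorNotReady) {
        /* ... useful CPU-side work ... */
    }

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("GPU time: %.3f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    return 0;
}
```

`cudaEventQuery` returns `cudaErrorNotReady` while the recorded work is still in flight, which is what lets the host coordinate with the device without ever calling a blocking synchronize.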
This example demonstrates a kernel implementation of matrix multiplication.
1. The matrices are first created on the host side and then copied to the device.
2. A block-local region of shared memory is allocated on the device so that the partial-sum accumulation runs quickly.
3. The result is copied back to the host, where the accumulated floating-point error can be observed.
4. Extra: The error that accumulates during the summation process is reduced (in the kernel itself) using the [Kahan summation algorithm](https://en.wikipedia.org/wiki/Kahan_summation_algorithm).
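Steps 2 and 4 can be sketched as a single kernel. This is an illustrative version, not the sample's actual code: it assumes a 16x16 tile and matrix dimensions divisible by the tile size, and the kernel name is made up.

```cuda
#define TILE 16

// Tiled matrix multiply C = A * B (square, n x n, n divisible by TILE)
// with Kahan-compensated accumulation of the inner products.
__global__ void matMulKahan(const float *A, const float *B, float *C, int n) {
    // Block-local shared memory holding one tile of A and one of B.
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;

    float sum = 0.0f;  // running sum
    float c   = 0.0f;  // Kahan compensation (lost low-order bits)

    for (int t = 0; t < n / TILE; ++t) {
        // Stage the tiles into fast shared memory, one element per thread.
        As[threadIdx.y][threadIdx.x] = A[row * n + (t * TILE + threadIdx.x)];
        Bs[threadIdx.y][threadIdx.x] = B[(t * TILE + threadIdx.y) * n + col];
        __syncthreads();

        for (int k = 0; k < TILE; ++k) {
            // Kahan step: fold the previously lost bits back in,
            // then recover what this addition loses.
            float y = As[threadIdx.y][k] * Bs[k][threadIdx.x] - c;
            float s = sum + y;
            c = (s - sum) - y;
            sum = s;
        }
        __syncthreads();
    }
    C[row * n + col] = sum;
}
```

Note that the compensation only survives if the compiler preserves the exact order of operations; compiling with `-use_fast_math` (or aggressive reassociation) can optimize the `(s - sum) - y` term away to zero.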