samples/introduction/README.md
This example demonstrates two key capabilities of CUDA events: measuring GPU execution time and coordinating work between the CPU and the GPU.
1. Events are recorded at specific points within a CUDA stream to mark the beginning and end of GPU operations.
2. Because CUDA stream operations execute asynchronously, the CPU remains free to perform other work while the GPU processes tasks (including memory transfers between host and device).
3. The CPU can query these events to check whether the GPU has finished its work, allowing for coordination between the two processors without blocking the CPU.
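The event workflow above can be sketched as follows. This is a minimal illustration, not the sample's actual code: the kernel, problem size, and launch configuration are made up for the sketch.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative busy-work kernel (not from the sample).
__global__ void busyKernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = data[i];
        for (int k = 0; k < 1000; ++k) v = v * 1.000001f + 0.0001f;
        data[i] = v;
    }
}

int main() {
    const int n = 1 << 20;
    float *d_data;
    cudaMalloc(&d_data, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // 1. Record events in the stream around the GPU work.
    cudaEventRecord(start, 0);
    busyKernel<<<(n + 255) / 256, 256>>>(d_data, n);
    cudaEventRecord(stop, 0);

    // 2. The launch returns immediately; the CPU is free to do
    //    other work here while the GPU runs.

    // 3. Poll the event without blocking the CPU.
    while (cudaEventQuery(stop) == cudaErrorNotReady) {
        /* ... useful CPU-side work ... */
    }

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("GPU time: %.3f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    return 0;
}
```

`cudaEventQuery` returns `cudaErrorNotReady` while the recorded work is still in flight, which is what lets the host coordinate with the device without ever calling a blocking synchronize.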
This example demonstrates a kernel implementation of matrix multiplication.
1. The matrices are first created on the host side and then copied to the device.
2. A block-local region of shared memory is allocated on the device so that the partial-sum accumulation runs quickly.
3. The result is copied back to the host, where the accumulated floating-point error can be observed.
4. Extra: The error that accumulates during the summation process is reduced (in the kernel itself) using the [Kahan summation algorithm](https://en.wikipedia.org/wiki/Kahan_summation_algorithm).
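Steps 2 and 4 can be sketched as a single kernel. This is an illustrative version, not the sample's actual code: it assumes a 16x16 tile and matrix dimensions divisible by the tile size, and the kernel name is made up.

```cuda
#define TILE 16

// Tiled matrix multiply C = A * B (square, n x n, n divisible by TILE)
// with Kahan-compensated accumulation of the inner products.
__global__ void matMulKahan(const float *A, const float *B, float *C, int n) {
    // Block-local shared memory holding one tile of A and one of B.
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;

    float sum = 0.0f;  // running sum
    float c   = 0.0f;  // Kahan compensation (lost low-order bits)

    for (int t = 0; t < n / TILE; ++t) {
        // Stage the tiles into fast shared memory, one element per thread.
        As[threadIdx.y][threadIdx.x] = A[row * n + (t * TILE + threadIdx.x)];
        Bs[threadIdx.y][threadIdx.x] = B[(t * TILE + threadIdx.y) * n + col];
        __syncthreads();

        for (int k = 0; k < TILE; ++k) {
            // Kahan step: fold the previously lost bits back in,
            // then recover what this addition loses.
            float y = As[threadIdx.y][k] * Bs[k][threadIdx.x] - c;
            float s = sum + y;
            c = (s - sum) - y;
            sum = s;
        }
        __syncthreads();
    }
    C[row * n + col] = sum;
}
```

Note that the compensation only survives if the compiler preserves the exact order of operations; compiling with `-use_fast_math` (or aggressive reassociation) can optimize the `(s - sum) - y` term away to zero.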