There is a There is also a So for a backend that does not have the concept of a command buffer encoder, we can just ignore the second type of Fence, and there is no need to register inputs/outputs, like the
That's exactly right. We may want to use multiple CUDA streams (I can't recall how much parallelism you get without them), but I would probably do the simple thing first (e.g. just launch kernels the default way). Maybe check out the docs on CUDA streams, just to make sure we design this in a way that's extensible to streams if needed.
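To make the "simple first, extensible later" idea concrete, here is a rough sketch in plain C++ with no real CUDA calls. Everything in it (`StreamBackend`, `StreamId`, `kDefaultStream`, `launch`, `synchronize`, `pending`) is a hypothetical name made up for illustration, not something from the codebase or the CUDA API. It only assumes that work submitted to one stream runs in submission order, and it simulates that with in-memory queues.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stream handle; 0 stands in for the default stream.
using StreamId = std::size_t;
constexpr StreamId kDefaultStream = 0;

// Hypothetical backend without a command buffer encoder: work is queued
// directly on a stream, and work on one stream runs in submission order.
class StreamBackend {
public:
    explicit StreamBackend(std::size_t num_streams = 1)
        : queues_(num_streams == 0 ? 1 : num_streams) {}

    // Simple path: callers omit `stream` and everything lands on the default one.
    // Passing an explicit stream later does not change existing call sites.
    void launch(std::function<void()> kernel, StreamId stream = kDefaultStream) {
        queues_.at(stream).push_back(std::move(kernel));
    }

    // Stand-in for a device-wide synchronize: drain every stream in order.
    void synchronize() {
        for (auto& queue : queues_) {
            for (auto& kernel : queue) kernel();
            queue.clear();
        }
    }

    // Number of launches still queued on a stream.
    std::size_t pending(StreamId stream = kDefaultStream) const {
        return queues_.at(stream).size();
    }

private:
    std::vector<std::vector<std::function<void()>>> queues_;
};
```

The point of the defaulted `stream` parameter is that the first version can launch everything the default way, and multi-stream support can be added later without touching callers. No inputs or outputs are registered anywhere, since ordering within a stream is implicit.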