Commit 0ead512

committed
Use the block heuristic when determining a launch configuration.
1 parent 4cdb50b commit 0ead512

1 file changed: +2 -1 lines changed

src/device/execution.jl

@@ -95,7 +95,8 @@ end
 function launch_configuration(backend::AbstractGPUBackend, heuristic;
                               elements::Int, elements_per_thread::Int)
     threads = clamp(elements, 1, heuristic.threads)
-    blocks = max(cld(elements, threads), 1)
+    blocks = max(cld(elements, threads), heuristic.blocks)
+    threads = cld(elements, blocks)
 
     if elements_per_thread > 1 && blocks > heuristic.blocks
         # we want to launch more blocks than required, so prefer a grid-stride loop instead
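
To make the effect of the change concrete, here is a minimal sketch that compares the launch shape before and after this commit. It copies only the three changed lines from the diff. The names `config_before` and `config_after` are hypothetical, the `backend` and `elements_per_thread` arguments are omitted, and the `heuristic` named tuple with its values (256 threads, 32 blocks) is a made-up stand-in for whatever the real occupancy heuristic returns.

```julia
# Hypothetical stand-in for the occupancy heuristic. The field names
# `threads` and `blocks` mirror the diff; the values are assumed.
heuristic = (threads = 256, blocks = 32)

# Launch shape before the commit: at least one block.
function config_before(heuristic, elements::Int)
    threads = clamp(elements, 1, heuristic.threads)
    blocks = max(cld(elements, threads), 1)
    return (; threads, blocks)
end

# Launch shape after the commit: at least `heuristic.blocks` blocks,
# with the threads per block recomputed so the grid still covers all elements.
function config_after(heuristic, elements::Int)
    threads = clamp(elements, 1, heuristic.threads)
    blocks = max(cld(elements, threads), heuristic.blocks)
    threads = cld(elements, blocks)
    return (; threads, blocks)
end

for elements in (1_000, 100_000)
    @show elements config_before(heuristic, elements) config_after(heuristic, elements)
end
```

Under these assumed values, a 1,000-element launch changes from 4 blocks of 256 threads to 32 blocks of 32 threads, so small inputs are spread over as many blocks as the heuristic suggests. A 100,000-element launch already needs 391 blocks, which exceeds the heuristic, so it stays at 391 blocks of 256 threads.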
