The manual claims that `a` is split into `nthreads()` chunks, but this
is not true in general. As written, you could get an error if `length(a)
< nthreads()` (the computed chunk size comes out as 0), or more than
`nthreads()` chunks if `nthreads()` is smaller than `length(a)` but does
not divide it evenly. With `cld`, on the other hand, you always get at
most `nthreads()` chunks.
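For concreteness, a quick REPL sketch of the difference (here `n` stands in for `nthreads()`, and the chunk size is assumed to be handed to `Iterators.partition` as in the example under discussion):

```julia
julia> a = 1:10; n = 3;    # n plays the role of nthreads(); 3 does not divide 10

julia> length(collect(Iterators.partition(a, length(a) ÷ n)))      # floor division: chunk size 3 → 4 chunks
4

julia> length(collect(Iterators.partition(a, cld(length(a), n))))  # cld: chunk size 4 → 3 chunks ≤ n
3

julia> length(1:2) ÷ 4     # if length(a) < n, floor division gives chunk size 0, which partition rejects
0
```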
Note that the result is not `500000500000` as it should be, and will most likely change on each evaluation.
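For readers of this excerpt, a minimal sketch of the kind of racy accumulation being described (the name `sum_multi_bad` is an assumption; the point is the shared accumulator `s`):

```julia
julia> function sum_multi_bad(a)       # every task updates the same `s`: a data race
           s = 0
           Threads.@threads for i in a
               s += i                  # non-atomic read-modify-write on shared state
           end
           s
       end;

julia> sum_multi_bad(1:1_000_000)      # output omitted: typically far from 500000500000, and differs between runs
```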
To fix this, buffers that are specific to the task may be used to segment the sum into chunks that are race-free.
- Here `sum_single` is reused, with its own internal buffer `s`. The input vector `a` is split into `nthreads()`
+ Here `sum_single` is reused, with its own internal buffer `s`. The input vector `a` is split into at most `nthreads()`
chunks for parallel work. We then use `Threads.@spawn` to create tasks that individually sum each chunk. Finally, we sum the results from each task using `sum_single` again:
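Putting the pieces together, here is a sketch of the fixed version being described (the name `sum_multi_good` is assumed; `sum_single` is the sequential helper defined earlier in the manual, and the chunk size uses `cld` as proposed above):

```julia
julia> function sum_multi_good(a)
           # at most nthreads() chunks; each task sums its chunk with its own local buffer
           chunks = Iterators.partition(a, cld(length(a), Threads.nthreads()))
           tasks = map(chunks) do chunk
               Threads.@spawn sum_single(chunk)
           end
           chunk_sums = fetch.(tasks)       # wait for each task and collect its partial sum
           return sum_single(chunk_sums)    # reduce the partial sums, again race-free
       end;

julia> sum_multi_good(1:1_000_000)
500000500000
```

Because each spawned task accumulates into its own `s` inside `sum_single`, no two tasks ever write to the same variable, so the result is deterministic.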