Commit c65bb9f

Update docs
1 parent 0769279 commit c65bb9f

File tree

2 files changed: +2 −2 lines


_sources/deeplearning_operators/gemv.md.txt (1 addition, 1 deletion)

@@ -252,7 +252,7 @@ def splitk_gemv_vectorized(
     return main
 ```
 
-With vectorized read, now the kernel finishs in **~0.0084 ms**, which is getting close to cuBLAS performance.
+With vectorized read, now the kernel finishes in **~0.0084 ms**, which is getting close to cuBLAS performance.
 
 
 ## `tvm_thread_allreduce` Instead of `atomicAdd`

deeplearning_operators/gemv.html (1 addition, 1 deletion)

@@ -699,7 +699,7 @@ <h2>Vectorized Reads<a class="headerlink" href="#vectorized-reads" title="Link t
 <span class="k">return</span> <span class="n">main</span>
 </pre></div>
 </div>
-<p>With vectorized read, now the kernel finishs in <strong>~0.0084 ms</strong>, which is getting close to cuBLAS performance.</p>
+<p>With vectorized read, now the kernel finishes in <strong>~0.0084 ms</strong>, which is getting close to cuBLAS performance.</p>
 </section>
 <section id="tvm-thread-allreduce-instead-of-atomicadd">
 <h2><code class="docutils literal notranslate"><span class="pre">tvm_thread_allreduce</span></code> Instead of <code class="docutils literal notranslate"><span class="pre">atomicAdd</span></code><a class="headerlink" href="#tvm-thread-allreduce-instead-of-atomicadd" title="Link to this heading"></a></h2>
