
Initial nccl implementation -- experimental #1

Open · aniabrown-nvidia wants to merge 8 commits into cuda
Conversation

aniabrown-nvidia (Collaborator) commented:

Initial NCCL implementation using only the default stream. It contains some temporary fixes to get the code to compile and some temporary experimentation with MPI performance; these will need cleanup later. I suggest merging into a separate nccl branch for now.

Requires linking additional libraries: -lnvToolsExt -lnccl
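For orientation, here is a minimal, hedged sketch (not the code in this PR) of the general pattern described above: bootstrapping an NCCL communicator over MPI and issuing a collective on the default stream, with an nvToolsExt range to show where `-lnvToolsExt` comes in. All names and sizes are illustrative.

```cpp
// Illustrative sketch only. Build line (example):
//   mpicxx nccl_sketch.cpp -lcudart -lnccl -lnvToolsExt
#include <mpi.h>
#include <nccl.h>
#include <cuda_runtime.h>
#include <nvToolsExt.h>

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);
  int rank, nranks;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &nranks);
  cudaSetDevice(rank % 4);  // assumes up to 4 GPUs per node; adjust to the machine

  // Rank 0 creates the NCCL unique id; every rank receives it over MPI.
  ncclUniqueId id;
  if (rank == 0) ncclGetUniqueId(&id);
  MPI_Bcast(&id, sizeof(id), MPI_BYTE, 0, MPI_COMM_WORLD);

  ncclComm_t comm;
  ncclCommInitRank(&comm, nranks, id, rank);

  const size_t n = 1 << 20;  // arbitrary size for the example
  double *buf;
  cudaMalloc(&buf, n * sizeof(double));

  nvtxRangePushA("allreduce");  // nvToolsExt range, visible in Nsight timelines
  ncclAllReduce(buf, buf, n, ncclDouble, ncclSum, comm, 0);  // stream 0 = default
  cudaStreamSynchronize(0);     // NCCL calls are asynchronous w.r.t. the host
  nvtxRangePop();

  cudaFree(buf);
  ncclCommDestroy(comm);
  MPI_Finalize();
  return 0;
}
```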

Some extra context for commits:
01a6df2, e77d120 -- very temporary fixes to compile-time and run-time errors outside of the dgemms; needs looking into
5c4ce5b -- bug fix: source memory was not being allocated. At the time of writing the commit I had assumed memory for the sources was allocated every iteration. That is not the case, but we may still want to allocate a pool of memory for atrip early in program execution, particularly to address the following point (see the pooled-allocation sketch after this list).
5c4ce5b -- this alloc was taking a similar amount of time as the dgemm for No=50. Need to check whether it still takes non-negligible time at larger sizes.
46c56b9 -- removes a host-device transfer that is not needed in the GPU source version
e95ca45 -- this 'warm up' was for experimentation only; it was used to test point-to-point handle creation at the start of the app
7878a14 -- switches point-to-point comms to use NCCL (see the sketch below)
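On the 5c4ce5b allocation cost: a pool allocated once at startup, as suggested above, could look roughly like the following. This is a hypothetical pattern, not atrip's code; `source_pool` and the function names are made up.

```cpp
#include <cuda_runtime.h>

// Hypothetical pooled allocation for the source buffers. Instead of paying
// a cudaMalloc per iteration (which was comparable to the dgemm at No=50),
// allocate once early in program execution and reuse the buffer.
static double *source_pool = nullptr;

void init_source_pool(size_t count) {   // call once at startup
  cudaMalloc(&source_pool, count * sizeof(double));
}

double *get_source_buffer() {           // call every iteration
  return source_pool;  // no per-iteration allocation
}

void free_source_pool() {               // call once at shutdown
  cudaFree(source_pool);
  source_pool = nullptr;
}
```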
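And for 7878a14, the NCCL point-to-point API (available since NCCL 2.7) typically replaces an MPI send/recv pair along these lines. Again a sketch under assumptions: a ring exchange on the default stream, with `comm`, `rank`, and `nranks` set up as in the first snippet.

```cpp
#include <nccl.h>
#include <cuda_runtime.h>

// Sketch of an MPI_Sendrecv-style point-to-point exchange done with NCCL.
void ring_exchange(const double *sendbuf, double *recvbuf, size_t count,
                   int rank, int nranks, ncclComm_t comm) {
  int next = (rank + 1) % nranks;
  int prev = (rank - 1 + nranks) % nranks;

  // Grouping lets the send and recv progress together, avoiding deadlock;
  // this is the NCCL analogue of MPI_Sendrecv.
  ncclGroupStart();
  ncclSend(sendbuf, count, ncclDouble, next, comm, 0);  // default stream
  ncclRecv(recvbuf, count, ncclDouble, prev, comm, 0);
  ncclGroupEnd();
  cudaStreamSynchronize(0);  // the transfers are async w.r.t. the host
}
```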
