CUDA profiling scaffolding with nvtx (#491)
Summary:
Introduce a new profiling framework for the CUDA backend. It uses NVTX (https://docs.nvidia.com/nsight-visual-studio-edition/nvtx/index.html) and finds and links the library and headers. NVTX provides integration with Nsight profiling tools, which I'll give more details about soon. Docs forthcoming in a future diff.

Compile with `FL_BUILD_PROFILING=ON` to build the profiling components. The option is `OFF` by default. CUDA backend only. We can add CPU profiling in an unopinionated way whenever we're ready.

[Note: this doesn't build right now since `CudaUtils.*` isn't moved. This will happen in a follow-up diff/commit to fix the fbcode build.]

### Usage:
RAII shims that call into CUDA NVTX profile start/stop marking functions:
- `FL_SCOPED_PROFILE()` starts and stops profiling for the enclosing scope. This helps keep profiles small and high-signal, and speeds up execution by avoiding profiling overhead outside the region of interest.
- `FL_PROFILE_TRACE("myMarker")` marks a code section with a particular marker string. RAII: only marks the section while in scope. Best used when profiling a function.

Pull Request resolved: flashlight/flashlight#491

Test Plan: Tested with Nsight profilers with a transformer benchmark. Tutorial and more documentation on how to get these running is forthcoming.

Reviewed By: padentomasello

Differential Revision: D26929884

Pulled By: jacobkahn

fbshipit-source-id: 3d79ebcf9617587b68411725c028bdd94309996f
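
The RAII pattern behind the two macros can be sketched roughly as follows. This is a minimal illustration, not the code from this commit. The class names, the `DEMO_*` macros, and the `backend*` functions are all hypothetical. The backend functions are stubs that record events in a log so the sketch runs without CUDA. In a real CUDA build they would presumably forward to `cudaProfilerStart()` / `cudaProfilerStop()` and `nvtxRangePushA()` / `nvtxRangePop()`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Event log used by the stub backend below (illustration only).
std::vector<std::string>& profileLog() {
  static std::vector<std::string> log;
  return log;
}

// Hypothetical stubs. Assumption: a CUDA build would call cudaProfilerStart(),
// cudaProfilerStop(), nvtxRangePushA(name) and nvtxRangePop() here instead.
void backendProfilerStart() { profileLog().push_back("start"); }
void backendProfilerStop() { profileLog().push_back("stop"); }
void backendRangePush(const std::string& name) {
  profileLog().push_back("push:" + name);
}
void backendRangePop() { profileLog().push_back("pop"); }

// Starts profiling on construction and stops it on destruction.
class ScopedProfiler {
 public:
  ScopedProfiler() { backendProfilerStart(); }
  ~ScopedProfiler() { backendProfilerStop(); }
  ScopedProfiler(const ScopedProfiler&) = delete;
  ScopedProfiler& operator=(const ScopedProfiler&) = delete;
};

// Opens a named range on construction and closes it on destruction.
class ProfileTracer {
 public:
  explicit ProfileTracer(const std::string& name) { backendRangePush(name); }
  ~ProfileTracer() { backendRangePop(); }
  ProfileTracer(const ProfileTracer&) = delete;
  ProfileTracer& operator=(const ProfileTracer&) = delete;
};

// Hypothetical macros mirroring the shape of FL_SCOPED_PROFILE() and
// FL_PROFILE_TRACE(name). Each may be used once per scope in this sketch.
#define DEMO_SCOPED_PROFILE() ScopedProfiler demoScopedProfiler_
#define DEMO_PROFILE_TRACE(name) ProfileTracer demoProfileTracer_(name)

// Profiles one region containing one named range and returns the event log.
std::vector<std::string> runDemo() {
  profileLog().clear();
  {
    DEMO_SCOPED_PROFILE(); // profiling is active until this scope ends
    {
      DEMO_PROFILE_TRACE("forward"); // range covers only this inner scope
    }
  }
  std::vector<std::string> events = profileLog();
  assert(!events.empty());
  return events;
}
```

Because the destructors run in reverse order of construction, the named range always closes before profiling stops, which is what makes the shims safe to use around early returns and exceptions.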