CK GEMM Backend #1480

alugorey · 2024-07-18T19:01:04Z

Porting recent ck gemm backend changes to ROCm

* changes to build Centos stream 9 images
* Added scripts for centos and centos stream images
* Added an extra line
* Add ninja installation
* Optimized code
* Fixes
* Add comment
* Optimized code
* Added AMDGPU mapping for ROCm 5.2 and invalid-url for rocm_baseurl

Co-authored-by: Jithun Nair <[email protected]>

- Rocblas API support is requested - SWDEV-383635 & sub task - SWDEV-390218

* Add hip_basic tensorpipe support to PyTorch
* Enabling hip_basic for Tensorpipe for PyTorch
* removing upstream tensorpipe module
* Adding ROCm specific tensorpipe submodule
* tensorpipe submodule updated
* Update the hip invalid device string
* Added ignore for tensorpipe git submodule
* Moved include of tensorpipe_cuda.h to hipify
* Updates based on review comments
* Defining the variable __HIP_PLATFORM_AMD__
* Enabling the UTs

Co-authored-by: Ronak Malik <[email protected]>

- Fortran package installation moved after gcc
- Update libtinfo search code in cmake1
- Install libstdc++.so

To resolve https://ontrack-internal.amd.com/browse/SWDEV-403530 and https://ontrack-internal.amd.com/browse/SWDEV-419837. For more context check upstream issue pytorch#111834

Reversed the condition as required

- Add missing common_utils.sh
- Update the install vision part
- Move to amdgpu rhel 9.3 builds
- Update to pick python from conda path
- Add a missing package
- Add ROCM_PATH and magma
- Updated repo radeon path

This also fixes a problem in gesvd driver when UV is not needed.

- build_environment is hard-coded to the value from upstream when the branch was created, since the dev/QA ENV build_environment value can vary

* Fix the parsing of /etc/os-release
  The old code parses OS_DISTRO as 'PRETTY_Ubuntu' on Ubuntu and thus never links to libtinfo correctly.
* Configurable CMAKE_PREFIX_PATH in CI script.

- This is done at QA's request; it needs to be reverted and does not need to be cherry-picked into later releases.

* Moved NAVI check to the test file
* Revised NAVI check as a function


* Running triton kernel on ROCM only has one GB/s metric reported
* Update test_kernel_benchmark.py

…m#1386)

* Initial implementation of PyTorch ut parsing script
* Extracted path variables
* Use nested dict to save results
* Fixes typo
* Cleanup
* Fixes several issues
* Minor name change
* Update run_pytorch_unit_tests.py
* Added file banners
* Supported running from API
* Added more help info
* Consistent naming
* Format help text

---------

Co-authored-by: Jithun Nair <[email protected]>
Co-authored-by: Jithun Nair <[email protected]>

- PYTORCH_EXTRA_INSTALL_REQUIREMENTS is set in builder repo
- Remove the PYTORCH_EXTRA_INSTALL_REQUIREMENTS step from this file

- Causing regression - SWDEV-463083

* Fix SWDEV-459623. The rank of the logsumexp tensor must be 3. This tensor was considered for internal use only but was apparently exposed to UTs.
* Fix for mGPU. The stream should be selected after picking the current device according to the input tensor.

* Add formal FP8 check in common_cuda.py
* Enable inductor/test_valid_cast
* Support for test_eager_fallback
* allow fnuz types on amax test
* Finalize passing tests vs failing
* Fix fnuz constants in _to_fp8_saturated

* Enable batchnorm NHWC for MIOpen
* cleanup
* test to compare NHWC MIOpen batchnorm with CPU
* fix 'use_miopen' condition for nhwc miopen
* fix includes
* use native nhwc batchnorm to verify miopen
* remove extra spaces
* remove empty lines
* set PYTORCH_MIOPEN_SUGGEST_NHWC=1 for all test_nn.py test

…OCm#1433)

* Print consolidated log file for pytorch uts
* Update run_entire_tests subprocess call as well
* lint
* Add ERROR string

* Initial commit to port intra_node_comm to ROCm (cherry picked from commit 48d1c33)
* gpt-fast running now with intra-node comm (cherry picked from commit 618c54e)

---------

Co-authored-by: Prachi Gupta <[email protected]>


Co-authored-by: Jithun Nair <[email protected]>

IFU for rocm6.3_internal_testing


rocm-mici · 2024-10-09T07:06:39Z

Jenkins build for 1b6b84ecf382b55ed398c6d89714363da20a59f5 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

rocm-mici · 2024-10-09T07:07:10Z

Jenkins build for 1b6b84ecf382b55ed398c6d89714363da20a59f5 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

rocm-repo-management-api · 2024-12-11T22:21:41Z

Jenkins build for 1b6b84ecf382b55ed398c6d89714363da20a59f5 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

rraminen and others added 30 commits June 17, 2024 13:07

Updated to latest conda for CentOS stream 9

b8a2811

Temporarily skip test_conv3d_64bit_indexing

59e9341

- Rocblas API support is requested - SWDEV-383635 & sub task - SWDEV-390218

Updates to build on Jammy

0b08278

- Fortran package installation moved after gcc
- Update libtinfo search code in cmake1
- Install libstdc++.so

Fix lstsq related regressions (part of SWDEV-392820)

6e7704d

[UB22.04] Updates to support latest scipy

108bf57

Build required version of libpng for CentOS7

15da21a

Update tensorpipe submodule to support ROCm 6.0

4003496

Set ROCM_PATH in env for centOS docker container

2cfad86

Updated condition for libstdc++ for Jammy

3c19bf9

Skip ddp apply_optim_in_bwd tests for gloo (ROCm#1302)

b7e47fa

To resolve https://ontrack-internal.amd.com/browse/SWDEV-403530 and https://ontrack-internal.amd.com/browse/SWDEV-419837. For more context check upstream issue pytorch#111834

Changes to support docker v23

032320c

Reversed the condition as required

[CS9] Updates to CentOS stream 9 build (ROCm#1326)

50d56db

- Add missing common_utils.sh
- Update the install vision part
- Move to amdgpu rhel 9.3 builds
- Update to pick python from conda path
- Add a missing package
- Add ROCM_PATH and magma
- Updated repo radeon path

Update to hipify mapping

17ba54f

Correcting usage of USE_ROCM

e00045a

Enable gesvda for ROCM >= 6.1 (ROCm#1339)

7f3172f

This also fixes a problem in gesvd driver when UV is not needed.

Increase lifespan of test-times files

a2d6ace

- build_environment is hard-coded to the value from upstream when the branch was created, since the dev/QA ENV build_environment value can vary

Fixes CI build script (ROCm#1350)

00307cc

* Fix the parsing of /etc/os-release
  The old code parses OS_DISTRO as 'PRETTY_Ubuntu' on Ubuntu and thus never links to libtinfo correctly.
* Configurable CMAKE_PREFIX_PATH in CI script.
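
For readers unfamiliar with the os-release pitfall mentioned in this commit, here is a minimal sketch of exact-key parsing. It is not the code from the CI script; the function names `parse_os_release` and `os_distro` and the sample file contents are assumptions made for illustration. The idea is that matching the key `NAME` exactly avoids accidentally picking up `PRETTY_NAME`, which is one plausible way a value like 'PRETTY_Ubuntu' could arise.

```python
import shlex


def parse_os_release(text):
    """Parse os-release style KEY=VALUE text into a dict, matching keys exactly."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        # shlex.split removes the shell-style quoting around the value.
        parts = shlex.split(raw)
        fields[key] = parts[0] if parts else ""
    return fields


def os_distro(text):
    """Return the NAME field only; PRETTY_NAME is a different key and is ignored."""
    return parse_os_release(text).get("NAME", "")


# Illustrative sample, not copied from any particular image.
SAMPLE = '''PRETTY_NAME="Ubuntu 22.04.4 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
ID=ubuntu
'''

print(os_distro(SAMPLE))
```

Because the lookup is by whole key rather than by substring, the order of `PRETTY_NAME` and `NAME` in the file does not matter.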

[NO CP] Temporary dumping of test exec log to stderr

3120778

- This is done at QA's request; it needs to be reverted and does not need to be cherry-picked into later releases.

Add skipIfRocmArch decorator for Navi skips (ROCm#1356)

9726c26
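
As a rough illustration of what an architecture-based skip decorator can look like, the sketch below skips a test when the reported GPU architecture is in a given list. It is not the `skipIfRocmArch` implementation from this commit: the name `skip_if_arch`, the injectable `get_arch` callable, and the `NAVI_ARCHS` tuple are all hypothetical, and the device query is stubbed out so the example runs without a GPU.

```python
import functools
import unittest


def skip_if_arch(archs, get_arch):
    """Skip the decorated test when get_arch() reports an architecture in archs.

    get_arch is a hypothetical callable returning a string such as
    "gfx1100:sramecc+:xnack-"; only the part before the first ':' is compared.
    """
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            arch = get_arch().split(":")[0]
            if arch in archs:
                raise unittest.SkipTest(f"test skipped on {arch}")
            return fn(*args, **kwargs)
        return wrapper
    return decorator


# Illustrative list only; the architectures covered by the PR are not stated here.
NAVI_ARCHS = ("gfx1030", "gfx1100")


@skip_if_arch(NAVI_ARCHS, get_arch=lambda: "gfx1100:sramecc+")
def navi_skipped():
    return "ran"


@skip_if_arch(NAVI_ARCHS, get_arch=lambda: "gfx90a:sramecc+:xnack-")
def runs_elsewhere():
    return "ran"
```

Raising `unittest.SkipTest` inside the wrapper defers the architecture check to test run time, so importing the test module does not require a device to be present.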

Converted NAVI check as a function (ROCm#1364)

91125f1

* Moved NAVI check to the test file
* Revised NAVI check as a function

Triton build conditionalized on ROCM_VERSION

623579f

Remove ROCmloops specific test

b39d5fa

Bad import in test_torchinductor and skip torchvision related UT (ROCm#1374)

6d3494e

skip test_inductor_freezing failing UTs (ROCm#1375)

f02e87f

Skip test_mm_triton_kernel_benchmark (ROCm#1376)

c1f1f51

* Running triton kernel on ROCM only has one GB/s metric reported
* Update test_kernel_benchmark.py

[HIP] Returned error string update

6f65d22

PR ROCm#1255 to rocm6.2 release

98df198

pruthvistony and others added 23 commits June 20, 2024 15:56

Include the ROCm version in triton version

a0872c0

Change Torch extra install requirement

700ee13

- PYTORCH_EXTRA_INSTALL_REQUIREMENTS is set in builder repo
- Remove the PYTORCH_EXTRA_INSTALL_REQUIREMENTS step from this file

Remove the installation of rocm-llvm-dev package

8f95824

- Causing regression - SWDEV-463083

Fix SWDEV-459623 (ROCm#1428)

5f9b3f4

* Fix SWDEV-459623. The rank of the logsumexp tensor must be 3. This tensor was considered for internal use only but was apparently exposed to UTs.
* Fix for mGPU. The stream should be selected after picking the current device according to the input tensor.

Enable fp8 inductor unit tests (ROCm#1421)

90df487

* Add formal FP8 check in common_cuda.py
* Enable inductor/test_valid_cast
* Support for test_eager_fallback
* allow fnuz types on amax test
* Finalize passing tests vs failing
* Fix fnuz constants in _to_fp8_saturated

[HIP] Few more updates to the returned error string

a390471

skipIfRocm needs msg parameter

6be1d5d

[NO CP] Updated changes to skip few UTs

31b3681

Add new kernel config for AMD GPUs

cefda3a

Update gesvda USE_ROCM guards

8068d3d

Print consolidated log file for pytorch unit test automation scripts (ROCm#1433)

5187ca9

* Print consolidated log file for pytorch uts
* Update run_entire_tests subprocess call as well
* lint
* Add ERROR string

Scale XBLOCK in triton reduction configs to avoid hitting max grid (ROCm#1434)

012c13b
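
The arithmetic behind scaling a block size to stay under a launch-grid limit can be sketched as follows. This is an assumption-laden illustration, not the change made in this commit: the function name `scale_xblock`, the doubling strategy, and the `max_grid` parameter are hypothetical, and the actual limit and config handling in the triton reduction heuristics are not described in this thread.

```python
def scale_xblock(xnumel, xblock, max_grid):
    """Double xblock until ceil(xnumel / xblock) no longer exceeds max_grid.

    xnumel:   number of elements along the x dimension
    xblock:   starting block size (assumed positive)
    max_grid: largest number of blocks allowed along x (assumed positive)
    """
    if xnumel < 0 or xblock <= 0 or max_grid <= 0:
        raise ValueError("xnumel must be >= 0; xblock and max_grid must be > 0")
    # -(-a // b) is integer ceiling division.
    while -(-xnumel // xblock) > max_grid:
        xblock *= 2
    return xblock


# Example with small made-up numbers: 1000 elements, at most 100 blocks.
print(scale_xblock(1000, 1, 100))
```

Doubling keeps the block size a power of two when the starting value is one, which is the usual shape for such configs; a block size that already fits is returned unchanged.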

rocm6.3 related_commits

6e45ab1

caching test_times

3aa060d

Sync updates from hipify_torch. (ROCm#1168)

0c5d257

Co-authored-by: Jithun Nair <[email protected]>

fix install_centos() function

ecf4e8d

Merge pull request ROCm#1436 from ROCm/IFU_CP_06172024

8f19207

IFU for rocm6.3_internal_testing

Update apex commit to pick up wheel-related changes (ROCm#1443)

5de711c

increase tensor size to force out of memory exception on MI300X (ROCm#1449)

4459b67

Update clock info metric AMDSMI (ROCm#1459)

dd43b9b

CK GEMM Backend

1b6b84e

alugorey requested a review from jeffdaily July 18, 2024 19:01

alugorey marked this pull request as draft July 18, 2024 21:01

pruthvistony force-pushed the rocm6.3_internal_testing branch 2 times, most recently from 9ae24a7 to 12b4a67 Compare August 12, 2024 05:22
