Develop upstream sync 20250113 #2802


Merged: 1,288 commits into develop-upstream on Jan 20, 2025
Conversation

@alekstheod commented Jan 13, 2025

Weekly sync 13/01/2025

Unmerged paths:
  (use "git add <file>..." to mark resolution)
        both modified:   .bazelrc
        both modified:   tensorflow/core/common_runtime/gpu/gpu_device_test.cc
        both modified:   tensorflow/core/kernels/matmul_op_fused.cc
        both modified:   tensorflow/core/kernels/matmul_op_impl.h
        both modified:   tensorflow/core/kernels/matmul_util.cc
        both modified:   tensorflow/core/kernels/matmul_util.h
        both modified:   third_party/gpus/rocm_configure.bzl
        both modified:   third_party/xla/third_party/tsl/third_party/gpus/rocm_configure.bzl
        both modified:   third_party/xla/xla/service/gpu/fusions/triton/dot_algorithms_test.cc
        both modified:   third_party/xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc
        both modified:   third_party/xla/xla/tests/BUILD

alekstheod and others added 30 commits January 8, 2025 05:01
Imported from GitHub PR openxla/xla#19649

The goal of this change is to introduce an external dependency on the ROCm library and tools.

Building XLA with hermetic ROCm is done by setting these environment variables:

--repo_env=OS=ubuntu_20.04
--repo_env=ROCM_VERSION=6.2.0

To use only hermetic libraries, define this flag:
--@local_config_rocm//rocm:use_rocm_hermetic_rpath=True
This flag makes rpaths and configs look inside the sandbox.
If the flag is not set, the default installation paths are used, e.g. /opt/rocm.

One has to provide the OS version and the ROCm version to initialize a proper ROCm repository.
If these flags are not set, the default ROCm installation will be used to build XLA.

depends-on: openxla/xla#19691

Copybara import of the project:

--
cf744eca78f697144e122c6a9d1aa8fc52722b20 by Alexandros Theodoridis <[email protected]>:

Implement hermetic rocm dependency

--
4f4ad859ec3143fdb04f7792541c61b98c708397 by Alexandros Theodoridis <[email protected]>:

Add missing dependency

--
8e164f765b45b5e5d118b02695fd6d6e2b0b232d by Alexandros Theodoridis <[email protected]>:

Add missing dependency and remove so files from data

--
35538f4922b5b28b9debd0ce17bb15b83b5921fc by Alexandros Theodoridis <[email protected]>:

Rename setting to use_rocm_hermetic_rpath

--
58d140220e9e58572c9a7ae3de2ec1ea189566d3 by Alexandros Theodoridis <[email protected]>:

Fix build for cuda and cpu

Merging this change closes tensorflow#19649

PiperOrigin-RevId: 713248195
PiperOrigin-RevId: 713248622
`std::mismatch` should be called with an end iterator for the second range when there is no guarantee on the element count in that range.

PiperOrigin-RevId: 713264159
Move comparison of executable != nullptr _before_ calling std::move(executable).

This is really only used for logging, but definitely adds confusion to the logs when it's always 0 :).

PiperOrigin-RevId: 713272260
…andle and SupportsSendRecvCallbacks

PiperOrigin-RevId: 713276521
…Usage

It was always set to false by the callers.

PiperOrigin-RevId: 713277020
….cseConstants=false` to avoid constant folding and CSE which is expensive.

PiperOrigin-RevId: 713277781
…lso be used in vectorizing AtomicRMW in follow-up changes.

PiperOrigin-RevId: 713281944
The algorithm decided whether to write to the output by comparing the current slice index with the number of indices per warp. This works only when the indices tile perfectly, e.g. 50 indices per warp with a total of 2000 indices. As soon as there are 2001 indices, the last warp processes 1 update slice but never writes it out.

Also simplified the logic of the update loop that accumulates elements in registers. Instead of an scf.if inside an xla.loop, there are now two different xla.loops in the two branches of an scf.if, which either overwrite the accumulator or combine it with the new data.

PiperOrigin-RevId: 713296321
… when adding vectorization for AtomicRMW which will only be available for Hopper.

PiperOrigin-RevId: 713297711
…tives APIs to acquire communicator in CollectiveThunk

Implement Cliques support for XLA:CPU collectives for consistency with XLA:GPU. Further unification will come in follow-up CLs.

PiperOrigin-RevId: 713305764
In preparation for larger changes, this entry point is being disabled here for now.

PiperOrigin-RevId: 713316210
Changed the Run() method to use signature_key for the input/output maps,
since that aligns with the other parameters.

PiperOrigin-RevId: 713323123
PiperOrigin-RevId: 713330122
…when one CollectivePermute (cp) depends on the other. When we insert a control dependency from the send-start of one cp to the recv-start of another, we need to make sure that the cps are in post order.

PiperOrigin-RevId: 713336414
…ll-to-all operation.

The following example shows the detailed method.
```
base_shape: (32,32,32,32)
mesh: a=2, b=4
old sharding: P('a', 'b', None, None), local shape (16,8,32,32)
new sharding: P(None, None, 'a', 'b'), local shape (32,32,16,8)

// Step 1. Merge sharding axes to a single dimension
reshape (16,8,32,32) -> (16,8,2,16,4,8)
transpose (16,8,2,16,4,8) -> (2,4,16,8,16,8) with permutation (2,4,0,1,3,5)
reshape (2,4,16,8,16,8) -> (8,16,8,16,8)

// Step 2. Apply the all-to-all
all-to-all on (8,16,8,16,8) with split_dimension = 0

// Step 3. Split sharding axes to multiple dimensions
reshape (8,16,8,16,8) -> (2,4,16,8,16,8)
transpose (2,4,16,8,16,8) -> (2,16,4,8,16,8) with permutation (0,2,1,3,4,5)
reshape (2,16,4,8,16,8) -> (32,32,16,8)
```

PiperOrigin-RevId: 713362037
PiperOrigin-RevId: 713372912
PiperOrigin-RevId: 713374730
PiperOrigin-RevId: 713394310
PiperOrigin-RevId: 713395731
This CL takes care of

1. Migrating the targets
```
tensorflow/compiler/xla:test
tensorflow/compiler/xla:test_helpers
tensorflow/compiler/xla/service:pattern_matcher_gmock
```

to tensorflow/compiler/xla/hlo/testlib

2. Setting up build aliases in xla or xla/service/, ensuring external
dependencies are still satisfied.

Phase II will take care of migrating external projects' dependencies.

PiperOrigin-RevId: 713400473
This change also drops the relevant C++ plumbing.

PiperOrigin-RevId: 713412874
…t a workaround.

PiperOrigin-RevId: 713417811
@alekstheod force-pushed the develop-upstream-sync-20250113 branch 3 times, most recently from afe115b to 0e026ce on January 15, 2025 at 13:42
@alekstheod force-pushed the develop-upstream-sync-20250113 branch from 0e026ce to e1d9704 on January 15, 2025 at 14:16
@alekstheod (Author) commented Jan 16, 2025

List of failing tests in the openxla nightly build:

[2025-01-16T02:54:23.280Z]   Stats over 3 runs: max = 992.6s, min = 990.1s, avg = 991.2s, dev = 1.0s
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/backends/gpu/codegen/triton/dot_algorithms_test_gpu_amd_any/test.log
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/backends/gpu/codegen/triton/dot_algorithms_test_gpu_amd_any/test_attempts/attempt_1.log
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/backends/gpu/codegen/triton/dot_algorithms_test_gpu_amd_any/test_attempts/attempt_2.log
[2025-01-16T02:54:23.280Z] //xla/backends/gpu/codegen/triton:fusion_emitter_device_legacy_test_gpu_amd_any FAILED in 3 out of 3 in 121.3s
[2025-01-16T02:54:23.280Z]   Stats over 3 runs: max = 121.3s, min = 120.7s, avg = 121.1s, dev = 0.2s
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/backends/gpu/codegen/triton/fusion_emitter_device_legacy_test_gpu_amd_any/test.log
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/backends/gpu/codegen/triton/fusion_emitter_device_legacy_test_gpu_amd_any/test_attempts/attempt_1.log
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/backends/gpu/codegen/triton/fusion_emitter_device_legacy_test_gpu_amd_any/test_attempts/attempt_2.log
[2025-01-16T02:54:23.280Z] //xla/backends/gpu/codegen/triton:fusion_emitter_int4_device_test_gpu_amd_any FAILED in 3 out of 3 in 111.8s
[2025-01-16T02:54:23.280Z]   Stats over 3 runs: max = 111.8s, min = 111.4s, avg = 111.6s, dev = 0.2s
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/backends/gpu/codegen/triton/fusion_emitter_int4_device_test_gpu_amd_any/test.log
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/backends/gpu/codegen/triton/fusion_emitter_int4_device_test_gpu_amd_any/test_attempts/attempt_1.log
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/backends/gpu/codegen/triton/fusion_emitter_int4_device_test_gpu_amd_any/test_attempts/attempt_2.log
[2025-01-16T02:54:23.280Z] //xla/backends/gpu/codegen/triton:fusion_emitter_parametrized_test_gpu_amd_any FAILED in 3 out of 3 in 154.4s
[2025-01-16T02:54:23.280Z]   Stats over 3 runs: max = 154.4s, min = 153.8s, avg = 154.1s, dev = 0.2s
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/backends/gpu/codegen/triton/fusion_emitter_parametrized_test_gpu_amd_any/test.log
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/backends/gpu/codegen/triton/fusion_emitter_parametrized_test_gpu_amd_any/test_attempts/attempt_1.log
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/backends/gpu/codegen/triton/fusion_emitter_parametrized_test_gpu_amd_any/test_attempts/attempt_2.log
[2025-01-16T02:54:23.280Z] //xla/backends/gpu/codegen/triton:support_legacy_test_gpu_amd_any        FAILED in 3 out of 3 in 17.5s
[2025-01-16T02:54:23.280Z]   Stats over 3 runs: max = 17.5s, min = 17.4s, avg = 17.4s, dev = 0.1s
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/backends/gpu/codegen/triton/support_legacy_test_gpu_amd_any/test.log
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/backends/gpu/codegen/triton/support_legacy_test_gpu_amd_any/test_attempts/attempt_1.log
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/backends/gpu/codegen/triton/support_legacy_test_gpu_amd_any/test_attempts/attempt_2.log
[2025-01-16T02:54:23.280Z] //xla/backends/gpu/codegen/triton:support_test                           FAILED in 3 out of 3 in 18.8s
[2025-01-16T02:54:23.280Z]   Stats over 3 runs: max = 18.8s, min = 7.7s, avg = 12.2s, dev = 4.7s
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/backends/gpu/codegen/triton/support_test/test.log
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/backends/gpu/codegen/triton/support_test/test_attempts/attempt_1.log
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/backends/gpu/codegen/triton/support_test/test_attempts/attempt_2.log
[2025-01-16T02:54:23.280Z] //xla/service/gpu/tests:gpu_kernel_tiling_test_gpu_amd_any               FAILED in 3 out of 3 in 13.1s
[2025-01-16T02:54:23.280Z]   Stats over 3 runs: max = 13.1s, min = 12.8s, avg = 12.9s, dev = 0.1s
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/service/gpu/tests/gpu_kernel_tiling_test_gpu_amd_any/test.log
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/service/gpu/tests/gpu_kernel_tiling_test_gpu_amd_any/test_attempts/attempt_1.log
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/service/gpu/tests/gpu_kernel_tiling_test_gpu_amd_any/test_attempts/attempt_2.log
[2025-01-16T02:54:23.280Z] //xla/service/gpu/tests:gpu_triton_custom_call_test_gpu_amd_any          FAILED in 3 out of 3 in 7.7s
[2025-01-16T02:54:23.280Z]   Stats over 3 runs: max = 7.7s, min = 7.7s, avg = 7.7s, dev = 0.0s
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/service/gpu/tests/gpu_triton_custom_call_test_gpu_amd_any/test.log
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/service/gpu/tests/gpu_triton_custom_call_test_gpu_amd_any/test_attempts/attempt_1.log
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/service/gpu/tests/gpu_triton_custom_call_test_gpu_amd_any/test_attempts/attempt_2.log
[2025-01-16T02:54:23.280Z] //xla/service/gpu/transforms:dot_dimension_sorter_test_gpu_amd_any       FAILED in 3 out of 3 in 10.8s
[2025-01-16T02:54:23.280Z]   Stats over 3 runs: max = 10.8s, min = 10.7s, avg = 10.7s, dev = 0.0s
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/service/gpu/transforms/dot_dimension_sorter_test_gpu_amd_any/test.log
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/service/gpu/transforms/dot_dimension_sorter_test_gpu_amd_any/test_attempts/attempt_1.log
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/service/gpu/transforms/dot_dimension_sorter_test_gpu_amd_any/test_attempts/attempt_2.log
[2025-01-16T02:54:23.280Z] //xla/stream_executor/rocm:rocm_stream_test_gpu_amd_any                  FAILED in 3 out of 3 in 7.0s
[2025-01-16T02:54:23.280Z]   Stats over 3 runs: max = 7.0s, min = 6.9s, avg = 7.0s, dev = 0.0s
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/stream_executor/rocm/rocm_stream_test_gpu_amd_any/test.log
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/stream_executor/rocm/rocm_stream_test_gpu_amd_any/test_attempts/attempt_1.log
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/stream_executor/rocm/rocm_stream_test_gpu_amd_any/test_attempts/attempt_2.log
[2025-01-16T02:54:23.280Z] //xla/tests:all_reduce_test_gpu_amd_any                                  FAILED in 3 out of 3 in 9.7s
[2025-01-16T02:54:23.280Z]   Stats over 3 runs: max = 9.7s, min = 9.6s, avg = 9.6s, dev = 0.1s
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/tests/all_reduce_test_gpu_amd_any/test.log
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/tests/all_reduce_test_gpu_amd_any/test_attempts/attempt_1.log
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/tests/all_reduce_test_gpu_amd_any/test_attempts/attempt_2.log
[2025-01-16T02:54:23.280Z] //xla/tests:broadcast_test_gpu_amd_any                                   FAILED in 3 out of 3 in 11.9s
[2025-01-16T02:54:23.280Z]   Stats over 3 runs: max = 11.9s, min = 9.6s, avg = 11.1s, dev = 1.1s
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/tests/broadcast_test_gpu_amd_any/test.log
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/tests/broadcast_test_gpu_amd_any/test_attempts/attempt_1.log
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/tests/broadcast_test_gpu_amd_any/test_attempts/attempt_2.log
[2025-01-16T02:54:23.280Z] //xla/tests:conv_depthwise_backprop_filter_test_gpu_amd_any              FAILED in 3 out of 3 in 92.1s
[2025-01-16T02:54:23.280Z]   Stats over 3 runs: max = 92.1s, min = 22.6s, avg = 46.9s, dev = 31.9s
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/tests/conv_depthwise_backprop_filter_test_gpu_amd_any/test.log
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/tests/conv_depthwise_backprop_filter_test_gpu_amd_any/test_attempts/attempt_1.log
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/tests/conv_depthwise_backprop_filter_test_gpu_amd_any/test_attempts/attempt_2.log
[2025-01-16T02:54:23.280Z] //xla/tests:copy_test_gpu_amd_any                                        FAILED in 3 out of 3 in 11.8s
[2025-01-16T02:54:23.280Z]   Stats over 3 runs: max = 11.8s, min = 9.7s, avg = 10.4s, dev = 1.0s
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/tests/copy_test_gpu_amd_any/test.log
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/tests/copy_test_gpu_amd_any/test_attempts/attempt_1.log
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/tests/copy_test_gpu_amd_any/test_attempts/attempt_2.log
[2025-01-16T02:54:23.280Z] //xla/tests:gather_operation_test_gpu_amd_any                            FAILED in 3 out of 3 in 9.9s
[2025-01-16T02:54:23.280Z]   Stats over 3 runs: max = 9.9s, min = 9.7s, avg = 9.8s, dev = 0.1s
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/tests/gather_operation_test_gpu_amd_any/test.log
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/tests/gather_operation_test_gpu_amd_any/test_attempts/attempt_1.log
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/tests/gather_operation_test_gpu_amd_any/test_attempts/attempt_2.log

Compared with the failing XLA tests in this PR:

[2025-01-16T10:52:59.222Z] @local_xla//xla/service/gpu/fusions/triton:triton_fusion_emitter_int4_device_test_gpu_amd_any FAILED in 139.6s
[2025-01-16T10:52:59.222Z]   /root/.cache/bazel/_bazel_root/fbac33eb30dbfb6b11b15a7ff5ac830d/execroot/org_tensorflow/bazel-out/k8-opt/testlogs/external/local_xla/xla/service/gpu/fusions/triton/triton_fusion_emitter_int4_device_test_gpu_amd_any/test.log
[2025-01-16T10:52:59.222Z] @local_xla//xla/stream_executor/gpu:gpu_test_kernels_fatbin_test_gpu_amd_any FAILED in 8.0s
[2025-01-16T10:52:59.222Z]   /root/.cache/bazel/_bazel_root/fbac33eb30dbfb6b11b15a7ff5ac830d/execroot/org_tensorflow/bazel-out/k8-opt/testlogs/external/local_xla/xla/stream_executor/gpu/gpu_test_kernels_fatbin_test_gpu_amd_any/test.log
[2025-01-16T10:52:59.222Z] 

@alekstheod (Author) commented:
We decided to skip the int4 triton tests and investigate the issue separately from the weekly sync.

@alekstheod force-pushed the develop-upstream-sync-20250113 branch from d123032 to a5407d3 on January 16, 2025 at 15:21
@i-chaochen requested a review from pemeliya January 20, 2025 09:56
@@ -3924,6 +3924,7 @@ tf_cuda_cc_test(
srcs = ["matmul_op_test.cc"],
tags = [
"no_aarch64", # b/282068262
"cuda-only", # weekly sync 20250113
Collaborator:
May I ask why we skip this matmul_op_test? Is it a subtest that failed, or do we know the root cause?

Author:
It fails with this error:

2025-01-17 11:20:21.004074: W tensorflow/core/framework/op_kernel.cc:1857] OP_REQUIRES failed at matmul_op_fused.cc:582 : INTERNAL: Unsupported fusion for BlasLt Matmul
2025-01-17 11:20:21.004123: I tensorflow/core/framework/local_rendezvous.cc:407] Local rendezvous is aborting with status: INTERNAL: Unsupported fusion for BlasLt Matmul
   [[{{node fused_matmul}}]]
2025-01-17 11:20:21.004146: I tensorflow/core/framework/local_rendezvous.cc:407] Local rendezvous is aborting with status: INTERNAL: Unsupported fusion for BlasLt Matmul
   [[{{node fused_matmul}}]]
   [[fused_matmul/_1]]
2025-01-17 11:20:21.004166: I tensorflow/core/framework/local_rendezvous.cc:426] Local rendezvous recv item cancelled. Key hash: 6312669201998896265
tensorflow/core/kernels/matmul_op_test.cc:223: Failure
Value of: (last_status)
Expected: is OK
  Actual: INTERNAL: 2 root error(s) found.
  (0) INTERNAL: Unsupported fusion for BlasLt Matmul
   [[{{node fused_matmul}}]]
   [[fused_matmul/_1]]
  (1) INTERNAL: Unsupported fusion for BlasLt Matmul
   [[{{node fused_matmul}}]]
0 successful operations.
0 derived errors ignored. (of type absl::lts_20230802::Status)
tensorflow/core/kernels/matmul_op_test.cc:254: Failure
Expected equality of these values:
  matmul.dtype()
    Which is: 19
  fused_matmul.dtype()
    Which is: 1

I created an issue to re-enable it:
https://github.com/ROCm/frameworks-internal/issues/10718
I couldn't find the root cause by looking at the code directly; I guess we have to analyze it separately.
Several tests are failing with the same error. They do not fail on MI100, but do fail on MI210 and later.

The full log is attached as test.log. I will try to disable the individual tests.

Author:

The failing tests have something in common: all of them have the suffix WithActivation and all use TypeParam Eigen::half:
[ FAILED ] Test/FusedMatMulWithBiasOpTest/1.MatMul256x128x64WithActivation, where TypeParam = Eigen::half
[ FAILED ] Test/FusedMatMulWithBiasOpTest/1.MatMul1x256x256WithActivation, where TypeParam = Eigen::half
[ FAILED ] Test/FusedMatMulWithBiasOpTest/1.MatMul256x256x1WithActivation, where TypeParam = Eigen::half
[ FAILED ] Test/FusedMatMulWithBiasOpTest/1.MatMul1x256x1WithActivation, where TypeParam = Eigen::half

I will try to disable only those tests.

Author:

Done.

@i-chaochen (Collaborator) left a comment:

LGTM

@alekstheod merged commit aeabf47 into develop-upstream Jan 20, 2025
5 checks passed
@ROCm deleted comments from okakarpa Jan 21, 2025
@inemankov

!gen-cache

@okakarpa (Collaborator) commented Jan 21, 2025

The disk cache generation for the cpu-pycpp tests status: successfully finished
The disk cache generation for the gpu-pycpp tests status: successfully finished
The disk cache generation for the gpu-nonpip-multi tests status: successfully finished

@inemankov

!gen-cache

@okakarpa (Collaborator) commented Jan 21, 2025

The disk cache generation for the cpu-pycpp tests status: successfully finished
The disk cache generation for the gpu-pycpp tests status: successfully finished
The disk cache generation for the gpu-nonpip-multi tests status: successfully finished

The disk cache generation for the XLA tests status: scheduled

@inemankov

!gen-cache

@okakarpa (Collaborator) commented Jan 21, 2025

The disk cache generation for the cpu-pycpp tests status: successfully finished
The disk cache generation for the gpu-pycpp tests status: successfully finished
The disk cache generation for the gpu-nonpip-multi tests status: successfully finished

The disk cache generation for the XLA tests status: successfully finished
