forked from tensorflow/tensorflow
Develop upstream sync 20250113 #2802
Open
alekstheod wants to merge 1,286 commits into develop-upstream from develop-upstream-sync-20250113
+98,035 −39,380
Conversation
This should have been removed during an earlier refactoring. PiperOrigin-RevId: 713237188
This method was renamed but the staging function was kept; switch to the renamed variant. PiperOrigin-RevId: 713242393
Imported from GitHub PR openxla/xla#19649

The goal of this change is to introduce an external dependency on the ROCm library and tools. Building XLA with the hermetic ROCm is done by using these env variables:
--repo_env=OS=ubuntu_20.04 --repo_env=ROCM_VERSION=6.2.0

To use only the hermetic libs, define this flag:
--@local_config_rocm//rocm:use_rocm_hermetic_rpath=True
This flag makes rpaths and configs look inside the sandbox. If the flag is not set, the default installation paths are used, e.g. /opt/rocm.

One has to provide the OS version and the ROCm version to initialize a proper ROCm repository. If these flags are not set, the default ROCm installation will be used to build XLA.

depends-on: openxla/xla#19691

Copybara import of the project:
-- cf744eca78f697144e122c6a9d1aa8fc52722b20 by Alexandros Theodoridis <[email protected]>: Implement hermetic rocm dependency
-- 4f4ad859ec3143fdb04f7792541c61b98c708397 by Alexandros Theodoridis <[email protected]>: Add missing dependency
-- 8e164f765b45b5e5d118b02695fd6d6e2b0b232d by Alexandros Theodoridis <[email protected]>: Add missing dependency and remove so files from data
-- 35538f4922b5b28b9debd0ce17bb15b83b5921fc by Alexandros Theodoridis <[email protected]>: Rename setting to use_rocm_hermetic_rpath
-- 58d140220e9e58572c9a7ae3de2ec1ea189566d3 by Alexandros Theodoridis <[email protected]>: Fix build for cuda and cpu

Merging this change closes tensorflow#19649

PiperOrigin-RevId: 713248195
PiperOrigin-RevId: 713248622
`std::mismatch` should be called with an end iterator for the second range (the four-iterator overload) when there is no guarantee on the element count of the second range. PiperOrigin-RevId: 713264159
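A minimal sketch of the pattern described above (illustrative only, not the XLA code; the names are hypothetical): the four-iterator overload stops at whichever range ends first, so a shorter second range is never read out of bounds.
```cpp
#include <algorithm>
#include <vector>

// Illustrative only: with the four-iterator overload, std::mismatch stops at
// whichever range ends first, so a shorter second range is never overrun.
bool HasCommonPrefix(const std::vector<int>& a, const std::vector<int>& b) {
  auto [it_a, it_b] = std::mismatch(a.begin(), a.end(), b.begin(), b.end());
  // Either one range was fully consumed, or the first mismatch was found.
  return it_a == a.end() || it_b == b.end();
}

int main() {
  std::vector<int> a = {1, 2, 3};
  std::vector<int> b = {1, 2};  // shorter second range: safe with the 4-arg form
  return HasCommonPrefix(a, b) ? 0 : 1;
}
```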
Move comparison of executable != nullptr _before_ calling std::move(executable). This is really only used for logging, but definitely adds confusion to the logs when it's always 0 :). PiperOrigin-RevId: 713272260
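A hedged sketch of the ordering issue (names are hypothetical, not taken from the XLA sources): the null check must be evaluated before `std::move` leaves the local `unique_ptr` empty, otherwise the log always reports 0.
```cpp
#include <cstdio>
#include <memory>
#include <utility>

struct Executable {};  // hypothetical stand-in

void Launch(std::unique_ptr<Executable> exe) { /* consumes the executable */ }

void Run(std::unique_ptr<Executable> executable) {
  // Check before the move: after std::move(executable) the local pointer is
  // empty, so logging it afterwards would always print 0.
  const bool has_executable = (executable != nullptr);
  Launch(std::move(executable));
  std::printf("has_executable=%d\n", has_executable);
}

int main() {
  Run(std::make_unique<Executable>());
  Run(nullptr);
}
```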
…andle and SupportsSendRecvCallbacks PiperOrigin-RevId: 713276521
…Usage It was always set to false by the callers. PiperOrigin-RevId: 713277020
….cseConstants=false` to avoid constant folding and CSE which is expensive. PiperOrigin-RevId: 713277781
…lso be used in vectorizing AtomicRMW in follow-up changes. PiperOrigin-RevId: 713281944
PiperOrigin-RevId: 713282226
The algorithm was checking whether to write to the output by comparing the current slice index with the number of indices per warp. This works only when the indices are perfectly tiled, e.g. 50 indices per warp with a total of 2000 indices. As soon as we have 2001 indices, the last warp processes 1 update slice but never writes it out. Also simplified the logic for the update loop that accumulates elements in registers: instead of an scf.if inside the xla.loop, we now have two different xla.loops in the two branches of an scf.if, which either overwrite the accumulator or combine it with the new data. PiperOrigin-RevId: 713296321
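A simplified host-side sketch of the boundary condition (hypothetical numbers, not the MLIR emitter): the last warp may own fewer slices than `indices_per_warp`, so the write guard must come from the remaining index count rather than the full tile size.
```cpp
#include <algorithm>
#include <cstdio>

int main() {
  const int total_indices = 2001;  // the 2001-index case from the description
  const int indices_per_warp = 50;
  const int num_warps =
      (total_indices + indices_per_warp - 1) / indices_per_warp;  // ceil div

  int written = 0;
  for (int warp = 0; warp < num_warps; ++warp) {
    const int begin = warp * indices_per_warp;
    // Bound by the remaining count, not by indices_per_warp, so the last
    // warp's single update slice is still written out.
    const int end = std::min(begin + indices_per_warp, total_indices);
    for (int slice = begin; slice < end; ++slice) {
      ++written;
    }
  }
  std::printf("written %d of %d update slices\n", written, total_indices);
  return written == total_indices ? 0 : 1;
}
```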
… when adding vectorization for AtomicRMW which will only be available for Hopper. PiperOrigin-RevId: 713297711
…tives APIs to acquire communicator in CollectiveThunk Implement Cliques support for XLA:CPU collectives for consistency with XLA:GPU. Further unification will be in followup CLs. PiperOrigin-RevId: 713305764
In preparation for larger changes, this entry point is being disabled here for now. PiperOrigin-RevId: 713316210
PiperOrigin-RevId: 713318085
Changed to use signature_key for the Run() method's input/output maps, since it aligns with the other parameters. PiperOrigin-RevId: 713323123
…ing path. PiperOrigin-RevId: 713323821
PiperOrigin-RevId: 713330122
…when one CollectivePermute (cp) depends on the other. When we insert a control dependency from the send-start of one cp to the recv-start of another, we need to make sure that the cps are in post order. PiperOrigin-RevId: 713336414
PiperOrigin-RevId: 713346163
…sizes. PiperOrigin-RevId: 713353992
PiperOrigin-RevId: 7133574
…ll-to-all operation. The following example shows the detailed method.
```
base_shape: (32,32,32,32)
mesh: a=2, b=4
old sharding: P('a', 'b', None, None), local shape (16,8,32,32)
new sharding: P(None, None, 'a', 'b'), local shape (32,32,16,8)

// Step 1. Merge sharding axes to a single dimension
reshape (16,8,32,32) -> (16,8,2,16,4,8)
transpose (16,8,2,16,4,8) -> (2,4,16,8,16,8) with permutation (2,4,0,1,3,5)
reshape (2,4,16,8,16,8) -> (8,16,8,16,8)

// Step 2. Apply the all-to-all
all-to-all on (8,16,8,16,8) with split_dimension = 0

// Step 3. Split sharding axes to multiple dimensions
reshape (8,16,8,16,8) -> (2,4,16,8,16,8)
transpose (2,4,16,8,16,8) -> (2,16,4,8,16,8) with permutation (0,2,1,3,4,5)
reshape (2,16,4,8,16,8) -> (32,32,16,8)
```
PiperOrigin-RevId: 713362037
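A small standalone check of the shape bookkeeping in the example above (illustrative only, not the SPMD partitioner code): applying the stated permutation to the reshaped local shape reproduces the shape fed into the all-to-all, and the element count stays constant.
```cpp
#include <cstdio>
#include <vector>

// Apply a dimension permutation to a shape, as a transpose would.
std::vector<long> Permute(const std::vector<long>& shape,
                          const std::vector<int>& perm) {
  std::vector<long> out(shape.size());
  for (size_t i = 0; i < shape.size(); ++i) out[i] = shape[perm[i]];
  return out;
}

long NumElements(const std::vector<long>& shape) {
  long n = 1;
  for (long d : shape) n *= d;
  return n;
}

int main() {
  // Step 1 of the example: (16,8,32,32) reshaped to (16,8,2,16,4,8), then
  // transposed with permutation (2,4,0,1,3,5) to give (2,4,16,8,16,8).
  std::vector<long> reshaped = {16, 8, 2, 16, 4, 8};
  std::vector<long> transposed = Permute(reshaped, {2, 4, 0, 1, 3, 5});
  for (long d : transposed) std::printf("%ld ", d);  // prints 2 4 16 8 16 8
  std::printf("\nelements: %ld (== 16*8*32*32 = %ld)\n",
              NumElements(transposed), 16L * 8 * 32 * 32);
}
```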
PiperOrigin-RevId: 713372912
PiperOrigin-RevId: 713374730
…parameters. PiperOrigin-RevId: 713389880
PiperOrigin-RevId: 713394310
PiperOrigin-RevId: 713395731
This CL takes care of:
1. Migrating the targets
```
tensorflow/compiler/xla:test
tensorflow/compiler/xla:test_helpers
tensorflow/compiler/xla/service:pattern_matcher_gmock
```
to tensorflow/compiler/xla/hlo/testlib
2. Setting up build aliases in xla or xla/service/, ensuring external dependencies are still satisfied.

Phase II will take care of the migration of external projects' dependencies.

PiperOrigin-RevId: 713400473
…cies. PiperOrigin-RevId: 714845417
…ns in manual sharding group.

Imported from GitHub PR openxla/xla#20808

This is a small fix in GSPMD partitioning for collective-permute instructions added in a manual sharding group. In JAX, we can add a `ppermute` instruction in shard_map. In cases where shard_map has auto axes specified, collective-permuting an operand, even with the same sharding, ends up as an `all-gather` followed by a collective permute, which leads to inefficient collectives. The correct and efficient way is to partition the collective permute as an element-wise op.

The unit test added provides a repro. Also, the JAX unit test in https://github.com/jax-ml/jax/blob/fa9c7edf736516052df6eab22947bc627d0deca3/tests/shard_map_test.py#L2167 gives a real-world JAX example.

Copybara import of the project:
-- 8ee6ecd51f6e4aae8e3d92a6a439a60f53ab02ae by Yunlong Liu <[email protected]>: A hacky fix on partitioning collective permute.
-- e50e87696defb290f7561a7808ee42ebbc11e144 by Yunlong Liu <[email protected]>: Local change.
-- 84eb38597c783a4488774823c2c464296a8c54c7 by Yunlong Liu <[email protected]>: Simplifies sharding in tests.

Merging this change closes tensorflow#20808

PiperOrigin-RevId: 714851861
The generated .a library has a different name depending on whether it is a CUDA or ROCm build. PiperOrigin-RevId: 714853250
PiperOrigin-RevId: 714855571
PiperOrigin-RevId: 714863244
alekstheod force-pushed the develop-upstream-sync-20250113 branch 15 times, most recently from afe115b to 0e026ce on January 15, 2025 at 13:42
alekstheod force-pushed the develop-upstream-sync-20250113 branch from 0e026ce to e1d9704 on January 15, 2025 at 14:16
Nightly build failing tests list for openxla:
Vs failing xla tests in this PR:
We decided to skip the int4 Triton tests and investigate the issue separately from the weekly sync.
alekstheod force-pushed the develop-upstream-sync-20250113 branch from d123032 to a5407d3 on January 16, 2025 at 15:21
Weekly sync 13/01/2025