Develop upstream sync 20250113 #2802

Open
wants to merge 1,286 commits into develop-upstream from develop-upstream-sync-20250113

Conversation

@alekstheod commented Jan 13, 2025

Weekly sync 13/01/2025

Unmerged paths:
  (use "git add <file>..." to mark resolution)
        both modified:   .bazelrc
        both modified:   tensorflow/core/common_runtime/gpu/gpu_device_test.cc
        both modified:   tensorflow/core/kernels/matmul_op_fused.cc
        both modified:   tensorflow/core/kernels/matmul_op_impl.h
        both modified:   tensorflow/core/kernels/matmul_util.cc
        both modified:   tensorflow/core/kernels/matmul_util.h
        both modified:   third_party/gpus/rocm_configure.bzl
        both modified:   third_party/xla/third_party/tsl/third_party/gpus/rocm_configure.bzl
        both modified:   third_party/xla/xla/service/gpu/fusions/triton/dot_algorithms_test.cc
        both modified:   third_party/xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc
        both modified:   third_party/xla/xla/tests/BUILD

akuegel and others added 30 commits January 8, 2025 04:27
This should have been removed during an earlier refactoring.

PiperOrigin-RevId: 713237188
This method was renamed but the staging function was kept; switch to the renamed variant.

PiperOrigin-RevId: 713242393
Imported from GitHub PR openxla/xla#19649

The goal of this change is to introduce an external dependency on the ROCm library and tools.

Building XLA with the hermetic ROCm is done by setting these environment variables:

--repo_env=OS=ubuntu_20.04
--repo_env=ROCM_VERSION=6.2.0

To use only hermetic libs, define this flag:
--@local_config_rocm//rocm:use_rocm_hermetic_rpath=True
This flag makes rpaths and configs resolve inside the sandbox.
If the flag is not set, the default installation paths are used, e.g. /opt/rocm.

One has to provide the OS version and ROCm version to initialize a proper ROCm repository.
If these flags are not set, the default ROCm installation will be used to build XLA.
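
For reference, a hypothetical invocation combining the flags above (the flags themselves are from this change; the `bazel build` command shape and the `//xla/...` target are illustrative assumptions):

```
bazel build \
  --repo_env=OS=ubuntu_20.04 \
  --repo_env=ROCM_VERSION=6.2.0 \
  --@local_config_rocm//rocm:use_rocm_hermetic_rpath=True \
  //xla/...
```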

depends-on: openxla/xla#19691

Copybara import of the project:

--
cf744eca78f697144e122c6a9d1aa8fc52722b20 by Alexandros Theodoridis <[email protected]>:

Implement hermetic rocm dependency

--
4f4ad859ec3143fdb04f7792541c61b98c708397 by Alexandros Theodoridis <[email protected]>:

Add missing dependency

--
8e164f765b45b5e5d118b02695fd6d6e2b0b232d by Alexandros Theodoridis <[email protected]>:

Add missing dependency and remove so files from data

--
35538f4922b5b28b9debd0ce17bb15b83b5921fc by Alexandros Theodoridis <[email protected]>:

Rename setting to use_rocm_hermetic_rpath

--
58d140220e9e58572c9a7ae3de2ec1ea189566d3 by Alexandros Theodoridis <[email protected]>:

Fix build for cuda and cpu

Merging this change closes tensorflow#19649

PiperOrigin-RevId: 713248195
PiperOrigin-RevId: 713248622
`std::mismatch` should be called with an end iterator for the second range (the four-iterator overload) if there is no guarantee on the element count of the second range.

PiperOrigin-RevId: 713264159
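
A minimal sketch of the hazard (illustrative values, not the actual call site): the three-iterator overload of `std::mismatch` may read past the end of a shorter second range, while the four-iterator overload stops at whichever range ends first.

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

int main() {
  std::vector<int> a = {1, 2, 3, 4};
  std::vector<int> b = {1, 2};  // shorter than a

  // Unsafe if b can be shorter than a: reads past b.end().
  // auto bad = std::mismatch(a.begin(), a.end(), b.begin());

  // Safe: the four-iterator overload bounds both ranges.
  auto [ia, ib] = std::mismatch(a.begin(), a.end(), b.begin(), b.end());
  std::printf("first mismatch at offset %td\n", ia - a.begin());
  return 0;
}
```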
Move comparison of executable != nullptr _before_ calling std::move(executable).

This is really only used for logging, but definitely adds confusion to the logs when it's always 0 :).

PiperOrigin-RevId: 713272260
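
A minimal sketch of the bug class (hypothetical names, not the actual XLA code): a `unique_ptr` read after `std::move` is always null, so a logged null check is meaningless unless it happens before the move.

```cpp
#include <iostream>
#include <memory>
#include <utility>

struct Executable {};

void Consume(std::unique_ptr<Executable> e) { /* takes ownership */ }

int main() {
  auto executable = std::make_unique<Executable>();

  // Wrong order: after the move, `executable` is null, so this
  // would always log 0.
  //   Consume(std::move(executable));
  //   std::cout << (executable != nullptr) << "\n";

  // Right order: inspect before handing off ownership.
  const bool valid = (executable != nullptr);
  std::cout << "executable valid: " << valid << "\n";  // logs 1
  Consume(std::move(executable));
  return 0;
}
```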
…andle and SupportsSendRecvCallbacks

PiperOrigin-RevId: 713276521
…Usage

It was always set to false by the callers.

PiperOrigin-RevId: 713277020
….cseConstants=false` to avoid constant folding and CSE, which are expensive.

PiperOrigin-RevId: 713277781
…lso be used in vectorizing AtomicRMW in follow-up changes.

PiperOrigin-RevId: 713281944
The algorithm decided whether to write to the output by comparing the current slice index with the number of indices per warp. That works only when the indices are perfectly tiled, e.g. 50 indices per warp with a total of 2000 indices. As soon as we have 2001 indices, the last warp processes 1 update slice but never writes it out (a sketch follows below).

Also simplified the logic for the update loop that accumulates elements in registers. Instead of having an scf.if inside of an xla.loop, we now have two different xla.loops in the branches of an scf.if, one that overwrites the accumulator and one that combines it with the new data.

PiperOrigin-RevId: 713296321
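
A standalone sketch of the boundary condition described above (plain C++ illustration; the names and the host-side loop are assumptions, not the MLIR emitter code):

```cpp
#include <algorithm>
#include <cstdio>

int main() {
  const int total_indices = 2001;  // not a multiple of the warp tile
  const int indices_per_warp = 50;
  const int num_warps =
      (total_indices + indices_per_warp - 1) / indices_per_warp;

  for (int warp = 0; warp < num_warps; ++warp) {
    const int begin = warp * indices_per_warp;
    const int end = std::min(begin + indices_per_warp, total_indices);
    for (int i = begin; i < end; ++i) {
      // Buggy guard: the last warp owns a single slice (i == 2000), so
      // `i - begin == indices_per_warp - 1` never holds there and the
      // accumulated result is never written.
      // const bool write = (i - begin == indices_per_warp - 1);

      // Fixed guard: compare against this warp's actual last index.
      const bool write = (i == end - 1);
      if (write) std::printf("warp %d writes slice ending at %d\n", warp, i);
    }
  }
  return 0;
}
```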
… when adding vectorization for AtomicRMW which will only be available for Hopper.

PiperOrigin-RevId: 713297711
…tives APIs to acquire communicator in CollectiveThunk

Implement Cliques support for XLA:CPU collectives for consistency with XLA:GPU. Further unification will come in follow-up CLs.

PiperOrigin-RevId: 713305764
In preparation for larger changes, this entry point is being disabled here for now.

PiperOrigin-RevId: 713316210
Changed the Run() method to use signature_key for the input/output maps,
since that aligns with the other parameters.

PiperOrigin-RevId: 713323123
PiperOrigin-RevId: 713330122
…when one CollectivePermute (cp) depends on the other. When we insert a control dependency from the send-start of one cp to the recv-start of another, we need to make sure that the cps are in post order.

PiperOrigin-RevId: 713336414
…ll-to-all operation.

The following example shows the detailed method.
```
base_shape: (32,32,32,32)
mesh: a=2, b=4
old sharding: P('a', 'b', None, None), local shape (16,8,32,32)
new sharding: P(None, None, 'a', 'b'), local shape (32,32,16,8)

// Step 1. Merge sharding axes to a single dimension
reshape (16,8,32,32) -> (16,8,2,16,4,8)
transpose (16,8,2,16,4,8) -> (2,4,16,8,16,8) with permutation (2,4,0,1,3,5)
reshape (2,4,16,8,16,8) -> (8,16,8,16,8)

// Step 2. Apply the all-to-all
all-to-all on (8,16,8,16,8) with split_dimension = 0

// Step 3. Split sharding axes to multiple dimensions
reshape (8,16,8,16,8) -> (2,4,16,8,16,8)
transpose (2,4,16,8,16,8) -> (2,16,4,8,16,8) with permutation (0,2,1,3,4,5)
reshape (2,16,4,8,16,8) -> (32,32,16,8)
```

PiperOrigin-RevId: 713362037
PiperOrigin-RevId: 713372912
PiperOrigin-RevId: 713374730
PiperOrigin-RevId: 713394310
PiperOrigin-RevId: 713395731
This CL takes care of:

1. Migrating the targets
```
tensorflow/compiler/xla:test
tensorflow/compiler/xla:test_helpers
tensorflow/compiler/xla/service:pattern_matcher_gmock
```

to tensorflow/compiler/xla/hlo/testlib

2. Setting up build aliases in xla or xla/service/, ensuring external
dependencies are still satisfied.

Phase II will take care of migrating external projects' dependencies.

PiperOrigin-RevId: 713400473
seherellis and others added 6 commits January 12, 2025 23:53
…ns in manual sharding group.

Imported from GitHub PR openxla/xla#20808

This is a small fix in GSPMD partitioning for collective permute instructions added in a manual sharding group.

In JAX, we can add a `ppermute` instruction in `shard_map`. In cases where `shard_map` has auto axes specified, collective-permuting an operand, even with the same sharding, ends up as an `all-gather` followed by a collective permute, which leads to inefficient collectives. The correct and efficient way is to partition the collective permute as an element-wise op.

The unit test added provides a repro. Also, the JAX unit test in https://github.com/jax-ml/jax/blob/fa9c7edf736516052df6eab22947bc627d0deca3/tests/shard_map_test.py#L2167 gives a real-world JAX example.
Copybara import of the project:

--
8ee6ecd51f6e4aae8e3d92a6a439a60f53ab02ae by Yunlong Liu <[email protected]>:

A hacky fix on partitioning collective permute.

--
e50e87696defb290f7561a7808ee42ebbc11e144 by Yunlong Liu <[email protected]>:

Local change.

--
84eb38597c783a4488774823c2c464296a8c54c7 by Yunlong Liu <[email protected]>:

Simplifies sharding in tests.

Merging this change closes tensorflow#20808

PiperOrigin-RevId: 714851861
The generated .a library has a different name depending on whether it is a CUDA or ROCm build.

PiperOrigin-RevId: 714853250
PiperOrigin-RevId: 714855571
PiperOrigin-RevId: 714863244
@alekstheod force-pushed the develop-upstream-sync-20250113 branch 15 times, most recently from afe115b to 0e026ce on January 15, 2025 13:42
@alekstheod force-pushed the develop-upstream-sync-20250113 branch from 0e026ce to e1d9704 on January 15, 2025 14:16
@alekstheod (Author) commented Jan 16, 2025

List of failing tests in the openxla nightly build:

[2025-01-16T02:54:23.280Z]   Stats over 3 runs: max = 992.6s, min = 990.1s, avg = 991.2s, dev = 1.0s
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/backends/gpu/codegen/triton/dot_algorithms_test_gpu_amd_any/test.log
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/backends/gpu/codegen/triton/dot_algorithms_test_gpu_amd_any/test_attempts/attempt_1.log
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/backends/gpu/codegen/triton/dot_algorithms_test_gpu_amd_any/test_attempts/attempt_2.log
[2025-01-16T02:54:23.280Z] //xla/backends/gpu/codegen/triton:fusion_emitter_device_legacy_test_gpu_amd_any FAILED in 3 out of 3 in 121.3s
[2025-01-16T02:54:23.280Z]   Stats over 3 runs: max = 121.3s, min = 120.7s, avg = 121.1s, dev = 0.2s
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/backends/gpu/codegen/triton/fusion_emitter_device_legacy_test_gpu_amd_any/test.log
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/backends/gpu/codegen/triton/fusion_emitter_device_legacy_test_gpu_amd_any/test_attempts/attempt_1.log
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/backends/gpu/codegen/triton/fusion_emitter_device_legacy_test_gpu_amd_any/test_attempts/attempt_2.log
[2025-01-16T02:54:23.280Z] //xla/backends/gpu/codegen/triton:fusion_emitter_int4_device_test_gpu_amd_any FAILED in 3 out of 3 in 111.8s
[2025-01-16T02:54:23.280Z]   Stats over 3 runs: max = 111.8s, min = 111.4s, avg = 111.6s, dev = 0.2s
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/backends/gpu/codegen/triton/fusion_emitter_int4_device_test_gpu_amd_any/test.log
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/backends/gpu/codegen/triton/fusion_emitter_int4_device_test_gpu_amd_any/test_attempts/attempt_1.log
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/backends/gpu/codegen/triton/fusion_emitter_int4_device_test_gpu_amd_any/test_attempts/attempt_2.log
[2025-01-16T02:54:23.280Z] //xla/backends/gpu/codegen/triton:fusion_emitter_parametrized_test_gpu_amd_any FAILED in 3 out of 3 in 154.4s
[2025-01-16T02:54:23.280Z]   Stats over 3 runs: max = 154.4s, min = 153.8s, avg = 154.1s, dev = 0.2s
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/backends/gpu/codegen/triton/fusion_emitter_parametrized_test_gpu_amd_any/test.log
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/backends/gpu/codegen/triton/fusion_emitter_parametrized_test_gpu_amd_any/test_attempts/attempt_1.log
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/backends/gpu/codegen/triton/fusion_emitter_parametrized_test_gpu_amd_any/test_attempts/attempt_2.log
[2025-01-16T02:54:23.280Z] //xla/backends/gpu/codegen/triton:support_legacy_test_gpu_amd_any        FAILED in 3 out of 3 in 17.5s
[2025-01-16T02:54:23.280Z]   Stats over 3 runs: max = 17.5s, min = 17.4s, avg = 17.4s, dev = 0.1s
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/backends/gpu/codegen/triton/support_legacy_test_gpu_amd_any/test.log
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/backends/gpu/codegen/triton/support_legacy_test_gpu_amd_any/test_attempts/attempt_1.log
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/backends/gpu/codegen/triton/support_legacy_test_gpu_amd_any/test_attempts/attempt_2.log
[2025-01-16T02:54:23.280Z] //xla/backends/gpu/codegen/triton:support_test                           FAILED in 3 out of 3 in 18.8s
[2025-01-16T02:54:23.280Z]   Stats over 3 runs: max = 18.8s, min = 7.7s, avg = 12.2s, dev = 4.7s
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/backends/gpu/codegen/triton/support_test/test.log
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/backends/gpu/codegen/triton/support_test/test_attempts/attempt_1.log
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/backends/gpu/codegen/triton/support_test/test_attempts/attempt_2.log
[2025-01-16T02:54:23.280Z] //xla/service/gpu/tests:gpu_kernel_tiling_test_gpu_amd_any               FAILED in 3 out of 3 in 13.1s
[2025-01-16T02:54:23.280Z]   Stats over 3 runs: max = 13.1s, min = 12.8s, avg = 12.9s, dev = 0.1s
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/service/gpu/tests/gpu_kernel_tiling_test_gpu_amd_any/test.log
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/service/gpu/tests/gpu_kernel_tiling_test_gpu_amd_any/test_attempts/attempt_1.log
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/service/gpu/tests/gpu_kernel_tiling_test_gpu_amd_any/test_attempts/attempt_2.log
[2025-01-16T02:54:23.280Z] //xla/service/gpu/tests:gpu_triton_custom_call_test_gpu_amd_any          FAILED in 3 out of 3 in 7.7s
[2025-01-16T02:54:23.280Z]   Stats over 3 runs: max = 7.7s, min = 7.7s, avg = 7.7s, dev = 0.0s
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/service/gpu/tests/gpu_triton_custom_call_test_gpu_amd_any/test.log
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/service/gpu/tests/gpu_triton_custom_call_test_gpu_amd_any/test_attempts/attempt_1.log
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/service/gpu/tests/gpu_triton_custom_call_test_gpu_amd_any/test_attempts/attempt_2.log
[2025-01-16T02:54:23.280Z] //xla/service/gpu/transforms:dot_dimension_sorter_test_gpu_amd_any       FAILED in 3 out of 3 in 10.8s
[2025-01-16T02:54:23.280Z]   Stats over 3 runs: max = 10.8s, min = 10.7s, avg = 10.7s, dev = 0.0s
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/service/gpu/transforms/dot_dimension_sorter_test_gpu_amd_any/test.log
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/service/gpu/transforms/dot_dimension_sorter_test_gpu_amd_any/test_attempts/attempt_1.log
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/service/gpu/transforms/dot_dimension_sorter_test_gpu_amd_any/test_attempts/attempt_2.log
[2025-01-16T02:54:23.280Z] //xla/stream_executor/rocm:rocm_stream_test_gpu_amd_any                  FAILED in 3 out of 3 in 7.0s
[2025-01-16T02:54:23.280Z]   Stats over 3 runs: max = 7.0s, min = 6.9s, avg = 7.0s, dev = 0.0s
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/stream_executor/rocm/rocm_stream_test_gpu_amd_any/test.log
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/stream_executor/rocm/rocm_stream_test_gpu_amd_any/test_attempts/attempt_1.log
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/stream_executor/rocm/rocm_stream_test_gpu_amd_any/test_attempts/attempt_2.log
[2025-01-16T02:54:23.280Z] //xla/tests:all_reduce_test_gpu_amd_any                                  FAILED in 3 out of 3 in 9.7s
[2025-01-16T02:54:23.280Z]   Stats over 3 runs: max = 9.7s, min = 9.6s, avg = 9.6s, dev = 0.1s
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/tests/all_reduce_test_gpu_amd_any/test.log
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/tests/all_reduce_test_gpu_amd_any/test_attempts/attempt_1.log
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/tests/all_reduce_test_gpu_amd_any/test_attempts/attempt_2.log
[2025-01-16T02:54:23.280Z] //xla/tests:broadcast_test_gpu_amd_any                                   FAILED in 3 out of 3 in 11.9s
[2025-01-16T02:54:23.280Z]   Stats over 3 runs: max = 11.9s, min = 9.6s, avg = 11.1s, dev = 1.1s
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/tests/broadcast_test_gpu_amd_any/test.log
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/tests/broadcast_test_gpu_amd_any/test_attempts/attempt_1.log
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/tests/broadcast_test_gpu_amd_any/test_attempts/attempt_2.log
[2025-01-16T02:54:23.280Z] //xla/tests:conv_depthwise_backprop_filter_test_gpu_amd_any              FAILED in 3 out of 3 in 92.1s
[2025-01-16T02:54:23.280Z]   Stats over 3 runs: max = 92.1s, min = 22.6s, avg = 46.9s, dev = 31.9s
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/tests/conv_depthwise_backprop_filter_test_gpu_amd_any/test.log
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/tests/conv_depthwise_backprop_filter_test_gpu_amd_any/test_attempts/attempt_1.log
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/tests/conv_depthwise_backprop_filter_test_gpu_amd_any/test_attempts/attempt_2.log
[2025-01-16T02:54:23.280Z] //xla/tests:copy_test_gpu_amd_any                                        FAILED in 3 out of 3 in 11.8s
[2025-01-16T02:54:23.280Z]   Stats over 3 runs: max = 11.8s, min = 9.7s, avg = 10.4s, dev = 1.0s
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/tests/copy_test_gpu_amd_any/test.log
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/tests/copy_test_gpu_amd_any/test_attempts/attempt_1.log
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/tests/copy_test_gpu_amd_any/test_attempts/attempt_2.log
[2025-01-16T02:54:23.280Z] //xla/tests:gather_operation_test_gpu_amd_any                            FAILED in 3 out of 3 in 9.9s
[2025-01-16T02:54:23.280Z]   Stats over 3 runs: max = 9.9s, min = 9.7s, avg = 9.8s, dev = 0.1s
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/tests/gather_operation_test_gpu_amd_any/test.log
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/tests/gather_operation_test_gpu_amd_any/test_attempts/attempt_1.log
[2025-01-16T02:54:23.280Z]   /root/.cache/bazel/_bazel_root/217377b0e928b171b843eb11ea7bc36e/execroot/xla/bazel-out/k8-opt/testlogs/xla/tests/gather_operation_test_gpu_amd_any/test_attempts/attempt_2.log

Versus the failing XLA tests in this PR:

[2025-01-16T10:52:59.222Z] @local_xla//xla/service/gpu/fusions/triton:triton_fusion_emitter_int4_device_test_gpu_amd_any FAILED in 139.6s
[2025-01-16T10:52:59.222Z]   /root/.cache/bazel/_bazel_root/fbac33eb30dbfb6b11b15a7ff5ac830d/execroot/org_tensorflow/bazel-out/k8-opt/testlogs/external/local_xla/xla/service/gpu/fusions/triton/triton_fusion_emitter_int4_device_test_gpu_amd_any/test.log
[2025-01-16T10:52:59.222Z] @local_xla//xla/stream_executor/gpu:gpu_test_kernels_fatbin_test_gpu_amd_any FAILED in 8.0s
[2025-01-16T10:52:59.222Z]   /root/.cache/bazel/_bazel_root/fbac33eb30dbfb6b11b15a7ff5ac830d/execroot/org_tensorflow/bazel-out/k8-opt/testlogs/external/local_xla/xla/stream_executor/gpu/gpu_test_kernels_fatbin_test_gpu_amd_any/test.log

@alekstheod (Author) commented:
We decided to skip the int4 Triton tests and investigate the issue separately from the weekly sync.

@alekstheod force-pushed the develop-upstream-sync-20250113 branch from d123032 to a5407d3 on January 16, 2025 15:21