Develop upstream sync 20250113 #2802
Conversation
Imported from GitHub PR openxla/xla#19649

This change introduces an external dependency on the ROCm library and tools. Building XLA with hermetic ROCm is done by using these env variables:

--repo_env=OS=ubuntu_20.04
--repo_env=ROCM_VERSION=6.2.0

To use only hermetic libs, define this flag:

--@local_config_rocm//rocm:use_rocm_hermetic_rpath=True

This flag makes rpaths and configs look inside the sandbox. If the flag is not set, the default installation paths are used, e.g. /opt/rocm. One has to provide the OS version and the ROCm version to initialize a proper rocm repository. If these flags are not set, the default ROCm installation will be used to build XLA.

depends-on: openxla/xla#19691

Copybara import of the project:

-- cf744eca78f697144e122c6a9d1aa8fc52722b20 by Alexandros Theodoridis <[email protected]>: Implement hermetic rocm dependency
-- 4f4ad859ec3143fdb04f7792541c61b98c708397 by Alexandros Theodoridis <[email protected]>: Add missing dependency
-- 8e164f765b45b5e5d118b02695fd6d6e2b0b232d by Alexandros Theodoridis <[email protected]>: Add missing dependency and remove so files from data
-- 35538f4922b5b28b9debd0ce17bb15b83b5921fc by Alexandros Theodoridis <[email protected]>: Rename setting to use_rocm_hermetic_rpath
-- 58d140220e9e58572c9a7ae3de2ec1ea189566d3 by Alexandros Theodoridis <[email protected]>: Fix build for cuda and cpu

Merging this change closes tensorflow#19649

PiperOrigin-RevId: 713248195
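As a usage sketch only (the `//xla/...` target is a placeholder, not from the PR; the flags are the ones quoted in the description above), a hermetic build invocation might look like:

```
bazel build \
  --repo_env=OS=ubuntu_20.04 \
  --repo_env=ROCM_VERSION=6.2.0 \
  --@local_config_rocm//rocm:use_rocm_hermetic_rpath=True \
  //xla/...
```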
PiperOrigin-RevId: 713248622
`std::mismatch` should be called with an end iterator as the second argument if there is no guarantee on element count in the second range. PiperOrigin-RevId: 713264159
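To illustrate the pitfall, here is a standalone C++17 sketch (not the XLA call site) contrasting the two overloads:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

int main() {
  std::vector<int> a = {1, 2, 3, 4};
  std::vector<int> b = {1, 2};  // no guarantee b is as long as a

  // Risky: the three-iterator overload assumes the second range has at
  // least as many elements as the first, so here it would read past b's end.
  // auto bad = std::mismatch(a.begin(), a.end(), b.begin());

  // Safe: the four-iterator overload (since C++14) stops at the end of
  // the shorter range.
  auto [it_a, it_b] = std::mismatch(a.begin(), a.end(), b.begin(), b.end());
  assert(it_b == b.end());  // the ranges agree up to b's length
  return 0;
}
```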
Move comparison of executable != nullptr _before_ calling std::move(executable). This is really only used for logging, but definitely adds confusion to the logs when it's always 0 :). PiperOrigin-RevId: 713272260
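A minimal sketch of the fixed ordering, with made-up names standing in for the real executable type:

```cpp
#include <iostream>
#include <memory>
#include <utility>

struct Executable {};

void Consume(std::unique_ptr<Executable> /*e*/) {}  // takes ownership

void RunAndLog(std::unique_ptr<Executable> executable) {
  // Capture the nullness *before* the move: after std::move the local
  // unique_ptr is empty, so a later `executable != nullptr` check would
  // always report null, which is what polluted the logs.
  const bool has_executable = executable != nullptr;
  Consume(std::move(executable));
  std::cout << "executable present: " << has_executable << "\n";
}

int main() {
  RunAndLog(std::make_unique<Executable>());
  return 0;
}
```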
…andle and SupportsSendRecvCallbacks PiperOrigin-RevId: 713276521
…Usage It was always set to false by the callers. PiperOrigin-RevId: 713277020
….cseConstants=false` to avoid constant folding and CSE which is expensive. PiperOrigin-RevId: 713277781
…lso be used in vectorizing AtomicRMW in follow-up changes. PiperOrigin-RevId: 713281944
PiperOrigin-RevId: 713282226
The algorithm was checking whether to write to the output or not by comparing the current slice index with the number of indices per warp. It works only when we have perfectly tiled indices, e.g. 50 indices per warp with a total of 2000 indices. As soon as we have 2001 indices, the last warp processes 1 update slice, but never writes it down. Also simplified the logic for the update loop that accumulates elements in registers. Instead of having scf.if inside of xla.loop, now we have two different xla.loops in different cases of scf.if, that either overwrite the accumulator or combine it with the new data. PiperOrigin-RevId: 713296321
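A toy C++ model of the boundary condition being fixed (all names and numbers are illustrative, not the emitter code):

```cpp
#include <algorithm>
#include <iostream>

int main() {
  constexpr int kIndicesPerWarp = 50;
  const int kNumIndices = 2001;  // 40 full batches plus one leftover slice

  int written_old = 0, written_fixed = 0;
  for (int warp = 0; warp * kIndicesPerWarp < kNumIndices; ++warp) {
    const int begin = warp * kIndicesPerWarp;
    const int processed = std::min(kIndicesPerWarp, kNumIndices - begin);
    // Old check: write only after a full batch, dropping the partial one.
    if (processed == kIndicesPerWarp) written_old += processed;
    // Fixed check: write whatever was accumulated.
    if (processed > 0) written_fixed += processed;
  }
  std::cout << written_old << " vs " << written_fixed << "\n";  // 2000 vs 2001
  return 0;
}
```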
… when adding vectorization for AtomicRMW which will only be available for Hopper. PiperOrigin-RevId: 713297711
…tives APIs to acquire a communicator in CollectiveThunk. Implement Cliques support for XLA:CPU collectives for consistency with XLA:GPU. Further unification will come in follow-up CLs. PiperOrigin-RevId: 713305764
In preparation for larger changes, this entry point is being disabled here for now. PiperOrigin-RevId: 713316210
PiperOrigin-RevId: 713318085
Changed to use signature_key for the Run() method for input / output maps since it aligns with other parameters. PiperOrigin-RevId: 713323123
…ing path. PiperOrigin-RevId: 713323821
PiperOrigin-RevId: 713330122
…when one CollectivePermute (cp) depends on the other. When we insert a control dependency from the send-start of one cp to the recv-start of another, we need to make sure that the cps are in post order. PiperOrigin-RevId: 713336414
PiperOrigin-RevId: 713346163
…sizes. PiperOrigin-RevId: 713353992
PiperOrigin-RevId: 7133574
…ll-to-all operation. The following example shows the detailed method.

```
base_shape: (32,32,32,32)
mesh: a=2, b=4
old sharding: P('a', 'b', None, None), local shape (16,8,32,32)
new sharding: P(None, None, 'a', 'b'), local shape (32,32,16,8)

// Step 1. Merge sharding axes to a single dimension
reshape (16,8,32,32) -> (16,8,2,16,4,8)
transpose (16,8,2,16,4,8) -> (2,4,16,8,16,8) with permutation (2,4,0,1,3,5)
reshape (2,4,16,8,16,8) -> (8,16,8,16,8)

// Step 2. Apply the all-to-all
all-to-all on (8,16,8,16,8) with split_dimension = 0

// Step 3. Split sharding axes to multiple dimensions
reshape (8,16,8,16,8) -> (2,4,16,8,16,8)
transpose (2,4,16,8,16,8) -> (2,16,4,8,16,8) with permutation (0,2,1,3,4,5)
reshape (2,16,4,8,16,8) -> (32,32,16,8)
```

PiperOrigin-RevId: 713362037
PiperOrigin-RevId: 713372912
PiperOrigin-RevId: 713374730
…parameters. PiperOrigin-RevId: 713389880
PiperOrigin-RevId: 713394310
PiperOrigin-RevId: 713395731
This CL takes care of:

1. Migrating the targets
```
tensorflow/compiler/xla:test
tensorflow/compiler/xla:test_helpers
tensorflow/compiler/xla/service:pattern_matcher_gmock
```
to tensorflow/compiler/xla/hlo/testlib
2. Setting up build aliases in xla or xla/service/, ensuring external dependencies are still satisfied.

Phase II will take care of migrating external projects' dependencies.

PiperOrigin-RevId: 713400473
This change also drops the relevant C++ plumbing. PiperOrigin-RevId: 713412874
…t a workaround. PiperOrigin-RevId: 713417811
Force-pushed from afe115b to 0e026ce, then from 0e026ce to e1d9704.
Nightly build failing tests list for openxla:
Vs failing xla tests in this PR:

We decided to skip the int4 Triton tests and investigate the issue separately from the weekly sync.
Force-pushed from d123032 to a5407d3.
tensorflow/core/kernels/BUILD (Outdated)
@@ -3924,6 +3924,7 @@ tf_cuda_cc_test(
    srcs = ["matmul_op_test.cc"],
    tags = [
        "no_aarch64",  # b/282068262
        "cuda-only",  # weekly sync 20250113
May I ask why we skip this matmul_op_test? Is it a subtest that failed, or do we know the root cause?
It fails with this error:
2025-01-17 11:20:21.004074: W tensorflow/core/framework/op_kernel.cc:1857] OP_REQUIRES failed at matmul_op_fused.cc:582 : INTERNAL: Unsupported fusion for BlasLt Matmul
2025-01-17 11:20:21.004123: I tensorflow/core/framework/local_rendezvous.cc:407] Local rendezvous is aborting with status: INTERNAL: Unsupported fusion for BlasLt Matmul
[[{{node fused_matmul}}]]
2025-01-17 11:20:21.004146: I tensorflow/core/framework/local_rendezvous.cc:407] Local rendezvous is aborting with status: INTERNAL: Unsupported fusion for BlasLt Matmul
[[{{node fused_matmul}}]]
[[fused_matmul/_1]]
2025-01-17 11:20:21.004166: I tensorflow/core/framework/local_rendezvous.cc:426] Local rendezvous recv item cancelled. Key hash: 6312669201998896265
tensorflow/core/kernels/matmul_op_test.cc:223: Failure
Value of: (last_status)
Expected: is OK
Actual: INTERNAL: 2 root error(s) found.
(0) INTERNAL: Unsupported fusion for BlasLt Matmul
[[{{node fused_matmul}}]]
[[fused_matmul/_1]]
(1) INTERNAL: Unsupported fusion for BlasLt Matmul
[[{{node fused_matmul}}]]
0 successful operations.
0 derived errors ignored. (of type absl::lts_20230802::Status)
tensorflow/core/kernels/matmul_op_test.cc:254: Failure
Expected equality of these values:
matmul.dtype()
Which is: 19
fused_matmul.dtype()
Which is: 1
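(For reference, in TensorFlow's DataType enum 19 is DT_HALF and 1 is DT_FLOAT, so the fused op is producing a different dtype than the plain matmul here.)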
I created an issue to re-enable it:
https://github.com/ROCm/frameworks-internal/issues/10718
I couldn't find the root cause by looking into the code directly. I guess we have to analyze it separately.
There are several tests failing with the same error. They do not fail on MI100, but do fail on MI210 and later.
Full log is here. I will try to disable the individual tests.
test.log
There is something in common among the failing tests: all of them have the suffix WithActivation and all are for type Eigen::half:
[ FAILED ] Test/FusedMatMulWithBiasOpTest/1.MatMul256x128x64WithActivation, where TypeParam = Eigen::half
[ FAILED ] Test/FusedMatMulWithBiasOpTest/1.MatMul1x256x256WithActivation, where TypeParam = Eigen::half
[ FAILED ] Test/FusedMatMulWithBiasOpTest/1.MatMul256x256x1WithActivation, where TypeParam = Eigen::half
[ FAILED ] Test/FusedMatMulWithBiasOpTest/1.MatMul1x256x1WithActivation, where TypeParam = Eigen::half
I will try to disable only those tests.
Done.
LGTM
!gen-cache

The disk cache generation for the cpu-pycpp tests status: successfully finished

!gen-cache

The disk cache generation for the cpu-pycpp tests status: successfully finished
The disk cache generation for the XLA tests status: scheduled

!gen-cache

The disk cache generation for the cpu-pycpp tests status: successfully finished
The disk cache generation for the XLA tests status: successfully finished
Weekly sync 13/01/2025