-
Notifications
You must be signed in to change notification settings - Fork 13.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pclmulqdq intrinsics don't inline well across target_feature changes anymore #139029
Comments
Thank you for the report! It would be useful to bisect the regression to the specific PR using cargo-bisect-rustc to make it easier to figure out what happened and ping the relevant people. |
@rustbot label: -E-needs-bisection +S-has-bisection +A-target-feature +T-compiler +I-slow +O-x86_64 |
i also arrived at that change with (notes on how i got there)
ends up with...
|
I suspect that this is because of https://github.com/rust-lang/rust/pull/135408/files#diff-f5ccef4931d8d453a6d2cadcd1fda96f59b96dfdc73fd0543db257cb4a362020R851, but I'm not entirely sure why. @RalfJung do you have an idea here? |
That PR changed the ABI so SSE vectors are passed by-val rather than by-ptr. I would have expected that to make optimizations easier, not harder. But sadly I know very little about the later parts of the backend, so I am clueless here. Cc @nikic |
wild speculation: we were relying on some opt via mem2reg |
i'd noticed the
pclmulqdq
intrinsics incrc32fast
were notable in aperf report
of a benchmark last night. somewhat shockingly, there were functions whose body waspclmulqdq xmm0, xmm1, 17; ret
andpclmulqdq xmm0, xmm1, 0; ret
, complete with constraining callers' choice of xmm registers! after a bit of digging it seems to be a regression in nightly.the specific regression i'd started at can be reproduced with
cargo bench
in https://github.com/srijs/rust-crc32fast .cargo +1.85.1 bench
produceswhereas
cargo +nightly bench
producesafter looking at perf a bit i believe this is representative: https://rust.godbolt.org/z/8dxcE4vo1 . i'm including everything there in this issue as well.
Code
I tried this code:
I expected to see this happen (with
-C opt-level=3
):Instead, this happened (also
-C opt-level=3
):Version it worked on
1.85.1, 1.31.0, and a half dozen in between.
additionally, beta (rust version 1.86.0-beta.7 (7824ede 2025-03-22) seems good.
nightly with
-C opt-level=3 -C target-feature=+pclmul
still does great.Version with regression
in the above godbolt link, i see
--version
in therustc nightly
tab providesrustc 1.87.0-nightly (a2e63569f 2025-03-26)
. this is consistent with how i first saw this locally:rustc +nightly --version --verbose
:Related improvement along the way
adding the same
target_feature
block on the inner function sees nightly produce somewhat better-than-baseline code: https://rust.godbolt.org/z/sGrYedeaPwith
rustc +nightly -C opt-level 3
yields:whereas before the codegen was identical regardless of the
target_feature
attribute on the inner function. so at least in some cases there is a modest improvement?@rustbot modify labels: +regression-from-stable-to-nightly -regression-untriaged
The text was updated successfully, but these errors were encountered: