ValueUseState calculation too coarse for x64's sinkable_load? #10010
Comments
So the issue here is that Imagine the following (very smart, but possible) instruction selection rules:
This particular example is a bit far-fetched, but it's important that we don't encode even more subtle knowledge about the possible instruction selection combinations into the core algorithms; they have to assume that rule-matching can go arbitrarily deep. We cannot allow the load to sink twice to two different locations, so the multiplicity analysis (correctly) propagates this multiple-use status backward to the load and prevents it from sinking.
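The backward propagation described above can be sketched in a toy model. This is only an illustration under stated assumptions, not Cranelift's actual code: the `UseState` enum, the `compute_use_states` signature, and the representation of a function as a list of operand indices (instruction `i` defines value `i`) are all hypothetical simplifications.

```rust
/// Toy use-state lattice (hypothetical; modeled loosely on the idea
/// discussed in this thread, not copied from Cranelift).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum UseState {
    Unused,
    Once,
    Multiple,
}

impl UseState {
    /// Record one more direct use of a value.
    fn inc(self) -> Self {
        match self {
            UseState::Unused => UseState::Once,
            _ => UseState::Multiple,
        }
    }
}

/// `args[i]` lists the operand values of instruction `i`, which defines
/// value `i`. Operands are assumed to refer to earlier instructions.
fn compute_use_states(args: &[Vec<usize>]) -> Vec<UseState> {
    let mut states = vec![UseState::Unused; args.len()];
    for operands in args {
        for &v in operands {
            states[v] = states[v].inc();
            if states[v] == UseState::Multiple {
                // A multiply-used value could be matched into several
                // consumers, so everything it transitively depends on
                // must be treated as multiply-used too.
                let mut stack = vec![v];
                while let Some(def) = stack.pop() {
                    for &a in &args[def] {
                        if states[a] != UseState::Multiple {
                            states[a] = UseState::Multiple;
                            stack.push(a);
                        }
                    }
                }
            }
        }
    }
    states
}

/// Hypothetical shape: v0 = load; v1 = splat v0; v2 and v3 both use v1.
fn splat_example() -> Vec<UseState> {
    compute_use_states(&[vec![], vec![0], vec![1], vec![1]])
}
```

In this sketch `splat_example()` marks the load (`v0`) as `Multiple` even though it has exactly one direct user, because its user `v1` is used twice. That is the conservative behavior described above: a rule that matches arbitrarily deep could otherwise sink the same load into two places.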
Believe it or not, this was the simplest abstraction I could come up with at the time (I know, I'm not totally happy either...). It certainly errs on the side of safety, but it did prevent real bugs we had / kept creating at the time. I know this last came up here and the explanation I gave there is as from-first-principles as I could make it; suggestions welcome for ways to explain it better and/or better algorithms that preserve all the safety properties we want!
I'll note as well that we could relax multiplicity if we "cut" it at certain points by pinning values into registers -- e.g., constrain that |
I was profiling some SIMD-using code today and noticed that the WebAssembly SIMD instruction v128.load32_splat wasn't generating the x64 instruction I thought it was supposed to. This is what I'd expect but this is what I got:

Notably, at offset e it's not using vbroadcastss, but instead a combo of movl, vmovd, and vpbroadcastd. I don't have a great way of measuring the relative performance of one vs the other, as my hope was to convince Cranelift to generate one or the other to measure.

The reason that this lowering rule isn't firing is due to:
(or at least I think this is why sinkable_load isn't firing)

I know I've run up against compute_use_states in the past and it's subtle enough that I can't ever seem to keep it in my head. I can't seem to wrap my head around it this time either...