Some platforms cannot provide strict IEEE-754 conformant subnormals due to real-time guarantees and/or hardware limitations #139277
Cc @rust-lang/opsem All of these are about subnormals, right? If so, would be good to reframe the issue (in particular the title) so that we avoid making this yet another float semantics kitchen sink issue. My first inclination is to say that code which wants "faster" float ops that are e.g. permitted to flush subnormals should use separate operations / types / flags to opt-in to such non-standard behavior. We'll have to emit different LLVM IR for this anyway, using strict FP intrinsics or something like that.
Oh great, they made all C and Rust code that uses floats UB on their platform then (at least when it is built by LLVM). |
FWIW, the only reference to this I could find is https://developer.apple.com/documentation/xcode/writing-armv6-code-for-ios, and Rust doesn't support ARMv6 (and Apple hasn't supported that since iOS 4.2.1 as I understand it) (and even then, you might be able to manually enable it, so even if we were to extend support for that platform, we'd "just" have to do this in a dylib entry point or something). |
For Metal, Arm, and real-time, I think so. If Rust intends to support Vulkan SPIR-V as a compilation target (which I believe it does), then the Vulkan specification applies, which provides weaker guarantees: infinities, the sign of zero, and NaNs may not be preserved, and infinities and NaNs may become undefined values. There is a standardized method to request stronger guarantees, but it can only be used if the implementation supports them, and supporting them is optional. |
Looks like D3D12 does not support infinities and NaNs, so layered implementations of Vulkan on top of it inherit this limitation even when the underlying hardware does not have it. |
GPU targets causing everyone a headache, as usual. ;) But those have tons of other problems as well, my understanding is not even pointers work properly there. So in terms of categorizing I would say that is a GPU target issue, not a float semantics issue. I doubt they will ever properly implement Rust semantics so we need some general system for crates to opt-in to support such cursed targets.
|
An entirely new set of parallel methods on floats, |
Floating point math is about the most basic thing a GPU can do, so if it is The fraction of computing power on a client system that is not in one of those “cursed targets” (multiple address spaces, weird floating point, etc) is well under 50% (probably more like 20% or less) and dropping fairly quickly. Accelerators are where most of the new compute is nowadays. |
Flush-to-zero at least has a simple definition: whenever a subnormal value would be produced, non-deterministically produce either that value or zero instead. But if we want to say that GPU floating point is not But even if the hardware does something "reasonable," the LLVM semantics for floating point are that they operate in the default IEEE-754 environment, and if the hardware doesn't implement that, it is unsound and can result in arbitrary undefined behavior.
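The flush-to-zero definition given here can be written down as a small model. This is only a sketch: `possible_results` is a hypothetical helper, not a Rust or LLVM API, and returning a zero with the same sign as the subnormal is an assumption (the comment only says "zero").

```rust
/// Hypothetical model: the set of results an operation may produce when its
/// exact IEEE-754 result is `exact` and flushing is non-deterministic.
fn possible_results(exact: f32) -> Vec<f32> {
    if exact.is_subnormal() {
        // Either the subnormal itself, or a zero (sign preserved: an assumption).
        vec![exact, 0.0_f32.copysign(exact)]
    } else {
        // Normal values, zeros, infinities and NaNs are unaffected.
        vec![exact]
    }
}

fn main() {
    // Smallest positive subnormal f32, built from its bit pattern so that no
    // floating point arithmetic is involved in producing it.
    let tiny = f32::from_bits(1);
    println!("subnormal input: {:?}", possible_results(tiny));
    println!("normal input:    {:?}", possible_results(1.0));
}
```

Under this model, a program is only correct if it behaves acceptably for every element of the returned set.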
3.15. FP Fast Math Mode shows that the various operators specify what fast-math contractions are allowed (i.e. NotNaN, NotInf, NSZ, AllowRecip), and this explicitly notes that this enables "fast math operations which are otherwise unsafe." The FunctionFloatControlINTEL capability (SPV_INTEL_float_controls2 extension) also provides the ability to control 3.37. FP Denorm Mode and 3.38. FP Operation Mode. If the SPIR-V backend isn't specifying |
I’m pretty sure that the result is merely unspecified. Otherwise it would be a security hole for WebGL and WebGPU. That said, this issue is not just about GPUs. There are also environments where one must turn off support for denormalized numbers in order to provide the needed real-time guarantees or simply to run at all on the target hardware. Having to not use Rust in these environments would be less than great. I changed the title to just be about subnormals. GPUs can use a separate issue. |
There are two parts to this that I can see:
|
There are existing I believe the long term plan is to handle denormals as part of the rework of constrained FP using operand bundles, see https://discourse.llvm.org/t/rfc-change-of-strict-fp-operation-representation-in-ir/85021. |
Thanks for the response @RalfJung! I’m sorry that I didn’t bring this up before the RFC was accepted. I’m not sure if making this target-specific is a good idea. There could be situations where Rust code is loaded as a plugin by code that might have changed subnormal handling. Also, loading a library compiled with PipeWire will only flush subnormals if the user explicitly requested it (https://gitlab.freedesktop.org/pipewire/pipewire/-/issues/2160), but the lead developer is considering changing the default for the realtime data thread, as flushing subnormals really is the fastest way to implement certain DSP algorithms. I think an exception for subnormals (where flushing might happen before or after rounding or not at all) is the safest option. There are just too many cases where subnormal numbers can’t be relied on, and I’d rather safe Rust code not crash because someone accidentally left FTZ or DAZ enabled. |
So these attributes are attached to individual float operations, or what is their scope?
There are situations where it is legal (but ofc unsafe) to mix code built for different targets, e.g. hardfloat and softfloat code if the ABI boundary does not involve any float types. So this is not a blocker for making this a target property. We could also make use of "target modifiers" here (rust-lang/rfcs#3716).
That's plain UB, similar to other ways to modify the FP status register. I don't think we should try to support this.
We have some code relying on proper subnormals for correctness, and other code wanting to rely on subnormal flushing. If we say subnormal flushing can occur any time on any target, we're just shifting around who's unhappy -- we're not fundamentally solving anything. So that does not seem like a good solution. |
These are function attributes. So if you add See https://llvm.org/docs/LangRef.html#denormal-fp-math for the documentation. |
As to why I'm saying the setup is iffy, I think the existing attribute works somewhat adequately for things like GPU targets, but it's not a great fit for things like NEON intrinsics (and I don't believe it's used there), because it is not fine-grained enough to say that e.g. scalar FP math is IEEE, but vector math (or even specific vector operations) use flushing. Doing this fine-grained at the operation level would allow modelling NEON properly. |
I’d prefer to avoid a solution that requires littering audio processing code with |
I think allowing nondeterministic flush-to-zero on targets where subnormals aren't preserved should probably be fine, and avoids the need for a soft float ABI. Similarly, libcore for such targets would be compiled this way as well, so wouldn't need any shims. (In a weird way, you could look at default ftz as a similar deficiency to the x87 behavior for floating point returns.) If we make it so temporarily enabling ftz is only So DSP implementations wouldn't need to be littered with And then there's always the (very bad, not ever endorsed) idea of just ignoring the UB. It's well established that the worst case behavior of ftz with optimizations assuming not is merely just arbitrarily wrong outputs when subnormals are involved in the computation with the current compiler implementations in the absence of very esoteric user code. I will not ever recommend it… but it wouldn't be any worse than the usual status quo with using C/C++. |
Would it make sense to unconditionally set |
The issue generally isn't with known denormal values. It's about values that may or may not be denormal in a context. For example, Setting This is assuming everything works as I have understood the reference document, which isn't a guarantee. |
I agree those are desirable outcomes. However, we also can't penalize code that wants proper subnormal arithmetic on targets that have it -- that must continue to work and receive the full suite of optimizations. So How sound is it to mix code with and without |
What about building the standard library with that flag? Does the standard library include any code that would be penalized by it significantly, or even at all? |
Nondeterministic behavior might be okay in at least some applications. |
That seems very hard to say, so I'd be uncomfortable making this a stable guarantee. But it'd be for t-libs-api to decide. |
Given the audio situation I think it would be better to allow compile-time constant-evaluation of subnormals, but without the requirement that it match the runtime behavior. Is there any non-contrived situation where this would cause a performance penalty? |
You mean constant folding? "compile-time evaluation" sounds like CTFE but I don't see how that would be relevant here. If we want to allow const-folding of subnormals we have to specify the semantics as non-deterministically doing subnormal flushing or not. I don't know if/how LLVM can represent that.
I'm the wrong person to answer that question. I know how to make a compiler correct, not how to make it generate fast code. ;) It could cause correctness issues if code relying on no subnormal flushing calls standard library methods that then do subnormal flushing. So we probably couldn't use just plain non-determinism, we'd have to spell out conditions under which subnormal flushing is guaranteed not to occur. |
Yeah, just declaring subnormals a non-deterministic free-for-all in the language sucks for code that does want them to work properly. One ugly solution would be to lift the control register some ISAs have for this to the AM level (as thread-local state). This allows accounting for code that needs a specific mode as well as code that is fine with whatever the current mode is. Changing the mode could be fallible on some platforms where e.g. proper subnormal support would require switching from hard float to soft float. But that AM state opens up the same Pandora's box for an optimizing compiler as any other deviation from “default fpenv everywhere, changing it is UB” does. Even ignoring the impact on constant folding etc., you have to start treating all floating point operations as depending on this global state, rather than being pure operations that can be scheduled freely. At least they wouldn’t have side effects in this case (in contrast to non-default exception handling), but LLVM is still poorly prepared for a language where all floating point math works that way. I still have some hope that Rust will eventually be able to support non-default fpenvs in some way. It’ll require tricky language design decisions, but the blocker of good LLVM support will be resolved eventually, and at least the rounding mode portion is well motivated. Perhaps subnormals can piggy-back off that when it does happen. The challenges at the language level are as similar as those at the LLVM level. |
If the AM state only switches between "guaranteed subnormal preservation" and "non-deterministically either preserve subnormals or flush them", then we can still always const-fold with subnormal preservation. So the only optimizations this affects are the ones that truly need an operation to be deterministic, e.g. scalar evolution. That could still be prohibitive though... |
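To make the two-mode idea concrete, here is a toy model of such abstract-machine state. Everything in it is hypothetical (`SubnormalMode`, `MODE`, `with_mode`, `permitted`, `fold_preserving_is_valid`); nothing like it exists in Rust today. It only illustrates why constant folding with subnormal preservation stays valid in both modes: the preserved result is always one of the permitted results.

```rust
use std::cell::Cell;

/// Hypothetical per-thread abstract-machine state.
#[derive(Clone, Copy, Debug, PartialEq)]
enum SubnormalMode {
    /// Subnormals are guaranteed to be preserved.
    Preserve,
    /// Each subnormal result is non-deterministically preserved or flushed.
    MayFlush,
}

thread_local! {
    static MODE: Cell<SubnormalMode> = Cell::new(SubnormalMode::Preserve);
}

/// Run `f` with the given mode, restoring the previous mode afterwards.
fn with_mode<R>(mode: SubnormalMode, f: impl FnOnce() -> R) -> R {
    let old = MODE.with(|m| m.replace(mode));
    let result = f();
    MODE.with(|m| m.set(old));
    result
}

/// Results the model permits for an operation whose exact result is `exact`.
fn permitted(exact: f32) -> Vec<f32> {
    match MODE.with(|m| m.get()) {
        SubnormalMode::MayFlush if exact.is_subnormal() => {
            // Sign-preserving zero is an assumption of this sketch.
            vec![exact, 0.0_f32.copysign(exact)]
        }
        _ => vec![exact],
    }
}

/// Folding to the preserved value is valid if that value is permitted.
fn fold_preserving_is_valid(exact: f32) -> bool {
    permitted(exact).contains(&exact)
}

fn main() {
    let tiny = f32::from_bits(1); // smallest positive subnormal f32
    for mode in [SubnormalMode::Preserve, SubnormalMode::MayFlush] {
        let (set, ok) = with_mode(mode, || (permitted(tiny), fold_preserving_is_valid(tiny)));
        println!("{mode:?}: permitted = {set:?}, preserving fold valid = {ok}");
    }
}
```

The sketch says nothing about code motion: an operation evaluated under `Preserve` has a smaller permitted set than the same operation under `MayFlush`, so moving it across a mode change is a separate problem.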
That is an interesting idea but yeah I suspect it doesn’t change the calculus because you’d still have to avoid moving “guaranteed subnormals” operations into code regions where the other mode is enabled. That’s probably the biggest social and engineering challenge: migrating the IR and all code touching the IR away from “pure op that can be freely moved around subject only to SSA form’s defs-dominate-uses rule” and towards something like LLVM’s constrained intrinsics (or operand bundles on regular intrinsics) that can express such dependencies at all. |
On some Arm platforms, the FPU is not strictly conformant to IEEE-754. If the FPU is put in strict standards compliance mode, some operations become traps to the OS. The OS must emulate the operation. I believe that all operations involving subnormals (before or after rounding) fall into this case. On x86-64, denormals trigger a (very slow) microcode assist on most cores. On at least Metal, the hardware may flush subnormals to zero at its discretion, and I believe GPUs generally allow this.
In these cases, it is impossible to support strict IEEE-754 behavior if one has real-time requirements, or (in the Arm case) if the OS does not include the needed support code. The real-time case is not just theoretical: when doing audio DSP, subnormals correspond to sounds that are so quiet that they can very much safely be flushed to zero, as they are below the threshold of hearing. Violating hard real-time guarantees, however, is extremely noticeable, so (unless I am very mistaken) audio DSP code generally sets the flush-to-zero and denormals-are-zero bits in the FPU. Requiring all audio DSP code to be written in assembler is silly, and nobody is actually going to do that.
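As a rough illustration of how DSP-style code ends up in the subnormal range, the sketch below halves a value until it reaches zero, the way a decaying filter state would. It assumes the default IEEE-754 environment (no FTZ/DAZ set by the host). `flush` and `decay` are hypothetical helpers; `flush` is a plain software flush and does not touch any FPU control register.

```rust
/// Software flush: replace a subnormal with a zero of the same sign.
fn flush(x: f32) -> f32 {
    if x.is_subnormal() { 0.0_f32.copysign(x) } else { x }
}

/// Halve a value starting from 1.0 until it reaches zero.
/// Returns (total steps, steps whose stored result was subnormal).
fn decay(flush_enabled: bool) -> (u32, u32) {
    let mut state = 1.0_f32;
    let mut steps = 0;
    let mut subnormal_steps = 0;
    while state != 0.0 {
        state *= 0.5;
        if flush_enabled {
            state = flush(state);
        }
        steps += 1;
        if state.is_subnormal() {
            subnormal_steps += 1;
        }
    }
    (steps, subnormal_steps)
}

fn main() {
    let (steps, slow) = decay(false);
    println!("no flush: {steps} steps, {slow} with a subnormal result");
    let (steps, slow) = decay(true);
    println!("flush:    {steps} steps, {slow} with a subnormal result");
}
```

Without flushing, the tail of the decay is spent entirely on subnormal values, which is exactly the range where hardware may trap or take a slow path. With flushing, the state goes to zero as soon as it would have become subnormal.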
I believe that at least some versions of iOS enable flush-to-zero by default, so any Rust library with a C API must expect to be called in this environment (it’s part of the platform ABI). It’s worth noting that not having FTZ and DAZ set can be a security vulnerability (denial of service) in code that operates on untrusted input, as it can make processing far, far more expensive than it would be otherwise.