
Some platforms cannot provide strict IEEE-754 conformant subnormals due to real-time guarantees and/or hardware limitations #139277

Open
DemiMarie opened this issue Apr 2, 2025 · 28 comments
Labels
A-floating-point (Area: Floating point numbers and arithmetic), needs-triage (This issue may need triage. Remove it if it has been sufficiently triaged.)

Comments

@DemiMarie
Contributor

On some Arm platforms, the FPU is not strictly conformant to IEEE-754. If the FPU is put in strict standards compliance mode, some operations become traps to the OS, which must then emulate the operation. I believe that all operations involving subnormals (before or after rounding) fall into this case. On x86-64, denormals trigger a (very slow) microcode assist on most cores. On at least Metal, the hardware may flush subnormals to zero at its discretion, and I believe GPUs generally allow this.

In these cases, it is impossible to support strict IEEE-754 behavior if one has real-time requirements, or (in the Arm case) if the OS does not include the needed support code. The real-time case is not just theoretical: when doing audio DSP, subnormals correspond to sounds that are so quiet that they can very much safely be flushed to zero, as they are below the threshold of hearing. Violating hard real-time guarantees, however, is extremely noticeable, so (unless I am very mistaken) audio DSP code generally sets the flush-to-zero and denormals-are-zero bits in the FPU. Requiring all audio DSP code to be written in assembler is silly, and nobody is actually going to do that.
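(For concreteness, here is a minimal sketch of what "setting the flush-to-zero and denormals-are-zero bits" means on x86-64: FZ is bit 15 and DAZ is bit 6 of the MXCSR. The function name is invented and this is illustration only; if I recall correctly, the corresponding std::arch intrinsics such as _mm_setcsr were deprecated precisely because toggling the FP environment around ordinary Rust float code has no defined semantics today, which is the point of this issue.)

```rust
// Illustration only: enable FTZ and DAZ on x86-64 by setting MXCSR bits 15 (FZ)
// and 6 (DAZ). Whether any Rust float code may soundly run while these bits are
// set is exactly the open question in this issue.
#[cfg(target_arch = "x86_64")]
unsafe fn enable_ftz_daz() {
    use core::arch::asm;
    let mut mxcsr: u32 = 0;
    asm!("stmxcsr [{ptr}]", ptr = in(reg) &mut mxcsr); // read the current MXCSR
    mxcsr |= (1 << 15) | (1 << 6);                     // FZ | DAZ
    asm!("ldmxcsr [{ptr}]", ptr = in(reg) &mxcsr);     // write it back
}

#[cfg(target_arch = "x86_64")]
fn main() {
    unsafe { enable_ftz_daz() };
}

#[cfg(not(target_arch = "x86_64"))]
fn main() {}
```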

I believe that at least some versions of iOS enable flush-to-zero by default, so any Rust library with a C API must expect to be called in this environment (it’s part of the platform ABI). It’s worth noting that not having FTZ and DAZ set can be a security vulnerability (denial of service) in code that operates on untrusted input, as it can make processing far, far more expensive than it would be otherwise.

@rustbot added the needs-triage label Apr 2, 2025
@lolbinarycat added the A-floating-point label Apr 2, 2025
@RalfJung
Member

RalfJung commented Apr 3, 2025

Cc @rust-lang/opsem

All of these are about subnormals, right? If so, would be good to reframe the issue (in particular the title) so that we avoid making this yet another float semantics kitchen sink issue.

My first inclination is to say that code which wants "faster" float ops that are e.g. permitted to flush subnormals should use separate operations / types / flags to opt-in to such non-standard behavior. We'll have to emit different LLVM IR for this anyway, using strict FP intrinsics or something like that.

I believe that at least some versions of iOS enable flush-to-zero by default, so any Rust library with a C API must expect to be called in this environment

Oh great, they made all C and Rust code that uses floats UB on their platform then (at least when it is built by LLVM).

@madsmtm
Contributor

madsmtm commented Apr 3, 2025

I believe that at least some versions of iOS enable flush-to-zero by default, so any Rust library with a C API must expect to be called in this environment

Oh great, they made all C and Rust code that uses floats UB on their platform then (at least when it is built by LLVM).

FWIW, the only reference to this I could find is https://developer.apple.com/documentation/xcode/writing-armv6-code-for-ios, and Rust doesn't support ARMv6 (and Apple hasn't supported that since iOS 4.2.1 as I understand it) (and even then, you might be able to manually enable it, so even if we were to extend support for that platform, we'd "just" have to do this in a dylib entry point or something).

@DemiMarie changed the title Some platforms cannot provide strict IEEE-754 conformance due to real-time guarantees and/or hardware limitations Some platforms cannot provide strict IEEE-754 conformant subnormls due to real-time guarantees and/or hardware limitations Apr 3, 2025
@DemiMarie changed the title Some platforms cannot provide strict IEEE-754 conformant subnormls due to real-time guarantees and/or hardware limitations Some platforms cannot provide strict IEEE-754 conformant subnormals due to real-time guarantees and/or hardware limitations Apr 3, 2025
@DemiMarie
Contributor Author

All of these are about subnormals, right? If so, would be good to reframe the issue (in particular the title) so that we avoid making this yet another float semantics kitchen sink issue.

For Metal, Arm, and real-time, I think so. If Rust intends to support Vulkan SPIR-V as a compilation target (which I believe it does), then the Vulkan specification applies, which provides weaker guarantees: infinities, the sign of zero, and NaNs may not be preserved, and infinities and NaNs may become undefined values. There is a standardized method to request stronger guarantees, but it can only be used if the implementation supports them, and supporting them is optional.

@DemiMarie changed the title Some platforms cannot provide strict IEEE-754 conformant subnormals due to real-time guarantees and/or hardware limitations Some platforms cannot provide strict IEEE-754 conformant subnormals, infinities, and/or NaNs due to real-time guarantees and/or hardware limitations Apr 3, 2025
@DemiMarie
Contributor Author

Looks like D3D12 does not support infinities and NaNs, so layered implementations of Vulkan on top of it inherit this limitation even when the underlying hardware does not have it.

@RalfJung
Member

RalfJung commented Apr 4, 2025 via email

@bstrie
Contributor

bstrie commented Apr 4, 2025

so we need some general system for crates to opt-in to support such cursed targets

An entirely new set of parallel methods on floats, unsafe { foo.add_cursed(bar) }?

@DemiMarie
Contributor Author

Floating point math is about the most basic thing a GPU can do, so if it is unsafe then so is every GPU kernel. I think it would be better to have a crate-level attribute saying “I’m fine with any of the semantics allowed by SPIR-V,” rather than having to use clumsy operations.

The fraction of computing power on a client system that is not in one of those “cursed targets” (multiple address spaces, weird floating point, etc) is well under 50% (probably more like 20% or less) and dropping fairly quickly. Accelerators are where most of the new compute is nowadays.

@CAD97
Contributor

CAD97 commented Apr 5, 2025

Flush-to-zero at least has a simple definition: whenever a subnormal value would be produced, non-deterministically produce either that value or zero instead. But if we want to say that GPU floating point is not unsafe, we need to be precise as to whether producing an infinity/NaN is undefined (UB, nasal demons, the entire program has no meaning) or merely unspecified (the operation that would produce such a value produces an arbitrary meaningless value instead).
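(A tiny, self-contained illustration of that definition, run under the default IEEE-754 environment; under flush-to-zero the same division would be allowed to yield 0.0 instead:)

```rust
// Halving the smallest positive normal f32 produces a subnormal under IEEE
// semantics; a flush-to-zero implementation may non-deterministically produce
// 0.0 for the same operation.
fn main() {
    let smallest_normal = f32::MIN_POSITIVE;   // 2^-126
    let sub = smallest_normal / 2.0;           // 2^-127, subnormal
    assert!(sub > 0.0 && !sub.is_normal());    // holds without FTZ; with FTZ, sub would be 0.0
    println!("{sub:e}");
}
```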

But even if the hardware does something "reasonable," the LLVM semantics for floating point are that they operate in the default IEEE-754 environment, and if the hardware doesn't implement that, it is unsound and can result in arbitrary undefined behavior.

then the Vulkan specification applies, which provides weaker guarantees: infinities, the sign of zero, and NaNs may not be preserved, and infinities and NaNs may become undefined values

3.15. FP Fast Math Mode shows that the various operators specify what fast-math contractions are allowed (i.e. NotNaN, NotInf, NSZ, AllowRecip), and this explicitly notes that this enables "fast math operations which are otherwise unsafe." The FunctionFloatControlINTEL capability (SPV_INTEL_float_controls2 extension) also provides the ability to control 3.37. FP Denorm Mode and 3.38. FP Operation Mode.

If the SPIR-V backend isn't specifying -spirv-ext=+SPV_INTEL_float_controls2 to LLVM by default, it probably should be. But other divergences mean that GPU Rust is going to be a nonstandard dialect, because there are other things (like dynamic indirection) that are normal on the CPU but can't be made to work on the GPU without prohibitive compromises.

@DemiMarie changed the title Some platforms cannot provide strict IEEE-754 conformant subnormals, infinities, and/or NaNs due to real-time guarantees and/or hardware limitations Some platforms cannot provide strict IEEE-754 conformant subnormals due to real-time guarantees and/or hardware limitations Apr 5, 2025
@DemiMarie
Contributor Author

I’m pretty sure that the result is merely unspecified. Otherwise it would be a security hole for WebGL and WebGPU.

That said, this issue is not just about GPUs. There are also environments where one must turn off support for denormalized numbers in order to provide the needed real-time guarantees, or simply to run at all on the target hardware. Having to avoid Rust in these environments would be less than great.

I changed the title to just be about subnormals. GPUs can use a separate issue.

@RalfJung
Member

RalfJung commented Apr 7, 2025

There are two parts to this that I can see:

  • Figuring out how to soundly represent such programs in LLVM IR. Using regular float arithmetic operations is not correct with current versions of LLVM as LLVM will assume those have per-spec subnormal behavior. This may require significant work on the LLVM side. @nikic do you know whether there are any concrete plans here?
  • Figuring out how to work with language dialects on the Rust side.
    • Do we walk back a bit on the "strict IEEE semantics" part of RFC 3514, and add an exception for subnormals (on top of the existing exception for NaNs)? At the time the RFC was written, the only target we were aware of where subnormals are broken was 32-bit ARM NEON instructions, where the answer then was "just don't use those for scalar operations"; I don't think the concerns you mention ever came up during the discussion.

      This is technically a breaking change, though unlikely to trip anyone up. Rust still can't actually support non-standard subnormals until LLVM does, but at least Miri could randomly flush subnormals and we could tell unsafe code not to rely on subnormals being preserved.

    • Or do we make this more target-specific? We already have endianness as a target parameter changing AM behavior; this could be similar in flavor.

@nikic
Contributor

nikic commented Apr 7, 2025

Figuring out how to soundly represent such programs in LLVM IR. Using regular float arithmetic operations is not correct with current versions of LLVM as LLVM will assume those have per-spec subnormal behavior. This may require significant work on the LLVM side. @nikic do you know whether there are any concrete plans here?

There are existing "denormal-fp-math" and "denormal-fp-math-f32" attributes that control denormal assumptions for FP math. The whole setup is somewhat iffy though.

I believe the long term plan is to handle denormals as part of the rework of constrained FP using operand bundles, see https://discourse.llvm.org/t/rfc-change-of-strict-fp-operation-representation-in-ir/85021.

@DemiMarie
Contributor Author

DemiMarie commented Apr 7, 2025

Thanks for the response @RalfJung! I’m sorry that I didn’t bring this up before the RFC was accepted.

I’m not sure if making this target-specific is a good idea. There could be situations where Rust code is loaded as a plugin by code that might have changed subnormal handling. Also, loading a library compiled with -ffast-math or -Ofast by GCC 12 and older can change the behavior of subnormals for other libraries: https://moyix.blogspot.com/2022/09/someones-been-messing-with-my-subnormals.html. Flushing subnormals is also a fairly common recommendation for fixing performance problems.

PipeWire will only flush subnormals if the user explicitly requested it (https://gitlab.freedesktop.org/pipewire/pipewire/-/issues/2160), but the lead developer is considering changing the default for the realtime data thread, as flushing subnormals really is the fastest way to implement certain DSP algorithms.
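(To make the DSP case concrete, a sketch of my own, not taken from PipeWire: a one-pole feedback decay whose state lingers in the subnormal range long after the input goes silent. Each of those iterations can take the slow denormal path unless FTZ/DAZ is set or the code adds a tiny offset by hand.)

```rust
// Silent-input tail of a one-pole decay: the state shrinks geometrically,
// eventually enters the subnormal range, and stays there for many iterations.
fn main() {
    let mut state = 1.0f32;
    let mut subnormal_steps = 0u32;
    for _ in 0..120_000 {
        state *= 0.999; // pure feedback decay, input is silent
        if state != 0.0 && !state.is_normal() {
            subnormal_steps += 1; // counts the iterations spent on subnormal values
        }
    }
    println!("final state = {state:e}, subnormal iterations = {subnormal_steps}");
}
```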

I think an exception for subnormals (where flushing might happen before or after rounding or not at all) is the safest option. There are just too many cases where subnormal numbers can’t be relied on, and I’d rather safe Rust code not crash because someone accidentally left FTZ or DAZ enabled.

@RalfJung
Member

RalfJung commented Apr 8, 2025

@nikic

There are existing "denormal-fp-math" and "denormal-fp-math-f32" attributes that control denormal assumptions for FP math. The whole setup is somewhat iffy though.

So these attributes are attached to individual float operations, or what is their scope?

@DemiMarie

There could be situations where Rust code is loaded as a plugin by code that might have changed subnormal handling

There are situations where it is legal (but ofc unsafe) to mix code built for different targets, e.g. hardfloat and softfloat code if the ABI boundary does not involve any float types. So this is not a blocker for making this a target property.

We could also make use of "target modifiers" here (rust-lang/rfcs#3716).

Also, loading a library compiled with -ffast-math or -Ofast by GCC 12 and older can change the behavior of subnormals for other libraries

That's plain UB, similar to other ways to modify the FP status register. I don't think we should try to support this.

I think an exception for subnormals (where flushing might happen before or after rounding or not at all) is the safest option.

We have some code relying on proper subnormals for correctness, and other code wanting to rely on subnormal flushing. If we say subnormal flushing can occur any time on any target, we're just shifting around who's unhappy -- we're not fundamentally solving anything. So that does not seem like a good solution.

@nikic
Contributor

nikic commented Apr 8, 2025

@nikic

There are existing "denormal-fp-math" and "denormal-fp-math-f32" attributes that control denormal assumptions for FP math. The whole setup is somewhat iffy though.

So these attributes are attached to individual float operations, or what is their scope?

These are function attributes. So if you add "denormal-fp-math"="dynamic,dynamic" to a function, LLVM assumes that both inputs and outputs of float operations inside the function may flush denormals, and e.g. will no longer constant fold float operations that take/produce denormals.

See https://llvm.org/docs/LangRef.html#denormal-fp-math for the documentation.

@nikic
Contributor

nikic commented Apr 8, 2025

As to why I'm saying the setup is iffy, I think the existing attribute works somewhat adequately for things like GPU targets, but it's not a great fit for things like NEON intrinsics (and I don't believe it's used there), because it is not fine-grained enough to say that e.g. scalar FP math is IEEE, but vector math (or even specific vector operations) use flushing. Doing this fine-grained at the operation level would allow modelling NEON properly.

@DemiMarie
Contributor Author

I’d prefer to avoid a solution that requires littering audio processing code with unsafe, or defaulting to a softfloat ABI on targets where subnormals don’t work. In addition, having to save and restore floating point state around every call to libcore would be very bad, especially because so much desugars to libcore function calls.

@CAD97
Contributor

CAD97 commented Apr 9, 2025

I think allowing nondeterministic flush-to-zero on targets where subnormals aren't preserved should probably be fine, and avoids the need for a soft float ABI. Similarly, libcore for such targets would be compiled this way as well, so wouldn't need any shims. (In a weird way, you could look at default ftz as a similar deficiency to the x87 behavior for floating point returns.)

If we make it so temporarily enabling ftz is only unsafe and not instant UB somehow, then the restriction would be not calling any functionality which uses normal float arithmetic during that section, but methods like ftz_add can be safe by again being non-deterministic. Any such functions would need to be "robust" against the potential of calling code that relies on the default fpenv, though, and panicking would be required to reset ftz to false if that's the default.
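(A purely illustrative model of such an ftz_add, with the non-determinism reified as an explicit parameter; the name and shape are invented here, not a proposed API. Safe callers have to be correct under either outcome.)

```rust
// Model of a "safe because non-deterministic" ftz_add: the result is either the
// exact IEEE sum or, when that sum would be subnormal, a zero of the same sign.
fn ftz_add_model(a: f32, b: f32, flush: bool) -> f32 {
    let exact = a + b;
    if flush && exact != 0.0 && exact.is_finite() && !exact.is_normal() {
        if exact.is_sign_positive() { 0.0 } else { -0.0 }
    } else {
        exact
    }
}

fn main() {
    let tiny = f32::MIN_POSITIVE / 2.0;                 // subnormal
    assert_eq!(ftz_add_model(tiny, 0.0, false), tiny);  // IEEE outcome
    assert_eq!(ftz_add_model(tiny, 0.0, true), 0.0);    // flushed outcome
}
```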

So DSP implementations wouldn't need to be littered with unsafe, they'd only need unsafe when registering the DSP. It's far from ideal but it would work.

And then there's always the (very bad, not ever endorsed) idea of just ignoring the UB. It's well established that, with current compiler implementations and in the absence of very esoteric user code, the worst-case behavior of ftz combined with optimizations that assume it is off is merely arbitrarily wrong outputs when subnormals are involved in the computation. I will not ever recommend it… but it wouldn't be any worse than the usual status quo of using C/C++.

@DemiMarie
Contributor Author

Would it make sense to unconditionally set "denormal-fp-math"="dynamic,dynamic"? Does constant folding denormals help on non-esoteric user code? That would avoid the UB.

@CAD97
Contributor

CAD97 commented Apr 9, 2025

The issue generally isn't with known denormal values. It's about values that may or may not be denormal in a context. For example, -0.0 + x can be folded to x under the default fpe, but with "denormal-fp-math"="dynamic,dynamic" this folding should not occur, as the dynamic ftz state will change the result of the addition.
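(A small self-contained illustration of that fold; it just computes the candidate results directly rather than actually toggling DAZ/FTZ.)

```rust
// Folding `-0.0 + x` to `x` is only valid if subnormal inputs/outputs are
// preserved: if the hardware flushes the subnormal `x`, the addition yields
// +0.0, while the folded form still yields the subnormal.
fn main() {
    let x = f32::MIN_POSITIVE / 4.0;   // a positive subnormal
    let folded = x;                    // what the optimizer would emit
    let ieee = -0.0f32 + x;            // equals x under default IEEE semantics
    let flushed = -0.0f32 + 0.0f32;    // what hardware computes if it flushes x to zero
    assert_eq!(ieee, folded);
    assert!(flushed == 0.0 && flushed.is_sign_positive());
    println!("folded = {folded:e}, flushed = {flushed:e}");
}
```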

Setting denormal-fp-math to preserve-sign or positive-zero instead seems to allow for either nondeterministic ftz or non-ftz behavior, but the ftz sign mode must match the processor state to avoid UB. preserve-sign would be correct behavior for only setting ftz, but ftz and nsz typically come together.

This is assuming everything works as I have understood the reference document, which isn't a guarantee.

@RalfJung
Member

RalfJung commented Apr 9, 2025

I’d prefer to avoid a solution that requires littering audio processing code with unsafe, or defaulting to a softfloat ABI on targets where subnormals don’t work. In addition, having to save and restore floating point state around every call to libcore would be very bad, especially because so much desugars to libcore function calls.

I agree those are desirable outcomes. However, we also can't penalize code that wants proper subnormal arithmetic on targets that have it -- that must continue to work and receive the full suite of optimizations. So "denormal-fp-math"="dynamic,dynamic" on all code is not an option.

How sound is it to mix code with and without "denormal-fp-math"="dynamic,dynamic"? Hopefully, fully sound. So we could have a -C flag or a per-function or per-crate attribute that compiles to "denormal-fp-math"="dynamic,dynamic". Hopefully, we can get LLVM to agree that setting the ftz flag is fine as long as all code executed while the flag is set is inside functions compiled with "denormal-fp-math"="dynamic,dynamic". That said, since the standard library is not built with that flag, this plan relies on -Zbuild-std (or having a separate ftz-compatible target that we ship a std for).

@DemiMarie
Contributor Author

What about building the standard library with that flag? Does the standard library include any code that would be penalized by it significantly, or even at all?

@DemiMarie
Contributor Author

Setting denormal-fp-math to preserve-sign or positive-zero instead seems to allow for either nondeterministic ftz or non-ftz behavior, but the ftz sign mode must match the processor state to avoid UB. preserve-sign would be correct behavior for only setting ftz, but ftz and nsz typically come together.

Nondeterministic behavior might be okay in at least some applications.

@RalfJung
Member

RalfJung commented Apr 9, 2025

What about building the standard library with that flag? Does the standard library include any code that would be penalized by it significantly, or even at all?

That seems very hard to say, so I'd be uncomfortable making this a stable guarantee. But it'd be for t-libs-api to decide.

@DemiMarie
Contributor Author

Given the audio situation I think it would be better to allow compile-time constant-evaluation of subnormals, but without the requirement that it match the runtime behavior. Is there any non-contrived situation where this would cause a performance penalty?

@RalfJung
Member

RalfJung commented Apr 9, 2025

Given the audio situation I think it would be better to allow compile-time constant-evaluation of subnormals,

You mean constant folding? "compile-time evaluation" sounds like CTFE but I don't see how that would be relevant here.

If we want to allow const-folding of subnormals we have to specify the semantics as non-deterministically doing subnormal flushing or not. I don't know if/how LLVM can represent that.

Is there any non-contrived situation where this would cause a performance penalty?

I'm the wrong person to answer that question. I know how to make a compiler correct, not how to make it generate fast code. ;)

It could cause correctness issues if code relying on no subnormal flushing calls standard library methods that then do subnormal flushing. So we probably couldn't use just plain non-determinism, we'd have to spell out conditions under which subnormal flushing is guaranteed not to occur.

@hanna-kruppe
Contributor

hanna-kruppe commented Apr 9, 2025

Yeah, just declaring subnormals a non-deterministic free-for-all in the language sucks for code that does want them to work properly.

One ugly solution would be to lift the control register some ISAs have for this to the AM level (as thread-local state). This allows accounting for code that needs a specific mode as well as code that is fine with whatever the current mode is. Changing the mode could be fallible on some platforms where e.g. proper subnormal support would require switching from hard float to soft float. But that AM state opens the same Pandora's box for an optimizing compiler as any other deviation from “default fpenv everywhere, changing it is UB” does. Even ignoring the impact on constant folding etc., you have to start treating all floating point operations as depending on this global state, rather than being pure operations that can be scheduled freely. At least they wouldn’t have side effects in this case (in contrast to non-default exception handling), but LLVM is still poorly prepared for a language where all floating point math works that way.

I still have some hope that Rust will eventually be able to support non-default fpenvs in some way. It’ll require tricky language design decisions, but the blocker of good LLVM support will be resolved eventually, and at least the rounding mode portion is well motivated. Perhaps subnormals can piggy-back off that when it does happen. The challenges at the language level are similar to those at the LLVM level.

@RalfJung
Member

RalfJung commented Apr 9, 2025

But that AM state opens the same Pandora's box for an optimizing compiler as any other deviation from “default fpenv everywhere, changing it is UB” does.

If the AM state only switches between "guaranteed subnormal preservation" and "non-deterministically either preserve subnormals or flush them", then we can still always const-fold with subnormal preservation. So the only optimizations this affects are the ones that truly need an operation to be deterministic, e.g. scalar evolution. That could still be prohibitive though...

@hanna-kruppe
Contributor

hanna-kruppe commented Apr 9, 2025

That is an interesting idea but yeah I suspect it doesn’t change the calculus because you’d still have to avoid moving “guaranteed subnormals” operations into code regions where the other mode is enabled. That’s probably the biggest social and engineering challenge: migrating the IR and all code touching the IR away from “pure op that can be freely moved around subject only to SSA form’s defs-dominate-uses rule” and towards something like LLVM’s constrained intrinsics (or operand bundles on regular intrinsics) that can express such dependencies at all.
