- 
                Notifications
    
You must be signed in to change notification settings  - Fork 13.9k
 
Clamp Function for f32 and f64 #100556
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clamp Function for f32 and f64 #100556
Conversation
      
        
              This comment was marked as resolved.
        
        
      
    
  This comment was marked as resolved.
| 
           Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @scottmcm (or someone else) soon. Please see the contribution instructions for more information.  | 
    
| 
           These are very susceptible to getting worse for different implementations, since they were defined in the RFC very intentionally to be able to be implemented as  But I think we don't have a test for that right now. So could you please add a https://github.com/rust-lang/rust/tree/master/src/test/assembly test for it? I think this should work: // Floating-point clamp is designed to be implementable as max+min,
// so check to make sure that's what it's actually emitting.
// assembly-output: emit-asm
// compile-flags: -O -C "llvm-args=-x86-asm-syntax=intel"
// only-x86_64
// CHECK-LABEL: clamp_demo:
#[no_mangle]
pub fn clamp_demo(a: f32, x: f32, y: f32) -> f32 {
    // CHECK: maxss
    // CHECK: minss
    a.clamp(x, y)
}
// CHECK-LABEL: clamp12_demo:
#[no_mangle]
pub fn clamp12_demo(a: f32) -> f32 {
    // CHECK-NEXT: movss   xmm1
    // CHECK-NEXT: maxss   xmm1, xmm0
    // CHECK-NEXT: movss   xmm0
    // CHECK-NEXT: minss   xmm0, xmm1
    // CHECK-NEXT: ret
    a.clamp(1.0, 2.0)
}Then if the new implementation of the method still passes (or produces similarly good ASM), then I'd be happy to use it! @rustbot author  | 
    
| 
           I checked the codegen in godbolt pub fn clamp_old(this: f32, min: f32, max: f32) -> f32 {
    assert!(min <= max);
    let mut x = this;
    if x < min {
        x = min;
    }
    if x > max {
        x = max;
    }
    x
}
pub fn clamp_new(this: f32, min: f32, max: f32) -> f32 {
    assert!(min <= max);
    if this <= min {
        return min;
    }
    if this > max {
        return max;
    }
    this
}example::clamp_old:
        ucomiss xmm2, xmm1
        jb      .LBB0_2
        maxss   xmm1, xmm0
        minss   xmm2, xmm1
        movaps  xmm0, xmm2
        ret
.LBB0_2:
        push    rax
        lea     rdi, [rip + .L__unnamed_1]
        lea     rdx, [rip + .L__unnamed_2]
        mov     esi, 28
        call    qword ptr [rip + core::panicking::panic@GOTPCREL]
        ud2
example::clamp_new:
        ucomiss xmm2, xmm1
        jb      .LBB1_2
        minss   xmm2, xmm0
        cmpnless        xmm0, xmm1
        andps   xmm2, xmm0
        andnps  xmm0, xmm1
        orps    xmm0, xmm2
        ret
.LBB1_2:
        push    rax
        lea     rdi, [rip + .L__unnamed_1]
        lea     rdx, [rip + .L__unnamed_3]
        mov     esi, 28
        call    qword ptr [rip + core::panicking::panic@GOTPCREL]
        ud2 | 
    
          
 Thanks, this is what I was concerned about. For example, on Zen2 https://www.agner.org/optimize/instruction_tables.pdf lists  Targeting that architecture specifically doesn't fix it either: https://godbolt.org/z/ET5sEff8q.  That can  (In general, down at the single instruction level, trying to branch to skip things is often slower than just running them.) So I think you could update this to use a   | 
    
| 
           (closing and immediately reopening to try to re-kick the PR build, since only 1 of the three jobs ran.)  | 
    
| 
           @scottmcm  Thank you for your advice! I attempted to change the clamp function using  example::clamp_old:
        ucomiss xmm2, xmm1
        jb      .LBB0_2
        maxss   xmm1, xmm0
        minss   xmm2, xmm1
        movaps  xmm0, xmm2
        ret
.LBB0_2:
        push    rax
        lea     rdi, [rip + .L__unnamed_1]
        lea     rdx, [rip + .L__unnamed_2]
        mov     esi, 28
        call    qword ptr [rip + core::panicking::panic@GOTPCREL]
        ud2
example::clamp_new:
        ucomiss xmm2, xmm1
        jb      .LBB1_5
        ucomiss xmm1, xmm0
        jae     .LBB1_4
        ucomiss xmm0, xmm2
        movaps  xmm1, xmm0
        jbe     .LBB1_4
        movaps  xmm1, xmm2
.LBB1_4:
        movaps  xmm0, xmm1
        ret
.LBB1_5:
        push    rax
        lea     rdi, [rip + .L__unnamed_1]
        lea     rdx, [rip + .L__unnamed_3]
        mov     esi, 28
        call    qword ptr [rip + core::panicking::panic@GOTPCREL]
        ud2This new clamp also has slightly different functionality than the original one, since I think it mutates the original value? and returns a float, as opposed to just returning a new value. fn clamp(mut self, min: f32, max: f32) -> f32 {
    assert!(min <= max);
    if self <= min {
        self = min;
    } else if self > max {
        self = max;
    }
    self
} | 
    
      
        
              This comment has been minimized.
        
        
      
    
  This comment has been minimized.
      
        
              This comment has been minimized.
        
        
      
    
  This comment has been minimized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hopefully just the one more change and things'll be good 🤞
If you could also squash all the commits, I'd appreciate it.
Simple Clamp Function I thought this was more robust and easier to read. I also allowed this function to return early in order to skip the extra bound check (I'm sure the difference is negligible). I'm not sure if there was a reason for binding `self` to `x`; if so, please correct me. Simple Clamp Function for f64 I thought this was more robust and easier to read. I also allowed this function to return early in order to skip the extra bound check (I'm sure the difference is negligible). I'm not sure if there was a reason for binding `self` to `x`; if so, please correct me. Floating point clamp test f32 clamp using mut self f64 clamp using mut self Update library/core/src/num/f32.rs Update f64.rs Update x86_64-floating-point-clamp.rs Update src/test/assembly/x86_64-floating-point-clamp.rs Update x86_64-floating-point-clamp.rs Co-Authored-By: scottmcm <[email protected]>
      
        
              This comment has been minimized.
        
        
      
    
  This comment has been minimized.
Co-authored-by: scottmcm <[email protected]>
| 
           Huzzah! The PR build finally stopped resisting. Thanks for your persistence here -- it's great to get an assembly test for this to record the work from the RFC in a way that's actually checked. @bors r+  | 
    
| 
           @bors r- failed in a rollup #100664 (comment)  | 
    
| 
           @bors r+  | 
    
Clamp Function for f32 and f64 I thought the clamp function could use a little improvement for readability purposes. The function now returns early in order to skip the extra bound checks. If there was a reason for binding `self` to `x` or if this code is incorrect, please correct me :)
Clamp Function for f32 and f64 I thought the clamp function could use a little improvement for readability purposes. The function now returns early in order to skip the extra bound checks. If there was a reason for binding `self` to `x` or if this code is incorrect, please correct me :)
Clamp Function for f32 and f64 I thought the clamp function could use a little improvement for readability purposes. The function now returns early in order to skip the extra bound checks. If there was a reason for binding `self` to `x` or if this code is incorrect, please correct me :)
Clamp Function for f32 and f64 I thought the clamp function could use a little improvement for readability purposes. The function now returns early in order to skip the extra bound checks. If there was a reason for binding `self` to `x` or if this code is incorrect, please correct me :)
Clamp Function for f32 and f64 I thought the clamp function could use a little improvement for readability purposes. The function now returns early in order to skip the extra bound checks. If there was a reason for binding `self` to `x` or if this code is incorrect, please correct me :)
…iaskrgr Rollup of 11 pull requests Successful merges: - rust-lang#100556 (Clamp Function for f32 and f64) - rust-lang#100663 (Make slice::reverse const) - rust-lang#100697 ( Minor syntax and formatting update to doc comment on `find_vtable_types_for_unsizing`) - rust-lang#100760 (update test for LLVM change) - rust-lang#100761 (some general mir typeck cleanup) - rust-lang#100775 (rustdoc: Merge source code pages HTML elements together v2) - rust-lang#100813 (Add `/build-rust-analyzer/` to .gitignore) - rust-lang#100821 (Make some docs nicer wrt pointer offsets) - rust-lang#100822 (Replace most uses of `pointer::offset` with `add` and `sub`) - rust-lang#100839 (Make doc for stdin field of process consistent) - rust-lang#100842 (Add diagnostics lints to `rustc_transmute` module (zero diags)) Failed merges: r? `@ghost` `@rustbot` modify labels: rollup
I thought the clamp function could use a little improvement for readability purposes. The function now returns early in order to skip the extra bound checks.
If there was a reason for binding
selftoxor if this code is incorrect, please correct me :)