Remove `i64.mul128`, add `i64.mul_wide_{s,u}` #13

alexcrichton · 2024-09-05T03:32:59Z

Some recent [benchmarking] had a surprising result that I wasn't looking for. As summarized in #11, more use cases have started to arise where widening multiplication is more optimal than 128-by-128-bit multiplication. Local benchmarking on both x64 and aarch64 also confirmed that widening multiplication has better support in LLVM for optimal lowerings, and that it was easier to implement in Wasmtime than 128-by-128-bit multiplication once various optimizations were implemented.

In the end `i64.mul128`, which was primarily motivated by "feels cleaner" and "should have the same performance" as widening multiplication, does not appear to have the expected performance/implementation tradeoff. Making `i64.mul128` as performant as `i64.mul_wide_{s,u}` has required more work than expected, so the balance of concerns now has me tipping away from `i64.mul128`, even though `i64.mul_wide_{s,u}` is "less clean" compared to the add/sub opcodes proposed in this PR.
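To make the tradeoff concrete, here is a rough Rust model of the two shapes of multiplication being compared. The function names, the `(lo, hi)` tuple layout, and the operand order are assumptions made for illustration only; they are not taken from the proposal text or from Wasmtime.

```rust
/// Unsigned widening multiply: two 64-bit inputs, full 128-bit product
/// returned as (lo, hi). Illustrative model of `i64.mul_wide_u`.
fn mul_wide_u(a: u64, b: u64) -> (u64, u64) {
    let p = (a as u128) * (b as u128);
    (p as u64, (p >> 64) as u64)
}

/// Signed widening multiply, returned as (lo, hi).
/// Illustrative model of `i64.mul_wide_s`.
fn mul_wide_s(a: i64, b: i64) -> (u64, i64) {
    let p = (a as i128) * (b as i128);
    (p as u64, (p >> 64) as i64)
}

/// Wrapping 128-by-128-bit multiply with each operand split into
/// (lo, hi) halves. Illustrative model of `i64.mul128`.
fn mul128(a_lo: u64, a_hi: u64, b_lo: u64, b_hi: u64) -> (u64, u64) {
    // One widening multiply for the low halves...
    let (lo, carry) = mul_wide_u(a_lo, b_lo);
    // ...plus two 64-bit wrapping multiplies for the cross terms.
    let hi = carry
        .wrapping_add(a_lo.wrapping_mul(b_hi))
        .wrapping_add(a_hi.wrapping_mul(b_lo));
    (lo, hi)
}
```

In this model, `mul128(a, 0, b, 0)` gives the same result as `mul_wide_u(a, b)`, but a compiler only gets the single-multiply lowering if it can prove the high halves are zero (or sign extensions, in the signed case). That pattern recognition is the kind of extra optimization work described above.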

Closes #11

[benchmarking]: #6 (comment)

alexcrichton mentioned this pull request Sep 5, 2024

Is mul128 going to be as fast as widening multiplication? #11

Closed

Add a link to an issue

07f8593

alexcrichton merged commit 91e8f31 into main Sep 10, 2024

alexcrichton deleted the remove-mul128 branch September 10, 2024 17:04
