Moving WebAssembly inline assembly forward #136382

Open
tgross35 opened this issue Feb 1, 2025 · 16 comments
Labels
A-inline-assembly Area: Inline assembly (`asm!(…)`) C-discussion Category: Discussion or questions that doesn't represent real issues. O-wasm Target: WASM (WebAssembly), http://webassembly.org/ T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@tgross35
Contributor

tgross35 commented Feb 1, 2025

I'm opening this issue to consolidate some scattered discussion about what is needed to make inline assembly for Wasm work. This is tracked along with the other architectures at #93335. That thread notes:

  • It must have clobber_abi.
  • It must be possible to clobber every register that is normally clobbered by a function call.
  • Generally review that the exposed register classes make sense.

So, for Wasm specifically:

  1. Syntax seems to be the biggest question (see below discussion), WAT vs. LLVM's format
  2. There are not any registers to be clobbered. Should there be some way to indicate the asm clobbers the top N elements on the stack?
  3. Should we be restricting dir specs (in, out, lateout, inout, inlateout) or options (pure, nomem, readonly, preserves_flags, noreturn, nostack, raw)? We should probably reject preserves_flags. I'm not sure if lateout and inlateout make sense.
  4. Is the LLVM side generally considered stable? It appears unchanging, but I don't see any of its constraints documented in langref https://llvm.org/docs/LangRef.html#inline-asm-constraint-string

Cc @daxpedda @hanna-kruppe @alexcrichton @hoodmane @solomatov, I think you have all been involved in the wasm-inline-asm discussion in different places.

@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Feb 1, 2025
@tgross35 tgross35 added A-inline-assembly Area: Inline assembly (`asm!(…)`) T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. O-wasm Target: WASM (WebAssembly), http://webassembly.org/ and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Feb 1, 2025
@hanna-kruppe
Contributor

My impression is that LLVM's support for wasm inline assembly is not comparable to its support for other architectures:

  1. There's no GCC equivalent that Clang would presumably try and be interoperable with, so much less inertia against changing how exactly it works. I can't even find any information anywhere how it's supposed to work.
  2. The support that does exist in LLVM is barely tested, and only a single test even tries to look like legit code instead of a made-up smoke test. It seems to be a side effect of enabling support for non-inline assembly syntax (https://reviews.llvm.org/D52914 - I've not seen any work since then that's specific to inline asm).
  3. Searching for usage in some repositories where I'd expect to see usage of it reveals close to zero usage and a maintainer saying "should probably be considered experimental" in 2021 and expressing a similar sentiment again in 2023. Instead, most of what you'd expect a libc to use inline asm for is either done with wasm-specific intrinsics or written in separate non-inline assembly.
  4. Insofar as the inline asm support is stable, that seems to be due to not getting any ongoing development. For example, trying to pass simd128 vectors to inline asm just crashes.

So I would not recommend moving forward with stabilizing Wasm inline assembly without first getting a vibe check from the LLVM backend maintainers. Even if Rust people wanted to put in the work of fixing the bugs that affect Rust users, is there even enough upstream interest and maintainer bandwidth to review and merge that work?

@bjorn3
Member

bjorn3 commented Feb 1, 2025

There's no GCC equivalent that Clang would presumably try and be interoperable with, so much less inertia against changing how exactly it works. I can't even find any information anywhere how it's supposed to work.

It's even worse than there being no GCC equivalent. There is a standard text format specified for wasm, but LLVM created an entirely different format for whatever reason which no other tool in existence works with.

@tgross35
Contributor Author

tgross35 commented Feb 1, 2025

Sounds like this is nowhere close then. Our flags could probably still be improved but I guess that is about it.

It's even worse than there being no GCC equivalent. There is a standard text format specified for wasm, but LLVM created an entirely different format for whatever reason which no other tool in existence works with.

Does LLVM provide equivalents for control flow or is it just a strict subset of WAT without S-expressions?

We could very nearly handle wasm text->binary ourselves at least as well as LLVM could, if that wouldn't also require us doing the validation...

@hoodmane
Contributor

hoodmane commented Feb 1, 2025

Well the one advantage of the llvm format is generating relocations. I think the webassembly text format is missing some information needed to emit relocations and that's why llvm invented its own format. It seems like it could have been kept more similar to wat though. Cc @sbc100 @kripken.

@tgross35
Contributor Author

tgross35 commented Feb 1, 2025

If both are worth having then in theory there could be a watdialect flag (which we could then enable by default), similar to inteldialect.

@sbc100

sbc100 commented Feb 1, 2025

The LLVM text format was chosen to match the existing assembly format support in LLVM. It reuses a lot of the existing infrastructure. It doesn't support any kind of nesting and looks more like the unfolded/linear form of wat, but that is largely because that is what llvm expects from an asm format. Doing this differently would be very hard I think.
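To make the "unfolded/linear form" concrete, here is a minimal sketch of how a folded WAT expression maps onto a linear instruction stream. It is not LLVM's or rustc's code: the names `Folded`, `tokenize`, `parse`, `flatten`, and `unfold` are hypothetical, and the sketch only covers plain instructions (no `block`/`loop`/`if`, no string literals, no comments).

```rust
/// One folded expression: the operator with its immediates, plus nested operands.
#[derive(Debug)]
struct Folded {
    head: String,          // e.g. "i32.const 1" or "i32.add"
    operands: Vec<Folded>, // nested parenthesized operands, in source order
}

/// Split the source into "(", ")" and atom tokens (assumes no strings or comments).
fn tokenize(src: &str) -> Vec<String> {
    src.replace('(', " ( ")
        .replace(')', " ) ")
        .split_whitespace()
        .map(str::to_string)
        .collect()
}

/// Parse one parenthesized expression starting at `tokens[*pos]`.
fn parse(tokens: &[String], pos: &mut usize) -> Result<Folded, String> {
    if tokens.get(*pos).map(String::as_str) != Some("(") {
        return Err(format!("expected '(' at token {}", *pos));
    }
    *pos += 1;
    let mut head = Vec::new();
    let mut operands = Vec::new();
    loop {
        match tokens.get(*pos).map(String::as_str) {
            None => return Err("unexpected end of input".to_string()),
            Some(")") => {
                *pos += 1;
                break;
            }
            Some("(") => operands.push(parse(tokens, pos)?),
            Some(atom) => {
                head.push(atom.to_string());
                *pos += 1;
            }
        }
    }
    if head.is_empty() {
        return Err("empty expression".to_string());
    }
    Ok(Folded { head: head.join(" "), operands })
}

/// Post-order walk: operands first (left to right), then the operator itself.
fn flatten(expr: &Folded, out: &mut Vec<String>) {
    for operand in &expr.operands {
        flatten(operand, out);
    }
    out.push(expr.head.clone());
}

/// Convert one folded expression into a list of linear instructions.
fn unfold(src: &str) -> Result<Vec<String>, String> {
    let tokens = tokenize(src);
    let mut pos = 0;
    let expr = parse(&tokens, &mut pos)?;
    if pos != tokens.len() {
        return Err("trailing tokens after expression".to_string());
    }
    let mut out = Vec::new();
    flatten(&expr, &mut out);
    Ok(out)
}
```

Calling `unfold("(i32.add (i32.const 1) (i32.const 2))")` yields the three instructions `i32.const 1`, `i32.const 2`, `i32.add`. Structured control flow would need extra handling (an explicit `end`, for instance), which this sketch leaves out.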

I don't think there are any plans to replace this format in LLVM and its use is fairly widespread now, admittedly a lot of it is in the form of non-inline separate .s files. While I would support adding the ability for wat to express things like relocations, I don't think it would make sense to convert LLVM to use some new format like that, and I don't think it makes much sense to have a different format for inline and non-inline asm, or another/different format for inline asm in Rust vs C/C++. In other words, I think the current inline asm support is worth improving, but not changing radically.

@bjorn3
Member

bjorn3 commented Feb 1, 2025

Is there a formal definition of the text format LLVM uses? There is for the official wasm text format.

How hard would it be for rustc to convert the official text format (+ annotations for things like relocations) to the format LLVM uses? If we get another backend which supports wasm, I think it is unlikely that this other backend will exactly mirror LLVM's text format. Using the official text format is more likely and even if it doesn't, it should be a lot easier to convert a text format with a robust definition and an existing parser written in rust (wasmparser) to whichever format said backend uses than it would be to parse LLVM's custom text format. And if the other backend uses LLVM's custom text format after all it can reuse the same conversion code as we use for the LLVM backend.

@sbc100

sbc100 commented Feb 1, 2025

Is there a formal definition of the text format LLVM uses?

No, there is no formal definition that I know of. However, I don't think there are formal definitions of other assembly languages either. For example from the x86_64 asm docs https://docs.oracle.com/cd/E19253-01/817-5477/817-5477.pdf: "There is no standard assembly language for the x86 architecture."

@bjorn3
Member

bjorn3 commented Feb 1, 2025

Unlike for x86 there is actually a standard text format for wasm, so there is much less of an excuse to use a poorly specified format. And I believe for arm at least the format for individual instructions is specified by Arm Limited, but I'm not sure if assembler directives are also specified by them.

@hanna-kruppe
Contributor

I think the important part is that the syntax including important directives realistically won't change in backwards incompatible ways. @sbc100 has said that there's no plans to replace this format and it has a fair amount of usage in separate .s files, and I believe that. But I also believe that a change in that syntax is still somewhat more likely than for any inline assembly dialect currently exposed in stable Rust: those probably have several orders of magnitude more existing code, split between far more stakeholders, than LLVM's wasm syntax. I'm not sure if WAT is much better in this regard, especially if it has to be extended in some yet-undecided form to properly support relocations. But at least, as @bjorn3 said, there's high quality WAT-as-of-today parsers written in Rust, so it might be easier for rustc to support the format indefinitely and translate it to whatever a future backend expects.

@tgross35
Contributor Author

tgross35 commented Feb 2, 2025

I don't think there are any plans to replace this format in LLVM and its use is fairly widespread now, admittedly a lot of it is in the form of non-inline separate .s files. While I would support adding the ability for wat to express things like relocations, I don't think it would make sense to convert LLVM to use some new format like that, and I don't think it makes much sense to have a different format for inline and non-inline asm, or another/different format for inline asm in Rust vs C/C++. In other words, I think the current inline asm support is worth improving, but not changing radically.

x86 currently allows using the different syntaxes in module asm via .intel_syntax or .att_syntax directives, or inteldialect for inline asm. It is understandable that LLVM would not want to replace its current syntax, but would extending existing syntax with a WAT flavor via directives be something worth investigating? We could do WAT to LLVM transpiling in rustc, but support in the backend would probably be a benefit to other frontends as well. And that would provide a deprecation path for the current syntax if LLVM decides to go that route.
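For reference, this is roughly what the existing x86 precedent looks like from the Rust side: the same addition written once in the default Intel syntax and once with `options(att_syntax)`. It is only a sketch; the function names `add_intel` and `add_att` are made up for illustration, and the non-x86_64 fallbacks exist only so the snippet compiles elsewhere.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::asm;

/// Intel syntax (Rust's default on x86): destination operand first.
#[cfg(target_arch = "x86_64")]
fn add_intel(a: u64, b: u64) -> u64 {
    let mut out = a;
    unsafe {
        asm!(
            "add {0}, {1}",
            inout(reg) out,
            in(reg) b,
            options(pure, nomem, nostack),
        );
    }
    out
}

/// AT&T syntax, selected per asm block: source operand first.
#[cfg(target_arch = "x86_64")]
fn add_att(a: u64, b: u64) -> u64 {
    let mut out = a;
    unsafe {
        asm!(
            "add {1}, {0}",
            inout(reg) out,
            in(reg) b,
            options(att_syntax, pure, nomem, nostack),
        );
    }
    out
}

// Plain-Rust fallbacks so the example still builds on other architectures.
#[cfg(not(target_arch = "x86_64"))]
fn add_intel(a: u64, b: u64) -> u64 {
    a.wrapping_add(b)
}

#[cfg(not(target_arch = "x86_64"))]
fn add_att(a: u64, b: u64) -> u64 {
    a.wrapping_add(b)
}
```

A hypothetical WAT flavor would presumably be selected the same way, per asm block, but no such option exists today in either rustc or LLVM.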

Also, thanks for the insights here.

@sbc100

sbc100 commented Feb 2, 2025

I think that there are a bunch of technical reasons why switching LLVM to use something like wat is a lot more complicated than it might appear. The LLVM assembly / disassembly infrastructure is all driven by the MC layer, where there is a lot of shared code/infrastructure. It's not only used for inline assembly but also as an input and output format for several tools such as llvm-mc.

Trying to use an assembly format that is different to all the other ones in LLVM would, I think, be a lot of work. The assembly formats used by LLVM are all based on the idea of a linear stream of directives, labels and instructions. Trying to add an s-expression format like WAT would require a whole lot of new infrastructure and would prevent the WebAssembly backend from sharing code with the other backends. It might not even be feasible at all. I think it would also result in the backend being harder to maintain for other LLVM developers.

@tgross35
Contributor Author

tgross35 commented Feb 3, 2025

It makes sense that the LLVM syntax is pretty deeply ingrained, thanks for the clarification. Do you think we would run into any problems if we parsed WAT and converted it to LLVM's version on our end? (I am not sure whether your comment about s-expression vs linear applies specifically to thorough LLVM support or also to a surface-level transpilation like we would be attempting to do.) If that is feasible then it seems advantageous for us, especially regarding the ability to support different backends.

Separately, I wonder if whatever we wind up figuring out might also provide a better flow for assembling/linking standalone .wat files.

@jieyouxu jieyouxu added the C-discussion Category: Discussion or questions that doesn't represent real issues. label Feb 3, 2025
@alexcrichton
Member

Personally I've felt that inline asm in wasm has fallen into one of two categories: (1) accessing fancy instructions or (2) accessing constructs outside of the compilation model. For (1) most of the need there is satisfied with compiler intrinsics (e.g. simd or core::arch::wasm*) or such. For (2) what I'd be thinking of is things like adding more globals, a new table, or something like that.
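As an illustration of category (1), the sketch below reads the size of linear memory through the stable `core::arch::wasm32::memory_size` intrinsic instead of inline asm. The helper names `linear_memory_pages` and `linear_memory_bytes` are hypothetical, and the non-wasm fallback is there only so the snippet compiles on other targets.

```rust
/// Size of one WebAssembly linear-memory page in bytes (64 KiB).
const WASM_PAGE_SIZE: usize = 65536;

/// Number of pages in memory 0, obtained via the `memory.size` intrinsic.
#[cfg(target_arch = "wasm32")]
fn linear_memory_pages() -> Option<usize> {
    Some(core::arch::wasm32::memory_size::<0>())
}

/// Fallback for non-wasm targets: there is no linear memory to query.
#[cfg(not(target_arch = "wasm32"))]
fn linear_memory_pages() -> Option<usize> {
    None
}

/// Memory size in bytes, or `None` off-wasm or if the product overflows `usize`.
fn linear_memory_bytes() -> Option<usize> {
    linear_memory_pages().and_then(|pages| pages.checked_mul(WASM_PAGE_SIZE))
}
```

Category (2), such as declaring an extra global or table, has no comparable intrinsic, which is where inline or module-level asm would come in.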

To do (2) I believe it's required to work with the reloc.* sections that are specified in Linking.md, which LLD implements. There is currently no connection between the annotations proposal mentioned above and reloc.* sections. Making that connection would also likely be quite hard because reloc.* works with byte offsets in the code section and can reference relocations in the middle of instructions. This is possible to do with annotations but it would likely be nontrivial.
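A small sketch of what "relocations in the middle of instructions" means in practice: a `call` is emitted with a placeholder function index encoded as a 5-byte padded LEB128, and the relocation records the byte offset of that index field, not of the opcode. The `Reloc` struct and `emit_call` helper are hypothetical simplifications; real entries refer to a symbol-table index rather than a name, and offsets are relative to the start of the section contents rather than to a scratch buffer.

```rust
/// Relocation type code for a function index stored as a padded LEB128
/// (value 0 in the tool-conventions Linking.md).
const R_WASM_FUNCTION_INDEX_LEB: u8 = 0;
/// Opcode of the wasm `call` instruction.
const OP_CALL: u8 = 0x10;

/// Simplified relocation record (hypothetical layout, symbol kept by name).
#[derive(Debug, PartialEq)]
struct Reloc {
    ty: u8,
    offset: u32,
    symbol: String,
}

/// Encode `value` as unsigned LEB128 padded to exactly 5 bytes,
/// so a linker can overwrite it in place without shifting later bytes.
fn padded_uleb32(value: u32) -> [u8; 5] {
    let mut out = [0u8; 5];
    let mut v = value;
    for (i, byte) in out.iter_mut().enumerate() {
        *byte = (v & 0x7f) as u8;
        v >>= 7;
        if i < 4 {
            *byte |= 0x80; // continuation bit on all but the last byte
        }
    }
    out
}

/// Append `call <symbol>` with a placeholder index and record its relocation.
fn emit_call(code: &mut Vec<u8>, relocs: &mut Vec<Reloc>, symbol: &str) {
    code.push(OP_CALL);
    relocs.push(Reloc {
        ty: R_WASM_FUNCTION_INDEX_LEB,
        offset: code.len() as u32, // points just past the opcode byte
        symbol: symbol.to_string(),
    });
    code.extend_from_slice(&padded_uleb32(0));
}
```

Because the offset points inside the instruction, any text-level annotation has to be attached to a specific immediate, which is part of why mapping annotations onto reloc.* is not straightforward.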

Personally I think it would be quite useful to use inline asm on wasm, I agree with the hesitation to stabilize exactly-what's-there as-is, and I don't think that a translation layer built into rustc is going to be all that simple. That being said I'd be happy to work/collaborate with folks on implementing/designing annotations for reloc.* since that would be broadly useful outside the context of rustc (e.g. showing the text format of an object file for debugging).

@tgross35
Contributor Author

tgross35 commented Feb 3, 2025

@bjorn3 did you have any loose idea about what relocations would look like in annotations?

@bjorn3
Member

bjorn3 commented Feb 14, 2025

(iconst.i32 0) with a relocation for the function index of my_symbol as value to push on the stack could be written like (iconst.i32 (@reloc R_WASM_FUNCTION_INDEX_LEB "my_symbol") 0).
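To show how a frontend might pick such an annotation out of an instruction, here is a rough sketch. The `(@reloc TYPE "symbol")` shape is only the loose idea from the comment above, not a specified syntax, and `parse_reloc_annotation` is a hypothetical helper that assumes the symbol name contains no `)` or escaped quotes.

```rust
/// Extract `(TYPE, symbol)` from the first `(@reloc TYPE "symbol")` annotation
/// in `instr`, or return `None` if there is no well-formed annotation.
fn parse_reloc_annotation(instr: &str) -> Option<(&str, &str)> {
    let marker = "(@reloc";
    let start = instr.find(marker)?;
    let rest = &instr[start + marker.len()..];
    // The annotation body runs up to the first closing parenthesis.
    let end = rest.find(')')?;
    let body = rest[..end].trim();
    // Body is expected to be: TYPE, whitespace, quoted symbol name.
    let (kind, symbol) = body.split_once(char::is_whitespace)?;
    let symbol = symbol.trim().strip_prefix('"')?.strip_suffix('"')?;
    Some((kind, symbol))
}
```

For the instruction quoted above this returns `R_WASM_FUNCTION_INDEX_LEB` and `my_symbol`; an instruction without an annotation returns `None`.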


8 participants