Skip to content

Conversation

lukewagner
Copy link
Member

This PR extends the concurrency support in Preview 3 with the ability for core wasm to create and switch between "cooperative threads", providing fiber-like functionality that is exposed to core wasm as plain core function imports that are integrated with and built on the async machinery that is already part of Preview 3.

Cooperative threads are meant to be released in some 0.3.x minor version after 0.3.0, so they are given a separate emoji-gate than 0.3.0's 🔀. Since cooperative threads pave the way for preemptive threads (which depend on shared-everything-threads), the emoji-gate for cooperative threads is 🧵 and preemptive threads are switched to 🧵②. The current emoji-gates 🚝 and 🚟 are folded into 🧵 since they're also post-0.3.0 and make sense only with cooperative threads but if there's still a reason to keep them separate, we can split them back out.

This feature is intended to cover both "host threads" (e.g., pthreads.h) and "green threads" (e.g. goroutines and Java virtual threads) use cases. The former use case mostly only needs the ability to spawn a thread (with, in a cooperative setting, the choice between whether to run the new thread immediately or "later"). But the green-threads use case needs to take explicit control of switching between threads (instead of leaving it up to the wasm runtime), so several additional built-ins are added to do this such as thread.switch-to (which works like the switch instruction in the stack-switching proposal). For the summary of the built-ins, see this section. Also see the rewritten goals and summary in Concurrency.md (formerly Async.md).

With the above new built-ins, thread.spawn_indirect becomes an optimized fusion of thread.new_indirect+thread.resume-later. For simplicity, thread.spawn_indirect is kept gated to 🧵②, so that it's "the one that depends on shared" and thread.new_indirect is added by 🧵 and has no shared option for now. As noted in the Binary format TODO, as part of 1.0-rc, we'll need to add a shared? option to every existing built-in (so that they can all be called from preemptive threads) anyhow. But this PR does change the binary format for thread.spawn_indirect (to include an optional shared immediate) so CC @abrown on these changes in Binary.md. (It's fine to not implement them any time soon, but I think it's useful to see the sketch of how they'd eventually look.)

For consistency, the yield built-in is backwards-compatibly renamed thread.yield (keeping the opcode the same and keeping yield in the text format but marked "deprecated" until the transition is done).

What was previously called "context-local storage" in the explainer is now renamed to "thread-local storage" (since it's now literally stored per thread), but the context.{get,set} built-ins keep the same name since they still seem to make sense as names. As planned earlier, the static length of the thread-local storage array is bumped from "1" to "2" b/c now there's a good reason to store the "linear memory shadow stack pointer" alongside the "general TLS struct pointer".

Although a bunch of the explanatory prose changes in this PR, the actual functional addition to what was already necessary for 0.3.0 is surprisingly little; there's mostly just this new ability to directly thread.switch-to. Hopefully that remains true in the implementation.

I expect to keep this PR open for a while until someone with an implementor's hat on has a chance to look carefully, so no rush for folks to review. I'd suggest reading the diff of Concurrency.md first then Explainer.md then CanonicalABI.md.

Comment on lines -54 to -55
* 🚝: marking some builtins as `async`
* 🚟: using `async` with `canon lift` without `callback` (stackful lift)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you detail a bit more about folding these together? Reading over bytecodealliance/wasm-tools#2298 it's not clear to me we'll want to lump them all together. For example async-stackful feels to me best left split out still. Also these seem all more async-related rather than threading-related (e.g. may wish to keep the separate gate)

  • resource.drop async
  • subtask.cancel async
  • {stream,future}.cancel-{read,write} async

however changing these to threading seems fine:

  • thread.yield cancellable (used to be yield async)
  • waitable-set.poll cancellable
  • waitable-set.wait cancellable

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly I was just trying to minimize the combinatorial state space and I'm definitely happy to split things out if we get close to releasing a 0.3.x and some of these built-ins are still unimplemented or unused by language toolchains. But to provide the rationale for why I think they will be:

  • while there is the pthread.h coop threads use case which is only needs sync or async callback, for the "green thread" use cases (goroutines, Java Virtual Threads, Kotlin coroutines) that need thread.switch, I believe they'll naturally want stackful async
  • when using stackful async, I expect it's natural to use async on resource.drop, subtask.cancel and {stream,future}.cancel-{read,write} by default

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I'd prefer to keep things separate where possible because we already suffer from under-testing major features. I find that lumping things together under one feature umbrella encourages very large changes which are rarely thoroughly tested. With split features I find it encourages smaller changes which are much easier to verify are tested for example.

Put another way I agree with your motivations and I agree with the eventual conclusion, but given our development so far I've found large features to be detrimental to the end quality of the final implementation. One example is bytecodealliance/wasmtime#11611 where it's just too easy to forget about something given the huge matrix of possible interactions with all the async features. We very frequently will discuss changes to a non-existent implementation (historically in the implementation of p3) which results in spec changes but we never write down tests for these changes to ensure the old behavior works plus the new cases.

Basically I'd like to keep these all separate for development concerns even if in the end we end up lumping everything under one umbrella. I also think they should be separate if the end result is unusable without everything implemented because keeping things separate doesn't change the net amount of work done in total but I think produces a higher quality result as implementations are validated/verified in chunks rather than all-at-once.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making sure things are tested when added is certainly important. My expectation is that both the stackful-async and async built-ins will need to be tested b/c they would be used by the essential green thread use cases motivating cooperative threads. But if that turns out not to be the case or if we want to roll out cooperative threads itself in chunks, then happy to revisit in that context.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess what I'm saying is that empirically testing has not happend historically. There's a whole matrix of possibilities of just the async bits before this PR and only some are tested. We're still running into missing tests and blind spots in the implementation today. Personally I think this would have greatly benefitted from not frontloading the full matrix of all options for all our use cases and instead splitting things up piecemeal but that's all history now at this point.

What I'd like to do going forward is that we've already got buckets for some features and I don't want to proactively start making those buckets all one big bucket. We'd have to proactively take a bunch of existing feature-gated code and update it all to be under a new feature gate. Today it's easy enough to grep for a single feature gate and see what needs testing when that wants to be "stabilized" but this becomes much more difficult when it's one large feature. At that point everyone asks if it's ready and no one can keep it all in their heads and so we point at a green set of tests and shrug.

Again though I don't dispute that in the long run that we'll probably want all these features implemented. My point is that I think it's valuable for developing a quality implementation that these are kept separate. It's not the same person designing, implementing, testing, and reviewing all in one got, but lots of different people, and in that situation keep things in smaller chunks is easier to keep in your head.

Basically, personally, I don't want to wait to revisit this, I want to change this right now to keep features separate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants