-
Notifications
You must be signed in to change notification settings - Fork 95
Add built-ins for creating/switching cooperative threads #557
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
* 🚝: marking some builtins as `async` | ||
* 🚟: using `async` with `canon lift` without `callback` (stackful lift) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you detail a bit more about folding these together? Reading over bytecodealliance/wasm-tools#2298 it's not clear to me we'll want to lump them all together. For example async-stackful feels to me best left split out still. Also these seem all more async-related rather than threading-related (e.g. may wish to keep the separate gate)
resource.drop async
subtask.cancel async
{stream,future}.cancel-{read,write} async
however changing these to threading seems fine:
thread.yield cancellable
(used to beyield async
)waitable-set.poll cancellable
waitable-set.wait cancellable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly I was just trying to minimize the combinatorial state space and I'm definitely happy to split things out if we get close to releasing a 0.3.x and some of these built-ins are still unimplemented or unused by language toolchains. But to provide the rationale for why I think they will be:
- while there is the pthread.h coop threads use case which is only needs
sync
orasync callback
, for the "green thread" use cases (goroutines, Java Virtual Threads, Kotlin coroutines) that needthread.switch
, I believe they'll naturally want stackfulasync
- when using stackful
async
, I expect it's natural to useasync
onresource.drop
,subtask.cancel
and{stream,future}.cancel-{read,write}
by default
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally I'd prefer to keep things separate where possible because we already suffer from under-testing major features. I find that lumping things together under one feature umbrella encourages very large changes which are rarely thoroughly tested. With split features I find it encourages smaller changes which are much easier to verify are tested for example.
Put another way I agree with your motivations and I agree with the eventual conclusion, but given our development so far I've found large features to be detrimental to the end quality of the final implementation. One example is bytecodealliance/wasmtime#11611 where it's just too easy to forget about something given the huge matrix of possible interactions with all the async features. We very frequently will discuss changes to a non-existent implementation (historically in the implementation of p3) which results in spec changes but we never write down tests for these changes to ensure the old behavior works plus the new cases.
Basically I'd like to keep these all separate for development concerns even if in the end we end up lumping everything under one umbrella. I also think they should be separate if the end result is unusable without everything implemented because keeping things separate doesn't change the net amount of work done in total but I think produces a higher quality result as implementations are validated/verified in chunks rather than all-at-once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Making sure things are tested when added is certainly important. My expectation is that both the stackful-async and async
built-ins will need to be tested b/c they would be used by the essential green thread use cases motivating cooperative threads. But if that turns out not to be the case or if we want to roll out cooperative threads itself in chunks, then happy to revisit in that context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess what I'm saying is that empirically testing has not happend historically. There's a whole matrix of possibilities of just the async bits before this PR and only some are tested. We're still running into missing tests and blind spots in the implementation today. Personally I think this would have greatly benefitted from not frontloading the full matrix of all options for all our use cases and instead splitting things up piecemeal but that's all history now at this point.
What I'd like to do going forward is that we've already got buckets for some features and I don't want to proactively start making those buckets all one big bucket. We'd have to proactively take a bunch of existing feature-gated code and update it all to be under a new feature gate. Today it's easy enough to grep for a single feature gate and see what needs testing when that wants to be "stabilized" but this becomes much more difficult when it's one large feature. At that point everyone asks if it's ready and no one can keep it all in their heads and so we point at a green set of tests and shrug.
Again though I don't dispute that in the long run that we'll probably want all these features implemented. My point is that I think it's valuable for developing a quality implementation that these are kept separate. It's not the same person designing, implementing, testing, and reviewing all in one got, but lots of different people, and in that situation keep things in smaller chunks is easier to keep in your head.
Basically, personally, I don't want to wait to revisit this, I want to change this right now to keep features separate.
This PR extends the concurrency support in Preview 3 with the ability for core wasm to create and switch between "cooperative threads", providing fiber-like functionality that is exposed to core wasm as plain core function imports that are integrated with and built on the async machinery that is already part of Preview 3.
Cooperative threads are meant to be released in some 0.3.x minor version after 0.3.0, so they are given a separate emoji-gate than 0.3.0's 🔀. Since cooperative threads pave the way for preemptive threads (which depend on shared-everything-threads), the emoji-gate for cooperative threads is 🧵 and preemptive threads are switched to 🧵②. The current emoji-gates 🚝 and 🚟 are folded into 🧵 since they're also post-0.3.0 and make sense only with cooperative threads but if there's still a reason to keep them separate, we can split them back out.
This feature is intended to cover both "host threads" (e.g., pthreads.h) and "green threads" (e.g. goroutines and Java virtual threads) use cases. The former use case mostly only needs the ability to spawn a thread (with, in a cooperative setting, the choice between whether to run the new thread immediately or "later"). But the green-threads use case needs to take explicit control of switching between threads (instead of leaving it up to the wasm runtime), so several additional built-ins are added to do this such as
thread.switch-to
(which works like theswitch
instruction in the stack-switching proposal). For the summary of the built-ins, see this section. Also see the rewritten goals and summary inConcurrency.md
(formerlyAsync.md
).With the above new built-ins,
thread.spawn_indirect
becomes an optimized fusion ofthread.new_indirect
+thread.resume-later
. For simplicity,thread.spawn_indirect
is kept gated to 🧵②, so that it's "the one that depends onshared
" andthread.new_indirect
is added by 🧵 and has noshared
option for now. As noted in the Binary format TODO, as part of 1.0-rc, we'll need to add ashared?
option to every existing built-in (so that they can all be called from preemptive threads) anyhow. But this PR does change the binary format forthread.spawn_indirect
(to include an optionalshared
immediate) so CC @abrown on these changes in Binary.md. (It's fine to not implement them any time soon, but I think it's useful to see the sketch of how they'd eventually look.)For consistency, the
yield
built-in is backwards-compatibly renamedthread.yield
(keeping the opcode the same and keepingyield
in the text format but marked "deprecated" until the transition is done).What was previously called "context-local storage" in the explainer is now renamed to "thread-local storage" (since it's now literally stored per thread), but the
context.{get,set}
built-ins keep the same name since they still seem to make sense as names. As planned earlier, the static length of the thread-local storage array is bumped from "1" to "2" b/c now there's a good reason to store the "linear memory shadow stack pointer" alongside the "general TLS struct pointer".Although a bunch of the explanatory prose changes in this PR, the actual functional addition to what was already necessary for 0.3.0 is surprisingly little; there's mostly just this new ability to directly
thread.switch-to
. Hopefully that remains true in the implementation.I expect to keep this PR open for a while until someone with an implementor's hat on has a chance to look carefully, so no rush for folks to review. I'd suggest reading the diff of
Concurrency.md
first thenExplainer.md
thenCanonicalABI.md
.