Add handle value to Wave syntax #2352
|
Thanks for this! I'll cc @lann on this since you're likely interested in this too |
We don't technically need any new syntax for resources since they can always be inferred from the type; some options I've considered in the past:
I've never really come up with a really great reason to standardize on one of these over another; it might make the most sense to leave it up to each embedding to select a syntax that makes sense for it, basically delegating There are also some more fundamental problems with this feature which are a big part of why I didn't implement it previously:
|
Yes, but a resource is a different kind of value, which we don't want to conflate with numbers or strings. For example, if a number 42 can be interpreted as either a number or a resource depending on the provided type, it can confuse the user and make mistakes easy. By the same argument, it's probably better to have a syntax for both owned and borrowed resources, so that we can reason about the ownership by just looking at the WAVE value.
Agreed. In this PR, we just add an AST node with a
Yep, we leave that totally to the user of WAVE to decide how to derive a valid resource from the AST. |
|
My opinion here is that resources are only useful to represent in Wave if Wave has side effects, i.e. statements whose results can be bound to a name. A resource can only be created when it is part of a return value from a function: that can be a resource's constructor (constructors are just sugar for an import function) or any other call that returns one. Integer indices are not the same thing as names because they can be forged, as Lann points out above. It's a pretty big change for Wave to grow statements and some sort of store that maps names to values. However, I think it's one we need to make in order for Wave to actually be useful with resources. |
|
In capability-based security terms, the integer indices proposed here are C-list indices. The scope of the C-list will depend on the embedding, which WAVE itself doesn't need to know anything about, but it's the C-list that prevents forging of arbitrary capabilities, not the syntax. It's the same way that the Canonical ABI works. There's a handle table (the C-list), and then the application code works with |
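To make the C-list point concrete, here is a minimal sketch of a handle table in Rust. `HandleTable` and its methods are hypothetical names for illustration; this is not the Canonical ABI's actual table layout or any Wasmtime API. The point it shows is that an integer index only means something relative to a table, so a forged or stale index fails at lookup rather than at the syntax level.

```rust
/// Hypothetical handle table (a C-list): slot index -> live resource.
/// Illustrative only; not the Canonical ABI's real representation.
pub struct HandleTable<T> {
    slots: Vec<Option<T>>,
}

impl<T> HandleTable<T> {
    pub fn new() -> Self {
        Self { slots: Vec::new() }
    }

    /// Store a resource and return the index that stands for it.
    pub fn insert(&mut self, resource: T) -> u32 {
        // Reuse the first free slot if there is one.
        if let Some(i) = self.slots.iter().position(|s| s.is_none()) {
            self.slots[i] = Some(resource);
            return i as u32;
        }
        self.slots.push(Some(resource));
        (self.slots.len() - 1) as u32
    }

    /// Look up an index. An index that was never issued, or whose
    /// resource was already removed, yields `None`.
    pub fn get(&self, index: u32) -> Option<&T> {
        self.slots.get(index as usize)?.as_ref()
    }

    /// Remove the resource at `index`, freeing the slot for reuse.
    pub fn remove(&mut self, index: u32) -> Option<T> {
        self.slots.get_mut(index as usize)?.take()
    }
}
```

In this sketch, writing an arbitrary number in a value grants nothing by itself: `get` only succeeds for indices the table currently holds, which is the property the comment above attributes to the C-list rather than to the syntax.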
|
Thoughts on abstract syntax: I would like the option to serialize handles to (and from) arbitrary value syntax. The most visible use for this today would be wasi-http We should consider - without necessarily immediately implementing - how to represent new component model handle types like I think we could define this new syntax as something like "an implementation-specific encoding of handle types". Generic WAVE processing code would validate this production against the |
|
Thoughts on concrete syntax: I can see the case for new token(s) to help visually identify handles, especially in output. I'm not sure that there is much value in distinguishing between owned and borrowed handles at this level. Again, if there are use cases where this seems important it would be helpful to hear about them. After staring at my keyboard for a bit I think the suggestion of a prefixed Some examples, in the context of my last comment:
|
We are adding a record and replay feature for WIT components. This requires that we record the resource handle in the record phase, and replay the exact same handle in the replay phase. Here is an example trace we captured: If we don't have a syntactic difference for owned and borrowed, it's hard to know what
I agree. We can probably define a callback for types we want to customize, and this is not specific to resource types. Even for If we want to consider callbacks for WAVE values in general, then the question becomes what the default representation for a resource is. I'm fine to use just the handle |
Do you mean for people or programmatically? In either case I'm not sure that you'd get very far without looking at the component types anyway; surely for replay you'd need to know the resource type to manage the resource table? |
|
Mostly for people to sanity check on the recorded trace. The above trace is only correct if |
|
I suppose I am weakly against additional syntax for borrows given all the alternatives made available by supporting arbitrary values: That said, this is mostly based on generic opposition to complexity so if others feel this would be worthwhile I could probably be convinced. 🙂 |
First off - there is already an effort for component record replay in progress that @cfallin is leading. You should talk to him about that effort and align with it. Aside from it not being obvious what the type of any of those numbered resources are, we cannot tell from this trace where any of the numbered resources come from. It fundamentally doesn't make sense to have a As soon as you start specifying side effects, you are making a programming language. If you are going to do that, you need to do all of the programming language things: the syntax for statements, expressions, lexical bindings, scopes, how to destructure values, and so on and so on. If you're going to take that on, it should be given some really serious consideration holistically before there's debate about the particulars of what syntax to use for an own vs a borrow. If you're really up for specifying and implementing a language I'd recommend making an RFC first to get alignment on the broad principles and then forming a team to work on the many details. If this is just a means to record/replay I'd recommend exhausting other ways to solve whatever problem this is getting at first, because there are definitely no quick or easy answers here. |
FWIW, @chenyan2002 and I did have a call in June about this (along with @arjunr2 and @sunfishcode); our conclusion was that Yan wants to build something higher-level, with a user-readable (and potentially -writable) trace, whereas we wanted to build a low-level, low-overhead mechanism (which @arjunr2 subsequently did, via capturing the calls rendered into canonical-ABI-specific details, i.e. core Wasm types and writes into linear memories). In other words, it seemed like we had pretty different goals. FWIW, I do think Arjun's approach to record-replay is feasible as a foundation for reversible debugging, which we have RFC-level consensus on, while the text-based design here wouldn't necessarily be. I don't have an opinion on the specifics of this thread otherwise as I haven't read the rest of it in detail. |
|
As Chris said, both our goals and approaches are very different. We are doing this purely from the guest side, and I don't think reversible debugging is possible without host side support. But we find its use in other areas such as service chaining. Anyway, it's outside the scope of this PR.
I understand your point. If we want to make WAVE self-contained, we need to build a whole language to define the resource, and get the resource via its constructor or static functions. The handle id is an internal detail that shouldn't be exposed at the surface language, or the surface language needs a side table that manages the resources. However, there are use cases where WAVE is used for debugging purposes and the value only makes sense with respect to the underlying component and runtime, just like the My understanding with the |
|
Thank you for explaining your goals. I want to see WAVE be a self-contained language where a serialization can be interpreted on many different embeddings, and in particular I want to avoid WAVE becoming a language where a serialization has a different meaning on different embeddings. If you are trying to debug just one particular embedding where compatibility isn't a concern, and aren't up for the (big, difficult) task of making a language where compatibility is possible, then I don't think the answer to your problem is to build out more syntax in WAVE. Like the Debug representation, you just need something local to your embedding. |
|
I've had some conversations with folks and wanted to answer some commonly asked questions here. Multiple people suggested relaxing from just numbers like This syntax also gives serializers the flexibility to incorporate UUIDs, should that be needed. Next, the proposal here is not embedder-specific. It's implemented in plain components which run in any component engine, in any embedding. And it has the same meaning, everywhere, and over time. For example, the meaning of That raises the question: is this very narrow meaning useful? I see two sides to it: It's not useful for programming languages. At the same time, it's desirable to retain a distinct value language, even as we design programming languages. A value language is useful for trace logs, for debuggers, and for RPC protocols (e.g. when it's desirable to have a text analog of a binary protocol), that are just recording values. These use cases don't want to be evaluating expressions, maintaining scope trees of |
|
There are two pieces here that I'd personally like to sort out at least for myself, and I'm not sure they've been previously brought up (sorry if they were, just point me to somewhere else):
For (1) I suppose it could be said that such an invocation would "simply fail" in that the C-list form is a serialize-the-value-to-text feature and not a deserialize-a-value-from-text feature. For (2) though my hunch is that it's the current trajectory for eventual integration so it might be good to ensure there's a plan for that before committing to the syntax here? |
|
One additional wrinkle here that I haven't seen answered above is: how exactly can replay match up resource IDs embedded within returned composite values? To be concrete, looking at this snippet there seems to be an implied semantics for the "replay execution": whatever return value comes back from that invocation, take the returned resource and put it at index 1 in the C-list. But say we have a function that returns a struct that contains multiple resources: It seems we need to extend the semantics somehow to include destructuring. I think this is part of what folks are getting at above with "this is a programming language": we don't necessarily need anything fancy like control flow, but even a straight-line trace needs something like assignment and destructuring to make the flow of handles from one API call to the next explicit and not depend on some implicit underlying implementation-specific order in which they're added to the C-list, I think. |
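A small sketch of why composite return values raise this question. The `Value` enum below is a simplified, hypothetical stand-in (not the wasm-wave `Value` type), and `handle_labels` is an illustrative helper. It collects the handle labels found inside a nested value in depth-first field order, which is the implicit order a replayer would otherwise have to rely on when pairing recorded labels with the handles a real call produced.

```rust
/// Simplified, hypothetical value tree; not the wasm-wave `Value` type.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    U32(u32),
    /// A handle label as it might appear in a recorded trace.
    Handle(u32),
    Record(Vec<(String, Value)>),
    List(Vec<Value>),
}

/// Collect every handle label in `value`, depth-first, in field order.
pub fn handle_labels(value: &Value) -> Vec<u32> {
    let mut out = Vec::new();
    collect(value, &mut out);
    out
}

fn collect(value: &Value, out: &mut Vec<u32>) {
    match value {
        Value::U32(_) => {}
        Value::Handle(label) => out.push(*label),
        Value::Record(fields) => {
            for (_, field) in fields {
                collect(field, out);
            }
        }
        Value::List(items) => {
            for item in items {
                collect(item, out);
            }
        }
    }
}
```

For a record with fields `first: Handle(3)` and `rest: [Handle(1), Handle(4)]`, this returns the labels in the order 3, 1, 4. Nothing in the value itself says how those labels map onto freshly created handles, which is the gap that explicit assignment and destructuring would close.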
|
To add another explicit distinction: I think the aspect of let-bindings that matters less (IMHO) is giving readable names to handles; that's useful if humans author this format but not essential, and with my compiler-brain inserted, these are "just virtual register numbers" or something like that. The part that matters more is writing down the explicit semantics of how returned handles are wired up to further invocations, including in the presence of deep data structures. |
|
Thanks for all the comments. I think the confusion comes from the difference between handle and resource. What we really need is a handle value, which can only reference an existing resource, instead of a resource value, which suggests creating new resources.
For replay, the handle value/label is only for documentation. There is no operational semantics. When we see a fresh handle label, the replay component simply creates a resource with the expected type (by looking at the WIT signature), and we assert that the created handle matches the recorded handle. If the trace was recorded from a valid execution, we know that the created handle will always match, assuming that the Wasm execution is deterministic. Note that in the replay, the resource state is lost. But since we recorded all the interactions with that resource, we don’t need the full state to get/mock the correct response. In this sense, the handle label is “read-only”. We can only document what actually happened in an execution. We cannot “write” a handle label that did not exist before, which would invalidate the assertion.
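The check described above can be sketched roughly as follows. `Replayer`, `observe`, and `ReplayError` are hypothetical names, and handle allocation is modeled as a simple counter starting at 0, which is an assumption for illustration rather than how any particular runtime assigns handles.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq)]
pub enum ReplayError {
    /// The handle created during replay differs from the recorded label.
    Mismatch { recorded: u32, created: u32 },
}

/// Hypothetical replay-side state: which handles exist so far.
pub struct Replayer {
    next: u32,          // assumed allocation scheme: sequential from 0
    live: HashSet<u32>, // handles created during this replay
}

impl Replayer {
    pub fn new() -> Self {
        Self { next: 0, live: HashSet::new() }
    }

    /// Stand-in for creating a resource of the type given by the WIT signature.
    fn create_resource(&mut self) -> u32 {
        let handle = self.next;
        self.next += 1;
        self.live.insert(handle);
        handle
    }

    /// Process one handle label read from the trace.
    pub fn observe(&mut self, recorded: u32) -> Result<u32, ReplayError> {
        // A label seen before only references an existing resource.
        if self.live.contains(&recorded) {
            return Ok(recorded);
        }
        // A fresh label: create a resource, then assert the handles agree.
        let created = self.create_resource();
        if created == recorded {
            Ok(created)
        } else {
            Err(ReplayError::Mismatch { recorded, created })
        }
    }
}
```

In this sketch the recorded label never chooses which handle gets created; it is only compared against the result, so a hand-written label that no valid execution produced surfaces as a `Mismatch`.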
The handle label is an opaque label which ties to the identity of the resource, and the label content doesn’t need to correspond to the underlying handle id. So while there is an evaluation order decided by the WAVE parser, the handle label stays the same, regardless of when the handle value is parsed or evaluated. For replay, the handle label is used as an assertion, rather than an “opcode” to decide the evaluation order.
So for
Correct. We cannot create a resource with a specified handle id, and for security purposes, it’s best to keep it that way. The WAVE value we proposed here is only for referencing an existing resource instance. If we are going to design a programming language to create resources, it’s best to do this on top of WAVE, and WAVE can stay as a pure value language for WIT. |
|
Personally I'm having a hard time envisioning anything here. I don't quite feel that the handle/resource distinction is accurate myself, but I'm realizing that I don't really have any idea at all how this is going to integrate elsewhere. Would you be up for prototyping what integration in Wasmtime might look like, for example? That might help make some of the discussion around this more concrete. |
#42, #[42]
ValueTyped trait public
ToValue and ToRust trait to allow conversion between Wave value and Rust values
From trait to convert Rust values to Wave value, but it consumes the Rust value. Should we deprecate From and use ToValue instead?
TODO:
From trait
&42 or '42 for borrow, or do we need a borrow value since it can be inferred from the type