@chenyan2002
Contributor

@chenyan2002 chenyan2002 commented Oct 14, 2025

  • Add handle value to Wave syntax: #42, #[42]
  • Make ValueTyped trait public
  • Add ToValue and ToRust traits to allow conversion between Wave values and Rust values
  • We have the From trait to convert Rust values to Wave value, but it consumes the Rust value. Should we deprecate From and use ToValue instead?
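
To make the From-versus-ToValue question concrete, here is a minimal sketch of the difference between a consuming and a borrowing conversion. The `Value` enum and the `to_value(&self)` signature are assumptions for illustration only; the PR names the `ToValue` trait but this thread does not show its actual definition.

```rust
// Hypothetical stand-in for a WAVE value; not the wasm-wave crate's real type.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    U32(u32),
    String(String),
    List(Vec<Value>),
}

// Consuming conversion: `Value::from(s)` takes ownership of `s`.
impl From<String> for Value {
    fn from(s: String) -> Self {
        Value::String(s)
    }
}

// Assumed shape of a borrowing conversion: the Rust value stays usable.
pub trait ToValue {
    fn to_value(&self) -> Value;
}

impl ToValue for u32 {
    fn to_value(&self) -> Value {
        Value::U32(*self)
    }
}

impl ToValue for String {
    fn to_value(&self) -> Value {
        Value::String(self.clone())
    }
}

impl<T: ToValue> ToValue for Vec<T> {
    fn to_value(&self) -> Value {
        // Convert each element by reference; the vector itself is not moved.
        Value::List(self.iter().map(T::to_value).collect())
    }
}
```

With this shape, a caller can convert `names.to_value()` and keep using `names` afterwards, at the cost of a clone; `Value::from(name)` avoids the clone but moves `name`.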

TODO:

  • Change the AST node from resource to handle
  • Decide whether to deprecate From trait
  • Bike shedding resource syntax: &42 or '42 for borrow, or do we need a borrow value since it can be inferred from the type
  • missing docs
  • documentation and tests

@alexcrichton
Member

Thanks for this! I'll cc @lann on this since you're likely interested in this too

@lann
Contributor

lann commented Oct 17, 2025

Bike shedding resource syntax: &42 or '42 for borrow, or do we need a borrow value since it can be inferred from the type

We don't technically need any new syntax for resources since they can always be inferred from the type; some options I've considered in the past:

  • Plain number: 42
    • In Wasmtime resources are represented as u32s (aka rep)
  • Opaque string specific to the use case; e.g.:
    • Still just the resource's repr: "42"
    • Some random identifier: "58089f31-e423-4fa0-8f4b-04c55fdf1a6d"
  • One of the above wrapped in "variant case" syntax: my-resource(42)

I've never really come up with a really great reason to standardize on one of these over another; it might make the most sense to leave it up to each embedding to select a syntax that makes sense for it, basically delegating WasmValue::make_resource to some kind of callback.

There are also some more fundamental problems with this feature which are a big part of why I didn't implement it previously:

  • Resources are meant to be "unforgeable"; a component cannot do anything with a resource until it has explicitly received a handle to it. I think it's probably very difficult if not impossible to have one good universal way to handle that restriction in a serialization format. In some situations it might be fine to just trust a bare serialized rep (42) while in others you might really want to e.g. cryptographically sign the rep to ensure it isn't tampered with before it gets back to you.
  • Some resources might represent "plain old data" that could be serialized directly like headers([("content-type", "text/html")]) while others might have no useful serialization at all (tcp-socket?).

@chenyan2002
Contributor Author

We don't technically need any new syntax for resources since they can always be inferred from the type

Yes, but a resource is a different kind of value, which we don't want to conflate with numbers or strings. For example, if 42 can be interpreted as either a number or a resource depending on the provided type, it can confuse users and make mistakes easy. By the same argument, it's probably better to have syntax for both owned and borrowed resources, so that we can reason about ownership just by looking at the WAVE value.

I've never really come up with a really great reason to standardize on one of these over another; it might make the most sense to leave it up to each embedding to select a syntax that makes sense for it, basically delegating WasmValue::make_resource to some kind of callback.

Agreed. In this PR, we just add an AST node with a u32 to represent a resource. It doesn't standardize what the number means. It can be a handle id, a rep, or some cryptographically signed number (in which case, we may want to extend the number to u128 or u256). It's up to the caller to decide how to interpret this number for their application. WasmValue::make_resource only takes a u32 and puts it into an AST node. The embedder can call unwrap_resource to get the u32 and decide how to construct that resource, or return an error when that number is not a valid resource representation.
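
A rough sketch of that division of labor, under stated assumptions: the `Ast` enum, the `Embedder` type, and both method signatures below are hypothetical and only borrow the names `make_resource` and `unwrap_resource` from the comment above. They are not the wasm-wave API.

```rust
use std::collections::HashMap;

// Hypothetical AST: the handle node stores a bare number and nothing else.
#[derive(Debug, Clone, PartialEq)]
pub enum Ast {
    U32(u32),
    Handle(u32),
}

impl Ast {
    // Wraps the number in an AST node without interpreting it.
    pub fn make_resource(n: u32) -> Ast {
        Ast::Handle(n)
    }

    // Returns the stored number, or None if this node is not a handle.
    pub fn unwrap_resource(&self) -> Option<u32> {
        match self {
            Ast::Handle(n) => Some(*n),
            _ => None,
        }
    }
}

// Hypothetical embedder: it alone decides which numbers are valid resources.
pub struct Embedder {
    live: HashMap<u32, String>,
}

impl Embedder {
    pub fn new() -> Self {
        Embedder { live: HashMap::new() }
    }

    pub fn register(&mut self, n: u32, name: &str) {
        self.live.insert(n, name.to_string());
    }

    // Materialize a handle node, or reject it if the number is unknown.
    pub fn resolve(&self, node: &Ast) -> Result<&str, String> {
        let n = node
            .unwrap_resource()
            .ok_or_else(|| format!("{:?} is not a handle", node))?;
        self.live
            .get(&n)
            .map(String::as_str)
            .ok_or_else(|| format!("#{} is not a valid resource", n))
    }
}
```

The parser side never learns what 42 means; only `Embedder::resolve` can accept or reject it.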

Resources are meant to be "unforgeable"; a component cannot do anything with a resource until it has explicitly received a handle to it. I think it's probably very difficult if not impossible to have one good universal way to handle that restriction in a serialization format.

Yep, we leave that totally to the user of WAVE to decide how to derive a valid resource from the AST.

@pchickey
Contributor

My opinion here is that resources are only useful to represent in Wave if Wave has side-effects, i.e. statements whose results can be bound to a name. A resource can only be created when it is part of a return value from a function - that can be a resource's constructor (constructors are just sugar for an import function) or any other call that returns one.

Integer indices are not the same thing as names because they can be forged, as Lann points out above.

It's a pretty big change for wave to grow to have statements and some sort of store that maps names to values. However, I think it's one we need to make in order for wave to actually be useful with resources.

@sunfishcode
Member

sunfishcode commented Oct 17, 2025

In capability-based security terms, the integer indices proposed here are C-list indices. The scope of the C-list will depend on the embedding, which WAVE itself doesn't need to know anything about, but it's the C-list that prevents forging of arbitrary capabilities, not the syntax.

It's the same way that the Canonical ABI works. There's a handle table (the C-list), and then the application code works with u32 indices. In fact, one of the uses for this WAVE syntax is for debugging scenarios, where we want to be able to correlate syntax like #42 to the handle represented as 42 inside a component.
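
As an illustration of why the table, rather than the syntax, is what blocks forging, here is a deliberately simplified sketch. `HandleTable` is a hypothetical type; real handle tables carry more state (types, ownership, lend counts) than this.

```rust
// Hypothetical per-component handle table (the "C-list"): index -> rep.
pub struct HandleTable {
    slots: Vec<Option<u32>>,
}

impl HandleTable {
    pub fn new() -> Self {
        HandleTable { slots: Vec::new() }
    }

    // Only the runtime inserts entries; the returned index is the handle.
    pub fn insert(&mut self, rep: u32) -> u32 {
        self.slots.push(Some(rep));
        (self.slots.len() - 1) as u32
    }

    // An index that was never handed out resolves to nothing.
    pub fn get(&self, index: u32) -> Option<u32> {
        self.slots.get(index as usize).copied().flatten()
    }
}
```

Two tables can hand out the same index for different reps, so a written `#0` only has meaning relative to one particular table, and an index the table never issued simply fails to resolve.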

@lann

This comment was marked as resolved.

@lann
Contributor

lann commented Oct 20, 2025

Thoughts on abstract syntax:

I would like the option to serialize handles to (and from) arbitrary value syntax. The most visible use for this today would be wasi-http fields; for most purposes fields is really a value type in a trench coat and could be serialized with WAVE's record syntax (with a list-of-tuples fallback for "weird" header names) or - better yet - a future map syntax.

We should consider - without necessarily immediately implementing - how to represent new component model handle types like future and stream.

I think we could define this new syntax as something like "an implementation-specific encoding of handle types". Generic WAVE processing code would validate this production against the value grammar but otherwise treat it as an opaque sequence of tokens. Some implementations - including perhaps the one initial implementation - might only support encoding handles, which would allow them to just output whatever is convenient for debugging.

@lann
Contributor

lann commented Oct 20, 2025

Thoughts on concrete syntax:

I can see the case for new token(s) to help visually identify handles, especially in output. I'm not sure that there is much value in distinguishing between owned and borrowed handles at this level. Again, if there are use cases where this seems important it would be helpful to hear about them.

After staring at my keyboard for a bit I think the suggestion of a prefixed # makes sense to me. In some ways * or & might be more "familiar" for handles but I think that's not necessarily a good thing here where it could pull along semantic baggage from C (et al.). For the sake of argument it could also make sense to use matched delimiters like <42> but I personally prefer the prefix #.

Some examples, in the context of my last comment:

  • Table references: #42, #"0x2a", #handle(42)
  • Pseudo-value: #{content-type: "application/wasm"}, #fields([("a", "b")])
  • New types: #future(42), #stream((42, u8))
  • Debug: #{ty: 7, rep: 42, owned: false}, #{kind: stream, payload-type: u8, state: open}

@chenyan2002
Contributor Author

could you say more about what you're looking to do with this feature?

We are adding a record and replay feature for WIT component. This requires we record the resource handle in the record phase, and replay the exact same handle in the replay phase. Here is an example trace we captured:

fastly:compute/http-incoming.handle(#1, #2)
fastly:compute/http-req/request.get-method(#[1], 1024)
fastly:compute/http-req.close(#1)
fastly:compute/http-body.new()
ret: ok(#1)
fastly:compute/http-body.write(#[1], …)

If we don't have a syntactic difference for owned and borrowed, it's hard to know what #1 refers to without looking at the underlying WIT type.
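
For reference, telling the two forms apart at parse time is cheap. The sketch below assumes `#N` denotes an owned handle and `#[N]` a borrowed one, as the trace above suggests; the `Handle` enum and `parse_handle` function are hypothetical names, not part of the PR.

```rust
#[derive(Debug, PartialEq)]
pub enum Handle {
    Own(u32),
    Borrow(u32),
}

// Accepts only a non-empty run of ASCII digits that fits in a u32.
fn number(s: &str) -> Option<u32> {
    if !s.is_empty() && s.bytes().all(|b| b.is_ascii_digit()) {
        s.parse().ok()
    } else {
        None
    }
}

// Parses `#N` as an owned handle and `#[N]` as a borrowed one.
pub fn parse_handle(s: &str) -> Option<Handle> {
    let rest = s.strip_prefix('#')?;
    match rest.strip_prefix('[').and_then(|r| r.strip_suffix(']')) {
        Some(inner) => number(inner).map(Handle::Borrow),
        None => number(rest).map(Handle::Own),
    }
}
```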

I think we could define this new syntax as something like "an implementation-specific encoding of handle types"

I agree. We can probably define a callback for types we want to customize, and this is not specific to resource types. Even for list<u8>, we may want to accept different input formats, e.g., a text string when every u8 falls in the ASCII alphabetic range, a hex string, or even a file URL.

If we want to consider callbacks for WAVE values in general, then the question becomes what the default representation for a resource is. I'm fine with using just the handle #42 as the default syntax, and we can use callbacks to handle borrowed handles. Or we can make borrowed handles part of the default syntax, if we think the above example is generic enough.

@lann
Contributor

lann commented Oct 20, 2025

If we don't have a syntactic difference for owned and borrowed, it's hard to know what #1 refers to without looking at the underlying WIT type.

Do you mean for people or programmatically? In either case I'm not sure that you'd get very far without looking at the component types anyway; surely for replay you'd need to know the resource type to manage the resource table?

@chenyan2002
Contributor Author

Mostly for people to sanity check on the recorded trace.

fastly:compute/http-req.close(#1)
fastly:compute/http-body.new()
ret: ok(#1)

The above trace is only correct if close takes an owned resource. Otherwise, http-body.new() would return a handle other than #1. If close takes a borrowed resource, and http-body.new() still returns #1, it means we have a bug in the component linking code.
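
The reasoning above can be sketched with a toy table that recycles freed indices. This is loosely modeled on a handle table with a free list; whether a given runtime reuses indices in exactly this order is an assumption, and `HandleTable`, `take_owned`, and `borrow` are hypothetical names.

```rust
// Toy handle table: slot 0 is reserved, freed indices are reused first.
pub struct HandleTable {
    slots: Vec<Option<&'static str>>,
    free: Vec<u32>,
}

impl HandleTable {
    pub fn new() -> Self {
        HandleTable { slots: vec![None], free: Vec::new() }
    }

    // Allocates an index, preferring one that was freed earlier.
    pub fn insert(&mut self, what: &'static str) -> u32 {
        if let Some(i) = self.free.pop() {
            self.slots[i as usize] = Some(what);
            i
        } else {
            self.slots.push(Some(what));
            (self.slots.len() - 1) as u32
        }
    }

    // Passing an owned handle moves it out of the table and frees its index.
    pub fn take_owned(&mut self, i: u32) -> Option<&'static str> {
        let what = self.slots.get_mut(i as usize)?.take()?;
        self.free.push(i);
        Some(what)
    }

    // Passing a borrowed handle leaves the entry, and its index, in place.
    pub fn borrow(&self, i: u32) -> Option<&'static str> {
        self.slots.get(i as usize).copied().flatten()
    }
}
```

Under these assumptions, `close(#1)` followed by a `new()` that returns `#1` is consistent only with the owned path; on the borrowed path index 1 is still occupied, so the new handle must get a different index.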

@lann
Contributor

lann commented Oct 20, 2025

I suppose I am weakly against additional syntax for borrows given all the alternatives made available by supporting arbitrary values: #[42], #(42), #"&42", #borrow(42), #-42, #b42

That said, this is mostly based on generic opposition to complexity so if others feel this would be worthwhile I could probably be convinced. 🙂

@pchickey
Contributor

pchickey commented Oct 20, 2025

We are adding a record and replay feature for WIT component. This requires we record the resource handle in the record phase, and replay the exact same handle in the replay phase. Here is an example trace we captured:

fastly:compute/http-incoming.handle(#1, #2)
fastly:compute/http-req/request.get-method(#[1], 1024)
fastly:compute/http-req.close(#1)
fastly:compute/http-body.new()
ret: ok(#1)
fastly:compute/http-body.write(#[1], …)

First off - there is already an effort for component record replay in progress that @cfallin is leading. You should talk to him about that effort and align with it.

Aside from it not being obvious what the type of any of those numbered resources is, we cannot tell from this trace where any of the numbered resources come from. It fundamentally doesn't make sense to have a #1, #2 in the first statement because it's impossible to tell whether those values existed prior to that statement. We also can't tell if any of those statements are returning new values that are being bound to new integers.

As soon as you start specifying side effects, you are making a programming language. If you are going to do that, you need to do all of the programming language things: the syntax for statements, expressions, lexical bindings, scopes, how to destructure values, and so on and so on. If you're going to take that on, it should be given some really serious consideration holistically before there's debate about the particulars of what syntax to use for an own vs a borrow. If you're really up for specifying and implementing a language I'd recommend making an RFC first to get alignment on the broad principles and then forming a team to work on the many details. If this is just a means to record/replay I'd recommend exhausting other ways to solve whatever problem this is getting at first, because there are definitely no quick or easy answers here.

@cfallin
Member

cfallin commented Oct 20, 2025

First off - there is already an effort for component record replay in progress that @cfallin is leading. You should talk to him about that effort and align with it.

FWIW, @chenyan2002 and I did have a call in June about this (along with @arjunr2 and @sunfishcode); our conclusion was that Yan wants to build something higher-level, with a user-readable (and potentially -writable) trace, whereas we wanted to build a low-level, low-overhead mechanism (which @arjunr2 subsequently did, via capturing the calls rendered into canonical-ABI-specific details, i.e. core Wasm types and writes into linear memories). In other words, it seemed like we had pretty different goals. FWIW, I do think Arjun's approach to record-replay is feasible as a foundation for reversible debugging, which we have RFC-level consensus on, while the text-based design here wouldn't necessarily be. I don't have an opinion on the specifics of this thread otherwise as I haven't read the rest of it in detail.

@chenyan2002
Contributor Author

chenyan2002 commented Oct 21, 2025

As Chris said, both our goals and approaches are very different. We are doing this purely from the guest side, and I don't think reversible debugging is possible without host side support. But we find its use in other areas such as service chaining. Anyway, it's outside the scope of this PR.

It fundamentally doesn't make sense to have a #1, #2 in the first statement because it's impossible to tell whether those values existed prior to that statement. We also can't tell if any of those statements are returning new values that are being bound to new integers.

I understand your point. If we want to make WAVE self-contained, we need to build a whole language to define the resource, and get the resource via its constructor or static functions. The handle id is an internal detail that shouldn't be exposed at the surface language, or the surface language needs a side table that manages the resources. However, there are use cases where WAVE is used for debugging purposes and the value only makes sense with respect to the underlying component and runtime, just like the Debug trait defined for each resource from wit-bindgen.

My understanding of the wasm-wave crate is that it defines an AST for WIT values. Deserialization is only concerned with parsing the string into an AST node, not with producing the actual resource, which contains closures. So the AST representation should mimic the structure we see from wit-bindgen. The embedder is then responsible for materializing (or invalidating) this AST node into a real resource. In this sense, a handle id is a reasonable representation for a resource AST node.

@pchickey
Contributor

Thank you for explaining your goals. I want to see WAVE be a self-contained language where a serialization can be interpreted on many different embeddings, and in particular I want to avoid WAVE becoming a language where a serialization has a different meaning on different embeddings.

If you are trying to debug just one particular embedding where compatibility isn't a concern, and aren't up for the (big, difficult) task of making a language where compatibility is possible, then I don't think the answer to your problem is to build out more syntax in WAVE. Like the Debug representation, you just need something local to your embedding.

@sunfishcode
Member

I've had some conversations with folks and wanted to answer some commonly asked questions here.

Multiple people suggested relaxing from just numbers like #42 to serializer-chosen labels, like #request-42 or #socket-43 or similar, where the labels carry no inherent semantic meaning. I think we can make that work, and I like how it improves human-readability, eg.:

fastly:compute/http-incoming.handle(#request-1, #body-2)

This syntax also gives serializers the flexibility to incorporate UUIDs, should that be needed.

Next, the proposal here is not embedder-specific. It's implemented in plain components which run in any component engine, in any embedding. And it has the same meaning, everywhere, and over time. For example, the meaning of #request-42 is always precisely: "handle with C-list key request-42". It may mean something more to a human doing debugging, or to a machine reading it in the context of a trace, but in those cases the meaning comes from the context, rather than the syntax.
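
A small sketch of "handle with C-list key request-42", treating the label as an opaque key. The allowed label characters (ASCII alphanumerics and `-`), the `parse_label` function, and the `CList` type are all assumptions made for illustration; the thread does not fix a grammar for labels.

```rust
use std::collections::HashMap;

// Strips the `#` and checks the assumed label alphabet; the label itself
// is never interpreted beyond that.
pub fn parse_label(s: &str) -> Option<&str> {
    let label = s.strip_prefix('#')?;
    let ok = !label.is_empty()
        && label.chars().all(|c| c.is_ascii_alphanumeric() || c == '-');
    if ok {
        Some(label)
    } else {
        None
    }
}

// Hypothetical C-list keyed by serializer-chosen labels: label -> rep.
pub struct CList {
    entries: HashMap<String, u32>,
}

impl CList {
    pub fn new() -> Self {
        CList { entries: HashMap::new() }
    }

    pub fn insert(&mut self, label: &str, rep: u32) {
        self.entries.insert(label.to_string(), rep);
    }

    // A well-formed label that is not in the list still resolves to nothing.
    pub fn resolve(&self, text: &str) -> Option<u32> {
        let label = parse_label(text)?;
        self.entries.get(label).copied()
    }
}
```

Because the key is opaque, the same lookup works for `#request-1`, `#42`, or a UUID-shaped label.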

That raises the question: is this very narrow meaning useful? I see two sides to it:

It's not useful for programming languages. let bindings are indeed better—so much so, that I expect programming languages won't want any handle literal syntax. I like @lann's ideas about "plain old data", but think that they'd work better as higher-level language features than as handle-literal syntax. Programming languages will want their own parsers in any case, to support subexpressions, variable names, etc., so they can just omit the value language's handle literal syntax in their own parsers.

At the same time, it's desirable to retain a distinct value language, even as we design programming languages. A value language is useful for trace logs, for debuggers, and for RPC protocols (eg. when it's desirable to have a text analog of a binary protocol), that are just recording values. These use cases don't want to be evaluating expressions, maintaining scope trees of let bindings, or anything like that, so they want a simple value language, including a handle syntax so that they can record any value.

@alexcrichton
Member

There are two pieces here that I'd personally like to sort out at least for myself, and I'm not sure they've been previously brought up (sorry if they were, just point me to somewhere else):

  1. As a reflection of a preexisting value something like #body-2 makes sense to me, but WAVE is also used to create values. I don't know what --invoke foo(#body-2) would mean on the wasmtime CLI, for example. Given the dual "read/write" nature of WAVE, how would using raw index handles work?
  2. The Wasmtime embedding API doesn't give embedders access to C-list indices at all, so although a component may pass the integer 23 to a function call for a resource handle there's no means currently by which the host can print body-23. Basically the index-in-the-component is lost pretty quickly and there's no way to recover it currently.

For (1) I suppose it could be said that such an invocation would "simply fail" in that the C-list form is a serialize-the-value-to-text feature and not a deserialize-a-value-from-text feature. For (2) though my hunch is that it's the current trajectory for eventual integration so it might be good to ensure there's a plan for that before committing to the syntax here?

@cfallin
Member

cfallin commented Nov 1, 2025

One additional wrinkle here that I haven't seen answered above is: how exactly can replay match up resource IDs embedded within returned composite values?

To be concrete, looking at this snippet

fastly:compute/http-body.new()
ret: ok(#1)

there seems to be an implied semantics for the "replay execution": whatever return value comes back from that invocation, take the returned resource and put it at index 1 in the C-list.

But say we have a function that returns a struct that contains multiple resources:

my-iface/bag-of-handles.new()
ret: {handle-a: #1, handle-b: #2, handle-c: #3}

It seems we need to extend the semantics somehow to include destructuring. I think this is part of what folks are getting at above with "this is a programming language": we don't necessarily need anything fancy like control flow, but even a straight-line trace needs something like assignment and destructuring to make the flow of handles from one API call to the next explicit and not depend on some implicit underlying implementation-specific order in which they're added to the C-list, I think.
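
One way to picture the destructuring concern: tie each returned handle to an explicit path inside the return value instead of to the order in which handles happen to be added to the C-list. The `Value` enum, the `ret` root name, and `handle_paths` are hypothetical and only sketch the idea.

```rust
// Hypothetical value tree with a handle leaf.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    U32(u32),
    Handle(u32),
    Record(Vec<(String, Value)>),
    List(Vec<Value>),
}

// Depth-first walk that records the path to every handle leaf.
fn walk(v: &Value, path: String, out: &mut Vec<(String, u32)>) {
    match v {
        Value::Handle(h) => out.push((path, *h)),
        Value::Record(fields) => {
            for (name, field) in fields {
                walk(field, format!("{}.{}", path, name), out);
            }
        }
        Value::List(items) => {
            for (i, item) in items.iter().enumerate() {
                walk(item, format!("{}[{}]", path, i), out);
            }
        }
        Value::U32(_) => {}
    }
}

// Returns (path, handle) pairs, e.g. ("ret.handle-a", 1).
pub fn handle_paths(v: &Value) -> Vec<(String, u32)> {
    let mut out = Vec::new();
    walk(v, String::from("ret"), &mut out);
    out
}
```

For the bag-of-handles example this yields `ret.handle-a`, `ret.handle-b`, and `ret.handle-c`, which is roughly the information an explicit assignment or destructuring form would have to carry.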

@cfallin
Member

cfallin commented Nov 1, 2025

To add another explicit distinction: I think the aspect of let-bindings that matters less (IMHO) is giving readable names to handles; that's useful if humans author this format but not essential, and with my compiler-brain inserted, these are "just virtual register numbers" or something like that. The part that matters more is writing down the explicit semantics of how returned handles are wired up to further invocations, including in the presence of deep data structures.

@chenyan2002
Contributor Author

chenyan2002 commented Nov 4, 2025

Thanks for all the comments. I think the confusion comes from the difference between handle and resource. What we really need is a handle value, which can only reference an existing resource, instead of a resource value, which suggests creating new resources.

how exactly can replay match up resource IDs embedded within returned composite values?

For replay, the handle value/label is only for documentation. There is no operational semantics. When we see a fresh handle label, the replay component simply creates a resource with the expected type (by looking at the WIT signature). And we assert that the created handle matches the recorded handle. If the trace was recorded from a valid execution, we know that the created handle will always match, assuming that the Wasm execution is deterministic. Note that in the replay, the resource state is lost. But since we recorded all the interactions with that resource, we don’t need the full state to get/mock the correct response.

In this sense, the handle label is “read-only”. We can only document what actually happened in an execution. We cannot “write” a handle label that doesn’t exist before, which would invalidate the assertion.
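
A minimal sketch of that assertion-only use of labels, assuming deterministic allocation. `Replayer` and `observe` are hypothetical names, and the counter stands in for whatever actually creates the resource during replay.

```rust
use std::collections::HashSet;

// Hypothetical replayer: recorded handles are checked, never used to allocate.
pub struct Replayer {
    next: u32,
    seen: HashSet<u32>,
}

impl Replayer {
    pub fn new() -> Self {
        Replayer { next: 1, seen: HashSet::new() }
    }

    // Stand-in for creating a resource of the expected type; the handle it
    // yields is decided by the replay itself, not by the trace.
    fn create(&mut self) -> u32 {
        let h = self.next;
        self.next += 1;
        h
    }

    // A known handle passes; a fresh one triggers creation and a comparison.
    pub fn observe(&mut self, recorded: u32) -> Result<(), String> {
        if self.seen.contains(&recorded) {
            return Ok(());
        }
        let created = self.create();
        if created != recorded {
            return Err(format!(
                "trace says #{}, replay created #{}",
                recorded, created
            ));
        }
        self.seen.insert(recorded);
        Ok(())
    }
}
```

A trace recorded from a real run passes every check, while a hand-written label such as `#5` fails the comparison instead of conjuring a resource.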

evaluation order for composite values

The handle label is an opaque label which ties to the identity of the resource, and the label content doesn’t need to correspond to the underlying handle id. So while there is an evaluation order decided by the WAVE parser, the handle label stays the same, regardless of when the handle value is parsed or evaluated. For replay, the handle label is used as an assertion, rather than an “opcode” to decide the evaluation order.

WAVE is also used to create values. I don't know what --invoke foo(#body-2) would mean on the wasmtime CLI, for example. Given the dual "read/write" nature of WAVE, how would using raw index handles work?

So for wasmtime --invoke, we cannot write #body-2, unless this handle is already provided by wasmtime, and the user knows about this handle label. We cannot create new resources that didn't exist before. To make it more clear, we should call this a “handle value”, instead of a “resource value”.

The Wasmtime embedding API doesn't give embedders access to C-list indices at all, so although a component may pass the integer 23 to a function call for a resource handle there's no means currently by which the host can print body-23.

Correct. We cannot create a resource with a specified handle id, and for security purposes, it’s best to keep it that way. The WAVE value we proposed here is only for referencing an existing resource instance. If we are going to design a programming language to create resources, it’s best to do this on top of WAVE, and WAVE can stay as a pure value language for WIT.

@chenyan2002 chenyan2002 changed the title from "Add resource value to Wave syntax" to "Add handle value to Wave syntax" on Nov 4, 2025

@alexcrichton
Member

Personally I'm having a hard time envisioning anything here. I don't quite feel that the handle/resource distinction is accurate myself, but I'm realizing that I don't really have any idea at all how this is going to integrate elsewhere. Would you be up for prototyping what integration in Wasmtime might look like, for example? That might help make some of the discussion around this more concrete.
