Allow other specifications to create readable byte streams #1121

domenic · 2021-04-06T20:18:02Z

I'd like to solve WICG/serial#127 and whatwg/fetch#267. /cc @reillyeon @yutakahirano. This issue is trying to figure out how to do that.

Note that Web Serial has a proper pull algorithm and so is very similar to https://streams.spec.whatwg.org/#example-rbs-pull . Fetch instead is more like https://streams.spec.whatwg.org/#example-rbs-push, except it has backpressure support; it can send a suspend and resume signal.

We'd have an algorithm "Set up a readable byte stream" similar to https://streams.spec.whatwg.org/#readablestream-set-up. It takes a mandatory autoAllocateChunkSize. (Although maybe consumer specs would just set that to an implementation-defined value anyway, so perhaps we should have streams do that as a default instead of making it mandatory?).

Then we have two cases:

Specs like Web Serial which only want to enqueue from the pull algorithm. They will always get a BYOB request they can fill exactly.
Specs like Fetch which want to enqueue whenever the network has data. They will only sometimes have a BYOB request available.

I think the basic primitives they would use are:

"enqueue bytes": takes an Infra byte sequence and wraps it into a Uint8Array and enqueues it, ignoring any BYOB requests. Specs "should" avoid doing this whenever there's a current BYOB request.
"current BYOB request": returns... just the number of bytes requested? I think the actual identity of the ArrayBuffer, as well as the offset, are not interesting. So maybe this is "current bytes requested". It needs to also be able to return null if there is no outstanding BYOB request, in which case the spec should use "enqueue". It will never be null inside a pullAlgorithm.
"respond to current BYOB request": takes an Infra byte sequence (of max length = current bytes requested) and puts it into the appropriate offset in the BYOB request's view. Then calls the spec-equivalent of byobRequest.respond(byteSequenceLength).

The way this is written makes it seem like there are necessarily copies (from the byte sequence into the ArrayBufferView), so we'd have to note that more efficient implementations are possible, where the OS directly writes into the ArrayBufferView's backing memory, and the byte sequences are just for spec convenience.

Also, we'd need to update https://streams.spec.whatwg.org/#readablestream-need-more-data and its following definitions to work on byte controllers.

In practice I think this would look like the following:

Web Serial:
- The pullAlgorithm consults the "current bytes requested" instead of the desired size.
- The pullAlgorithm uses "respond to BYOB request" with the bytes it gets, instead of wrapping into a Uint8Array and enqueueing.
Fetch:
- Says something vague about attempting to use the "current bytes requested" to ask the network for the appropriate amount??
- If "current bytes requested" is non-null, then:
  - If the (post-content-coding) bytes fit into current bytes requested, then "respond to BYOB request" with them.
  - Otherwise... just enqueue them? Or split up the bytes so that you respond to the BYOB request with the first N and then enqueue the rest?
- If "current bytes requested" is null, then:
  - Enqueue the bytes
  - Ask the user agent to suspend the ongoing fetch

Any thoughts? How does this sound to spec writers who might use it?

The text was updated successfully, but these errors were encountered:

MattiasBuelens · 2021-04-06T21:38:08Z

It takes a mandatory autoAllocateChunkSize. (Although maybe consumer specs would just set that to an implementation-defined value anyway, so perhaps we should have streams do that as a default instead of making it mandatory?).

I think making the implementation-defined value part of the streams specification would give the idea that this value must be the same for all streams. I assume that different specifications can have different "expected" chunk sizes (e.g. Fetch in Chromium will likely keep chunking at 64 KiB, but Web Serial might want smaller chunks by default), in which case a mandatory argument would make more sense. But I'm not a consumer spec author, so I could be wrong about this. 😅

Or split up the bytes so that you respond to the BYOB request with the first N and then enqueue the rest?

Note that ReadableByteStreamControllerEnqueue already does this. 😉

In the limit, we could use "enqueue bytes" everywhere and remove "respond to BYOB request" entirely, instead leaving it up to implementations if they want to optimize by responding to the BYOB request where possible. But I don't know if implementors will feel confident/motivated to figure out these possible optimizations on their own, so maybe the specification should require and define at least some of them? 🤷‍♂️

reillyeon · 2021-04-06T22:22:08Z

It takes a mandatory autoAllocateChunkSize.

My only concern with making this mandatory is the potential to introduce developer-visible side effects that prevent optimizations. At an abstract level autoAllocationChunkSize should be set to options['bufferSize'] the same way highWaterMark is already. The pull algorithm would then call the OS directly with the automatically allocated buffer. This implies that the difference between the byteLength of the Uint8Array view and the underlying ArrayBuffer is the amount of space not used by the incoming data and that this might be visible to developers.

Due to the way the pull algorithm is implemented in Chromium there is a mandatory memory copy when receiving bytes and so the ArrayBuffer is currently always exactly the same length as the Uint8Array. Based on my understanding of how fetch() is implemented the same is true there. Having the Streams implementation automatically allocate these buffers instead would mean always allocating larger buffers than necessary with potentially developer-visible side effects.

domenic · 2021-04-07T16:27:18Z

I think making the implementation-defined value part of the streams specification would give the idea that this value must be the same for all streams.

Good point. I agree that at the very least this would be something we'd have to wordsmith around.

Note that ReadableByteStreamControllerEnqueue already does this. 😉

In the limit, we could use "enqueue bytes" everywhere and remove "respond to BYOB request" entirely, instead leaving it up to implementations if they want to optimize by responding to the BYOB request where possible.

Heh, thanks for the reminder... Indeed, I'm not sure what the best route is for making it clear that writing directly into the buffer is preferred, while also allowing spec authors some freedom to use the prose they're used to (e.g. "getting byte sequences from the OS" or "getting byte sequences from the network").

My only concern with making this mandatory is the potential to introduce developer-visible side effects that prevent optimizations.

Thanks very much for outlining the concern and details here. I agree that if we used autoAllocateChunkSize in the fashion I described above the ArrayBuffer would be mandated to be the same size every time and could not be trimmed to fit the data more exactly.

At a high level:

If your consumer is using a BYOB reader, then you really should give them back the full ArrayBuffer, not trimmed.
But if your consumer is using a default reader, it'd be nice to allow trimming.

To accomplish this today I guess you would not pass autoAllocateChunkSize, and then you'd branch:

If there's a BYOB request, try to fill the BYOB request's buffer, and even if it's not filled, give them back the full buffer. (In this scenario, does options["bufferSize"] just get ignored?)
If there's no BYOB request, then use options["bufferSize"] to request bytes from the OS, and enqueue the resulting buffer (which might be smaller than options["bufferSize"]).

This seems reasonable, albeit more ceremony than I was envisioning. What do you think?

If it seems OK then maybe we should make this the model everyone uses, and avoid using autoAllocateChunkSize.

reillyeon · 2021-04-08T00:54:54Z

To accomplish this today I guess you would not pass autoAllocateChunkSize, and then you'd branch:

If there's a BYOB request, try to fill the BYOB request's buffer, and even if it's not filled, give them back the full buffer. (In this scenario, does options["bufferSize"] just get ignored?)

That's an interesting question. As currently implemented in Chromium there is a shared memory buffer between the browser process and renderer process and the size of the buffer is options["bufferSize"]. Reading from the serial port occurs in the browser process so no matter how large the BYOB request's buffer is there will never be more than options["bufferSize"] bytes available to copy into it. The fetch() API uses a similar architecture, but with the fixed 64k buffer @MattiasBuelens mentioned.

BYOB definitely introduces more opportunities for implementation details like this to be visible to developers.

If there's no BYOB request, then use options["bufferSize"] to request bytes from the OS, and enqueue the resulting buffer (which might be smaller than options["bufferSize"]).

The Web Serial API pull steps currently say to read up to options["bufferSize"] bytes and then put them into a new ArrayBuffer whose length is the number of bytes read. I think it would be reasonable to add a note that the constructed ArrayBuffer can be up to options["bufferSize"] bytes long (or any length) and it is up to the implementation to decide which is more efficient to implement.

domenic · 2021-04-08T22:00:27Z

That's an interesting question. As currently implemented in Chromium there is a shared memory buffer between the browser process and renderer process and the size of the buffer is options["bufferSize"]. Reading from the serial port occurs in the browser process so no matter how large the BYOB request's buffer is there will never be more than options["bufferSize"] bytes available to copy into it. The fetch() API uses a similar architecture, but with the fixed 64k buffer @MattiasBuelens mentioned.

This seems reasonable to me. The exposure of implementation internals is somewhat unfortunate, but seems unavoidable. In general, some implementation strategies will be able to use the full range of the BYOB request buffer, and some won't.

Web Serial actually seems to be in a better situation than Fetch, since it has explicit developer control via bufferSize, so it'd be more likely to be interoperable. I.e., its spec could explicitly say that at most bufferSize bytes will be filled in the BYOB request, even if you pass a larger one.

I might try to write up a tentative spec patch soon for this (maybe even a Web Serial patch too) and see how it goes.

taralx · 2021-05-30T17:25:37Z

Will throw in my 2c that compression streams should also be byte streams.

ricea · 2021-05-31T13:08:42Z

Will throw in my 2c that compression streams should also be byte streams.

Yes, but I think we should specify byte transform streams here first, otherwise the compression streams spec would have to do a lot of work to specify the machinery of byte transforms.

Closes #1121.

domenic added byte streams other spec interface How the Streams spec interfaces with other specifications labels Apr 6, 2021

domenic mentioned this issue Apr 19, 2021

Use a ReadableStream with byte source (formerly called ReadableByteStream) for .body whatwg/fetch#267

Closed

domenic mentioned this issue Jun 2, 2021

Add and improve operations on BufferSources whatwg/webidl#987

Merged

domenic added a commit that referenced this issue Jun 2, 2021

Allow other specifications to create readable byte streams

4dd5475

Closes #1121.

This was referenced Jun 2, 2021

Allow other specifications to create readable byte streams #1130

Merged

Use readable byte streams WICG/serial#137

Merged

domenic added a commit that referenced this issue Jun 25, 2021

Allow other specifications to create readable byte streams

7ab47e9

Closes #1121.

domenic mentioned this issue Jun 29, 2021

Set up ReceiveStream with proper algorithms w3c/webtransport#297

Merged

domenic added a commit that referenced this issue Jul 8, 2021

Allow other specifications to create readable byte streams

195d45a

Closes #1121.

domenic closed this as completed in #1130 Jul 15, 2021

domenic added a commit that referenced this issue Jul 15, 2021

Allow other specifications to create readable byte streams

bb53bec

Closes #1121.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow other specifications to create readable byte streams #1121

Allow other specifications to create readable byte streams #1121

domenic commented Apr 6, 2021

MattiasBuelens commented Apr 6, 2021

reillyeon commented Apr 6, 2021

domenic commented Apr 7, 2021

reillyeon commented Apr 8, 2021

domenic commented Apr 8, 2021

taralx commented May 30, 2021

ricea commented May 31, 2021

Allow other specifications to create readable byte streams #1121

Allow other specifications to create readable byte streams #1121

Comments

domenic commented Apr 6, 2021

MattiasBuelens commented Apr 6, 2021

reillyeon commented Apr 6, 2021

domenic commented Apr 7, 2021

reillyeon commented Apr 8, 2021

domenic commented Apr 8, 2021

taralx commented May 30, 2021

ricea commented May 31, 2021