Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow other specifications to create readable byte streams #1121

Closed
domenic opened this issue Apr 6, 2021 · 7 comments · Fixed by #1130
Closed

Allow other specifications to create readable byte streams #1121

domenic opened this issue Apr 6, 2021 · 7 comments · Fixed by #1130
Labels
byte streams other spec interface How the Streams spec interfaces with other specifications

Comments

@domenic
Copy link
Member

domenic commented Apr 6, 2021

I'd like to solve WICG/serial#127 and whatwg/fetch#267. /cc @reillyeon @yutakahirano. This issue is trying to figure out how to do that.

Note that Web Serial has a proper pull algorithm and so is very similar to https://streams.spec.whatwg.org/#example-rbs-pull . Fetch instead is more like https://streams.spec.whatwg.org/#example-rbs-push, except it has backpressure support; it can send a suspend and resume signal.


We'd have an algorithm "Set up a readable byte stream" similar to https://streams.spec.whatwg.org/#readablestream-set-up. It takes a mandatory autoAllocateChunkSize. (Although maybe consumer specs would just set that to an implementation-defined value anyway, so perhaps we should have streams do that as a default instead of making it mandatory?).

Then we have two cases:

  • Specs like Web Serial which only want to enqueue from the pull algorithm. They will always get a BYOB request they can fill exactly.

  • Specs like Fetch which want to enqueue whenever the network has data. They will only sometimes have a BYOB request available.

I think the basic primitives they would use are:

  • "enqueue bytes": takes an Infra byte sequence and wraps it into a Uint8Array and enqueues it, ignoring any BYOB requests. Specs "should" avoid doing this whenever there's a current BYOB request.

  • "current BYOB request": returns... just the number of bytes requested? I think the actual identity of the ArrayBuffer, as well as the offset, are not interesting. So maybe this is "current bytes requested". It needs to also be able to return null if there is no outstanding BYOB request, in which case the spec should use "enqueue". It will never be null inside a pullAlgorithm.

  • "respond to current BYOB request": takes an Infra byte sequence (of max length = current bytes requested) and puts it into the appropriate offset in the BYOB request's view. Then calls the spec-equivalent of byobRequest.respond(byteSequenceLength).

    The way this is written makes it seem like there are necessarily copies (from the byte sequence into the ArrayBufferView), so we'd have to note that more efficient implementations are possible, where the OS directly writes into the ArrayBufferView's backing memory, and the byte sequences are just for spec convenience.

Also, we'd need to update https://streams.spec.whatwg.org/#readablestream-need-more-data and its following definitions to work on byte controllers.


In practice I think this would look like the following:

  • Web Serial:
    • The pullAlgorithm consults the "current bytes requested" instead of the desired size.
    • The pullAlgorithm uses "respond to BYOB request" with the bytes it gets, instead of wrapping into a Uint8Array and enqueueing.
  • Fetch:
    • Says something vague about attempting to use the "current bytes requested" to ask the network for the appropriate amount??
    • If "current bytes requested" is non-null, then:
      • If the (post-content-coding) bytes fit into current bytes requested, then "respond to BYOB request" with them.
      • Otherwise... just enqueue them? Or split up the bytes so that you respond to the BYOB request with the first N and then enqueue the rest?
    • If "current bytes requested" is null, then:
      • Enqueue the bytes
      • Ask the user agent to suspend the ongoing fetch

Any thoughts? How does this sound to spec writers who might use it?

@domenic domenic added byte streams other spec interface How the Streams spec interfaces with other specifications labels Apr 6, 2021
@MattiasBuelens
Copy link
Collaborator

It takes a mandatory autoAllocateChunkSize. (Although maybe consumer specs would just set that to an implementation-defined value anyway, so perhaps we should have streams do that as a default instead of making it mandatory?).

I think making the implementation-defined value part of the streams specification would give the idea that this value must be the same for all streams. I assume that different specifications can have different "expected" chunk sizes (e.g. Fetch in Chromium will likely keep chunking at 64 KiB, but Web Serial might want smaller chunks by default), in which case a mandatory argument would make more sense. But I'm not a consumer spec author, so I could be wrong about this. 😅

Or split up the bytes so that you respond to the BYOB request with the first N and then enqueue the rest?

Note that ReadableByteStreamControllerEnqueue already does this. 😉

In the limit, we could use "enqueue bytes" everywhere and remove "respond to BYOB request" entirely, instead leaving it up to implementations if they want to optimize by responding to the BYOB request where possible. But I don't know if implementors will feel confident/motivated to figure out these possible optimizations on their own, so maybe the specification should require and define at least some of them? 🤷‍♂️

@reillyeon
Copy link

It takes a mandatory autoAllocateChunkSize.

My only concern with making this mandatory is the potential to introduce developer-visible side effects that prevent optimizations. At an abstract level autoAllocationChunkSize should be set to options['bufferSize'] the same way highWaterMark is already. The pull algorithm would then call the OS directly with the automatically allocated buffer. This implies that the difference between the byteLength of the Uint8Array view and the underlying ArrayBuffer is the amount of space not used by the incoming data and that this might be visible to developers.

Due to the way the pull algorithm is implemented in Chromium there is a mandatory memory copy when receiving bytes and so the ArrayBuffer is currently always exactly the same length as the Uint8Array. Based on my understanding of how fetch() is implemented the same is true there. Having the Streams implementation automatically allocate these buffers instead would mean always allocating larger buffers than necessary with potentially developer-visible side effects.

@domenic
Copy link
Member Author

domenic commented Apr 7, 2021

I think making the implementation-defined value part of the streams specification would give the idea that this value must be the same for all streams.

Good point. I agree that at the very least this would be something we'd have to wordsmith around.

Note that ReadableByteStreamControllerEnqueue already does this. 😉

In the limit, we could use "enqueue bytes" everywhere and remove "respond to BYOB request" entirely, instead leaving it up to implementations if they want to optimize by responding to the BYOB request where possible.

Heh, thanks for the reminder... Indeed, I'm not sure what the best route is for making it clear that writing directly into the buffer is preferred, while also allowing spec authors some freedom to use the prose they're used to (e.g. "getting byte sequences from the OS" or "getting byte sequences from the network").

My only concern with making this mandatory is the potential to introduce developer-visible side effects that prevent optimizations.

Thanks very much for outlining the concern and details here. I agree that if we used autoAllocateChunkSize in the fashion I described above the ArrayBuffer would be mandated to be the same size every time and could not be trimmed to fit the data more exactly.

At a high level:

  • If your consumer is using a BYOB reader, then you really should give them back the full ArrayBuffer, not trimmed.

  • But if your consumer is using a default reader, it'd be nice to allow trimming.

To accomplish this today I guess you would not pass autoAllocateChunkSize, and then you'd branch:

  • If there's a BYOB request, try to fill the BYOB request's buffer, and even if it's not filled, give them back the full buffer. (In this scenario, does options["bufferSize"] just get ignored?)

  • If there's no BYOB request, then use options["bufferSize"] to request bytes from the OS, and enqueue the resulting buffer (which might be smaller than options["bufferSize"]).

This seems reasonable, albeit more ceremony than I was envisioning. What do you think?

If it seems OK then maybe we should make this the model everyone uses, and avoid using autoAllocateChunkSize.

@reillyeon
Copy link

To accomplish this today I guess you would not pass autoAllocateChunkSize, and then you'd branch:

  • If there's a BYOB request, try to fill the BYOB request's buffer, and even if it's not filled, give them back the full buffer. (In this scenario, does options["bufferSize"] just get ignored?)

That's an interesting question. As currently implemented in Chromium there is a shared memory buffer between the browser process and renderer process and the size of the buffer is options["bufferSize"]. Reading from the serial port occurs in the browser process so no matter how large the BYOB request's buffer is there will never be more than options["bufferSize"] bytes available to copy into it. The fetch() API uses a similar architecture, but with the fixed 64k buffer @MattiasBuelens mentioned.

BYOB definitely introduces more opportunities for implementation details like this to be visible to developers.

  • If there's no BYOB request, then use options["bufferSize"] to request bytes from the OS, and enqueue the resulting buffer (which might be smaller than options["bufferSize"]).

The Web Serial API pull steps currently say to read up to options["bufferSize"] bytes and then put them into a new ArrayBuffer whose length is the number of bytes read. I think it would be reasonable to add a note that the constructed ArrayBuffer can be up to options["bufferSize"] bytes long (or any length) and it is up to the implementation to decide which is more efficient to implement.

@domenic
Copy link
Member Author

domenic commented Apr 8, 2021

That's an interesting question. As currently implemented in Chromium there is a shared memory buffer between the browser process and renderer process and the size of the buffer is options["bufferSize"]. Reading from the serial port occurs in the browser process so no matter how large the BYOB request's buffer is there will never be more than options["bufferSize"] bytes available to copy into it. The fetch() API uses a similar architecture, but with the fixed 64k buffer @MattiasBuelens mentioned.

This seems reasonable to me. The exposure of implementation internals is somewhat unfortunate, but seems unavoidable. In general, some implementation strategies will be able to use the full range of the BYOB request buffer, and some won't.

Web Serial actually seems to be in a better situation than Fetch, since it has explicit developer control via bufferSize, so it'd be more likely to be interoperable. I.e., its spec could explicitly say that at most bufferSize bytes will be filled in the BYOB request, even if you pass a larger one.

I might try to write up a tentative spec patch soon for this (maybe even a Web Serial patch too) and see how it goes.

@taralx
Copy link

taralx commented May 30, 2021

Will throw in my 2c that compression streams should also be byte streams.

@ricea
Copy link
Collaborator

ricea commented May 31, 2021

Will throw in my 2c that compression streams should also be byte streams.

Yes, but I think we should specify byte transform streams here first, otherwise the compression streams spec would have to do a lot of work to specify the machinery of byte transforms.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
byte streams other spec interface How the Streams spec interfaces with other specifications
Development

Successfully merging a pull request may close this issue.

5 participants