Proposal
Problem statement
Because of how BufRead::fill_buf works, chunked decoders such as base64 or hex that decode from generic readers are currently forced to maintain an internal buffer of unread byte chunks. This makes the code much more complex, more error-prone, and slower, because it introduces additional copies along with all the branches required to handle the edge cases. BufReader::peek solves this, but only for the specific case where the reader is a BufReader, not for generic readers (e.g. slices).
Motivating examples or use cases
The base64 crate has this code:
pub struct DecoderReader<'e, E: Engine, R: io::Read> {
    // Unrelated fields omitted for brevity

    /// Where b64 data is read from
    inner: R,
    /// Holds b64 data read from the delegate reader.
    b64_buffer: [u8; BUF_SIZE],
    /// The start of the pending buffered data in `b64_buffer`.
    b64_offset: usize,
    /// The amount of buffered b64 data after `b64_offset` in `b64_len`.
    b64_len: usize,
}
The three additional fields hold an internal buffer that is filled by copying from the inner reader. This means that reading from e.g. a slice incurs an additional copy and complex handling of edge cases. (You can see quite complicated code if you read that file.) The BufRead trait almost solves this, but one edge case remains: if the number of bytes currently available in the reader's buffer is less than the chunk size, the only way to make progress is to copy the bytes into an intermediate buffer before continuing with decoding. As mentioned, BufReader::peek can solve this, but storing a BufReader internally would add additional copying when the reader is already buffered.
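The edge case can be reproduced with a small sketch. The helper `read_chunk` and the function `example` below are hypothetical names used only for illustration and are not taken from the base64 crate; the sketch assumes a `BufReader` with a deliberately tiny capacity of 3 bytes so that `fill_buf` can never hand out a full 4-byte chunk.

```rust
use std::io::{self, BufRead, BufReader};

/// Hypothetical helper (not from the base64 crate): gathers up to `N` bytes
/// from any `BufRead` into a caller-owned scratch buffer. This is the extra
/// copy that a generic chunked decoder cannot avoid today.
fn read_chunk<R: BufRead, const N: usize>(
    reader: &mut R,
    scratch: &mut [u8; N],
) -> io::Result<usize> {
    let mut filled = 0;
    while filled < N {
        let available = reader.fill_buf()?;
        if available.is_empty() {
            break; // EOF: return a short chunk
        }
        let take = available.len().min(N - filled);
        scratch[filled..filled + take].copy_from_slice(&available[..take]);
        reader.consume(take);
        filled += take;
    }
    Ok(filled)
}

/// Illustration only: shows that repeated `fill_buf` calls do not grow the
/// returned slice, so the decoder has to fall back to `read_chunk`.
fn example() -> io::Result<()> {
    let data: &[u8] = b"QUJDREVG";
    let mut reader = BufReader::with_capacity(3, data);

    // Without `consume`, the second call returns the same 3 buffered bytes.
    let first = reader.fill_buf()?.len();
    let second = reader.fill_buf()?.len();
    assert_eq!((first, second), (3, 3));

    // A 4-byte base64 chunk therefore has to be assembled by copying.
    let mut scratch = [0u8; 4];
    let n = read_chunk(&mut reader, &mut scratch)?;
    assert_eq!(&scratch[..n], &b"QUJD"[..]);
    Ok(())
}
```

Calling `example()` from `main` exercises both the stalled `fill_buf` calls and the copying fallback. For brevity the sketch propagates `ErrorKind::Interrupted` instead of retrying.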
Solution sketch
We can solve these problems by adding this trait:
// note that I'm changing the name from peek because the semantics is a bit different - it should return the entire internal buffer rather than just a sub-slice
pub trait RequireBytes: BufRead {
    /// Returns the maximum number of bytes that can be requested, None meaning unlimited.
    ///
    /// Note that this doesn't mean that `require_bytes` could return an unlimited number of bytes but merely that it can return any number of bytes until it reaches EOF.
    fn capacity(&self) -> Option<usize>;

    /// Attempts to fill the internal buffer until at least `num_bytes` are in it.
    ///
    /// This will return a shorter buffer if EOF is reached. In case of error the data is preserved inside the buffer.
    /// This will return a longer buffer if the underlying reader happened to have more bytes available.
    fn require_bytes(&mut self, num_bytes: usize) -> Result<&[u8]>;
}
Libraries that implement decoding, like base64, can then just bound on RequireBytes and avoid all the complexity.
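As a rough sketch of what such a bound could look like, the following defines the proposed trait locally (it is not part of std), adds an assumed implementation for `&[u8]`, and uses it in a minimal hex decoder. The slice implementation, `hex_val`, and `decode_hex` are illustrative assumptions rather than part of the proposal, and `Result` is taken to mean `io::Result`.

```rust
use std::io::{self, BufRead};

/// The trait from the proposal, reproduced locally; it does not exist in std.
pub trait RequireBytes: BufRead {
    fn capacity(&self) -> Option<usize>;
    fn require_bytes(&mut self, num_bytes: usize) -> io::Result<&[u8]>;
}

/// Assumed implementation for slices: all remaining input is already in
/// memory, so any request is satisfied without copying.
impl RequireBytes for &[u8] {
    fn capacity(&self) -> Option<usize> {
        None
    }

    fn require_bytes(&mut self, _num_bytes: usize) -> io::Result<&[u8]> {
        Ok(*self)
    }
}

/// Hypothetical helper: converts one ASCII hex digit to its value.
fn hex_val(c: u8) -> io::Result<u8> {
    match c {
        b'0'..=b'9' => Ok(c - b'0'),
        b'a'..=b'f' => Ok(c - b'a' + 10),
        b'A'..=b'F' => Ok(c - b'A' + 10),
        _ => Err(io::Error::new(io::ErrorKind::InvalidData, "invalid hex digit")),
    }
}

/// Hypothetical decoder: processes whole 2-byte chunks straight from the
/// reader's buffer, with no intermediate buffer of its own.
pub fn decode_hex<R: RequireBytes>(reader: &mut R, out: &mut Vec<u8>) -> io::Result<()> {
    loop {
        let buf = reader.require_bytes(2)?;
        if buf.is_empty() {
            return Ok(()); // clean EOF
        }
        if buf.len() < 2 {
            // A shorter buffer means EOF was reached in the middle of a chunk.
            return Err(io::Error::new(
                io::ErrorKind::UnexpectedEof,
                "odd number of hex digits",
            ));
        }
        let usable = buf.len() / 2 * 2;
        for pair in buf[..usable].chunks_exact(2) {
            out.push((hex_val(pair[0])? << 4) | hex_val(pair[1])?);
        }
        reader.consume(usable);
    }
}
```

With this sketch, decoding the slice `b"48656c6c6f"` appends the bytes of `Hello` to `out`, and the decoder needs no buffer, offset, or length fields of its own.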
Alternatives
Strictly speaking, this doesn't require std. Any library wishing to do this can define its own trait and implement it for all std types (once BufReader::peek is stable). However, it would be annoying to coordinate implementations across various crates. Having a "common vocabulary" trait in std would make the implementation more easily accessible.
Links and related work
What happens now?
This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.
Possible responses
The libs team may respond in various different ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):
- We think this problem seems worth solving, and the standard library might be the right place to solve it.
- We think that this probably doesn't belong in the standard library.
Second, if there's a concrete solution:
- We think this specific solution looks roughly right, approved, you or someone else should implement this. (Further review will still happen on the subsequent implementation PR.)
- We're not sure this is the right solution, and the alternatives or other materials don't give us enough information to be sure about that. Here are some questions we have that aren't answered, or rough ideas about alternatives we'd want to see discussed.