ACP: str::chunks with chunks being &str #592

Closed as not planned
@tkr-sh

Proposal

Problem statement

The standard library currently provides various methods for chunking slices (chunks, chunks_exact, rchunks, array_chunks, utf8_chunks, ...) and iterators (array_chunks).
However, there is no equivalent method for string slices, so the "developer experience" of chunking a &str could be improved.

Motivating examples or use cases

Chunking is often needed when working with data that can be viewed as a sequence.
This is why slices and iterators have methods for it.
But there are none for &str, even though it would often be useful.
Here are some examples:

  • Converting binary or hexadecimal strings into an iterator of integers.
    Currently we would do
let hex = "0xABCDEF";
let values = hex[2..]
    .bytes()
    .array_chunks::<2>()  // unstable
    .map(|arr| u8::from_str_radix(str::from_utf8(&arr).unwrap(), 16));  // needs .unwrap()

// Instead of possibly doing

let values = hex[2..]
    .chunks(2)
    .map(|chunk| u8::from_str_radix(chunk, 16));
  • Processing padded data like hello---only----8-------chars---
  • Wrapping some text safely
let user_text = "...";
user_text.chunks(width).intersperse("\n").collect::<String>()

Overall, anything that handles data with a repetitive pattern, or that involves wrapping or formatting, would benefit from this function.
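
For comparison, the hexadecimal example above can be approximated on stable Rust by chunking the underlying bytes. This is only a sketch: `decode_hex` is a hypothetical helper name, not part of std, and it relies on the input being ASCII so that byte chunks and character chunks coincide.

```rust
use std::num::ParseIntError;

/// Hypothetical helper (not in std): decode a hex string such as "0xABCDEF"
/// into bytes using only stable APIs.
fn decode_hex(hex: &str) -> Result<Vec<u8>, ParseIntError> {
    // Accept the string with or without a leading "0x".
    let digits = hex.strip_prefix("0x").unwrap_or(hex);
    digits
        .as_bytes()
        .chunks(2) // slice::chunks, so a trailing single digit is kept
        .map(|pair| {
            // A chunk that splits a multi-byte character is not valid UTF-8;
            // "?" is substituted so that it surfaces as a parse error.
            let text = std::str::from_utf8(pair).unwrap_or("?");
            u8::from_str_radix(text, 16)
        })
        .collect()
}
```

Even in this short form the round trip through `from_utf8` remains, which is the step a `&str`-yielding `chunks` would remove.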

Another problem is that array_chunks doesn't behave like slice::chunks: the last chunk is discarded if it is shorter than the chunk size, which isn't always wanted.
But if you want to achieve the same thing today, you have to create an unnecessary vector:

let vec = "hello world".chars().collect::<Vec<_>>(); // Really inefficient
vec.as_slice().chunks(4) // [['h', 'e', 'l', 'l'], ['o', ' ', 'w', 'o'], ['r', 'l', 'd']]
// instead of just
"hello world".chunks(4) // ["hell", "o wo", "rld"]

It's

  1. more code
  2. less readable
  3. owning some unnecessary data
  4. losing the borrowed lifetime of the initial string slice
fn example_when_owning(s: &str) -> Vec<&str> {
    let vec = s.bytes().collect::<Vec<_>>();
    vec.as_slice()
        .chunks(4)
        .map(|bytes| str::from_utf8(bytes).unwrap())
        .collect() // Error! The function tries to return data borrowed from `vec`, which is local to this function
}

fn example_when_borrowing(s: &str) -> Vec<&str> {
    s.chunks(4).collect() // works fine!
}
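
The remainder behaviour mentioned earlier can be observed on stable Rust with slices. In this sketch `slice::chunks_exact` stands in for the unstable `array_chunks` (an assumption made for illustration; both skip a short trailing chunk during iteration), and `compare_chunking` is a hypothetical helper name.

```rust
/// Hypothetical helper: chunk the chars of `s` two ways.
/// Returns (chunks keeping the tail, chunks dropping the tail, the dropped tail).
fn compare_chunking(s: &str, size: usize) -> (Vec<String>, Vec<String>, String) {
    // The intermediate Vec<char> is the allocation the proposal wants to avoid.
    let chars: Vec<char> = s.chars().collect();

    // slice::chunks yields a final, shorter chunk when the length is not a multiple of `size`.
    let keep_tail: Vec<String> = chars.chunks(size).map(|c| c.iter().collect()).collect();

    // slice::chunks_exact only yields full chunks; the rest is available via remainder().
    let exact = chars.chunks_exact(size);
    let leftover: String = exact.remainder().iter().collect();
    let drop_tail: Vec<String> = exact.map(|c| c.iter().collect()).collect();

    (keep_tail, drop_tail, leftover)
}
```

Note that every chunk here has to be copied into a new `String`, because nothing in the `Vec<char>` can be handed back as a `&str`.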

Also, str::chunks() is faster than Chars::array_chunks() (without even considering str::from_utf8().unwrap())

Solution sketch

  • Create a new str::Chunks in core/src/str/iter.rs and implement Iterator & DoubleEndedIterator on it
  • Create a new method on str:
pub fn chunks(&self, chunk_size: usize) -> str::Chunks<'_> {
    str::Chunks::new(self, chunk_size)
}
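
To make the sketch more concrete, here is one way such an iterator could look on stable Rust. It is not the implementation from the linked branch: `StrChunks` and `collect_chunks` are hypothetical names, the zero-size panic mirrors `slice::chunks` as an assumption, and `DoubleEndedIterator` is left out for brevity.

```rust
/// Hypothetical stand-in for the proposed `str::Chunks`:
/// yields sub-slices of at most `chunk_size` chars each.
struct StrChunks<'a> {
    rest: &'a str,
    chunk_size: usize,
}

impl<'a> StrChunks<'a> {
    fn new(s: &'a str, chunk_size: usize) -> Self {
        // Assumption: a zero chunk size panics, as slice::chunks does.
        assert!(chunk_size != 0, "chunk size must be non-zero");
        StrChunks { rest: s, chunk_size }
    }
}

impl<'a> Iterator for StrChunks<'a> {
    type Item = &'a str;

    fn next(&mut self) -> Option<&'a str> {
        if self.rest.is_empty() {
            return None;
        }
        // Byte offset of the char that starts the next chunk, or the end of the string.
        let split = self
            .rest
            .char_indices()
            .nth(self.chunk_size)
            .map_or(self.rest.len(), |(idx, _)| idx);
        let (head, tail) = self.rest.split_at(split);
        self.rest = tail;
        Some(head)
    }
}

/// The chunks borrow from `s`, so they can be returned without allocating strings.
fn collect_chunks(s: &str, chunk_size: usize) -> Vec<&str> {
    StrChunks::new(s, chunk_size).collect()
}
```

Because the split points come from `char_indices`, every chunk boundary falls on a char boundary and `split_at` cannot panic here.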

Implementation at https://github.com/tkr-sh/rust/tree/str-chunks

Drawbacks

.chunks() on &str isn't necessarily clear about whether it operates on u8 or char. Though, if the chunks are &str, it makes sense that it operates on chars.
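
The ambiguity matters as soon as the text is not pure ASCII. The sketch below chunks by bytes and then checks each chunk for UTF-8 validity; `byte_chunks_as_str` is a hypothetical helper used only to illustrate the point.

```rust
/// Hypothetical helper: chunk `s` by bytes and report which chunks
/// are still valid UTF-8 on their own.
fn byte_chunks_as_str(s: &str, chunk_size: usize) -> Vec<Option<&str>> {
    s.as_bytes()
        .chunks(chunk_size)
        // A chunk that starts or ends in the middle of a multi-byte char fails validation.
        .map(|bytes| std::str::from_utf8(bytes).ok())
        .collect()
}
```

With "hello" and a chunk size of 2 every chunk is valid, but with "héllo" the two-byte 'é' straddles a chunk boundary and the first two chunks are rejected. A byte-based `chunks` therefore could not yield `&str` without either panicking or returning a fallible item type.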

Alternatives

  • .chars().collect() then vec.as_slice().chunks(), but it's significantly longer and owns data unnecessarily. See motivation.
  • .chars().array_chunks(), but it's unstable, slower and doesn't behave the same way. See motivation.

Links and related work


From rust-lang/rfcs#3818

    Labels

    T-libs-api, api-change-proposal: A proposal to add or alter unstable APIs in the standard libraries