Description
Proposal
Problem statement
The std currently provides various methods for chunking slices (chunks, chunks_exact, rchunks, array_chunks, utf8_chunks, ...) and iterators (array_chunks).
However, there is no equivalent method for string slices, so the developer experience of chunking a &str can be improved.
Motivating examples or use cases
Chunking is often needed when working with data that can be viewed as a sequence, which is why slices and iterators provide methods for it.
There are none for &str, even though they would often be useful.
Here are some examples:
- Converting binary or hexadecimal strings into an iterator of integers.
Currently we would write:
let hex = "0xABCDEF";
let values = hex[2..]
    .bytes()
    .array_chunks::<2>() // unstable
    .map(|arr| u8::from_str_radix(std::str::from_utf8(&arr).unwrap(), 16)); // each item is a Result<u8, ParseIntError>
// Instead of possibly writing
let values = hex[2..]
    .chunks(2)
    .map(|s| u8::from_str_radix(s, 16));
- Processing fixed-width padded data such as
hello---only----8-------chars---
- Wrapping text safely, without splitting a char:
let user_text = "...";
user_text.chunks(width).intersperse("\n").collect::<String>() // Iterator::intersperse is also unstable
Overall, any task that handles data with a repetitive pattern, or that involves wrapping or formatting, would benefit from this method.
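On stable Rust today, the padded-data and wrapping use cases above can only be approximated with byte chunks or a manual char loop. The sketch below is illustrative only. The helper names parse_padded and wrap_chars are hypothetical. parse_padded assumes ASCII input, because as_bytes().chunks() can split a multi-byte char, which is exactly the case a str::chunks method would avoid.

```rust
/// Hypothetical helper: splits fixed-width, '-'-padded fields.
/// Assumes ASCII input: byte chunks can cut a multi-byte char in half,
/// in which case from_utf8 fails and this panics.
fn parse_padded(data: &str, width: usize) -> Vec<&str> {
    data.as_bytes()
        .chunks(width) // panics if width == 0
        .map(|field| std::str::from_utf8(field).expect("field boundary inside a UTF-8 char"))
        .map(|field| field.trim_end_matches('-'))
        .collect()
}

/// Hypothetical helper: inserts '\n' every `width` chars, never inside a char.
/// A stable substitute for `.chunks(width).intersperse("\n")`.
fn wrap_chars(text: &str, width: usize) -> String {
    assert!(width > 0, "width must be non-zero");
    let mut out = String::with_capacity(text.len() + text.len() / width);
    for (i, c) in text.chars().enumerate() {
        // Start a new line before every width-th char (except the first).
        if i > 0 && i % width == 0 {
            out.push('\n');
        }
        out.push(c);
    }
    out
}
```

For example, wrap_chars("héllo wörld", 4) keeps é and ö intact and breaks the text into three lines.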
Another problem is that array_chunks doesn't behave like slice::chunks: the last chunk is not yielded if it is shorter than chunk_size, which isn't always what you want.
But to get the same result today, you have to create an unnecessary vector:
let vec = "hello world".chars().collect::<Vec<_>>(); // Inefficient: allocates a Vec<char>
vec.as_slice().chunks(4) // [['h', 'e', 'l', 'l'], ['o', ' ', 'w', 'o'], ['r', 'l', 'd']], each a &[char], not a &str
// instead of just
"hello world".chunks(4) // ["hell", "o wo", "rld"]
This workaround:
- requires more code
- is less readable
- owns unnecessary data
- loses the lifetime of the original string slice
fn example_when_owning(s: &str) -> Vec<&str> {
    let vec = s.bytes().collect::<Vec<_>>();
    vec.as_slice()
        .chunks(4)
        .map(|bytes| std::str::from_utf8(bytes).unwrap())
        .collect() // Error! Cannot return a value referencing the local variable `vec`, which is dropped here
}
fn example_when_borrowing(s: &str) -> Vec<&str> {
    s.chunks(4).collect() // works fine! The chunks borrow from `s`
}
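For comparison, a version of the owning workaround that does compile on stable Rust has to return owned Strings. It allocates a Vec<char> plus one String per chunk, and the result no longer borrows from the input. This is a sketch for illustration, and the name chunks_owned is hypothetical.

```rust
/// Hypothetical helper: the owning workaround, made to compile.
/// Allocates a Vec<char>, then one String per chunk; the result
/// no longer borrows from `s`.
fn chunks_owned(s: &str, chunk_size: usize) -> Vec<String> {
    let chars: Vec<char> = s.chars().collect();
    chars
        .chunks(chunk_size) // panics if chunk_size == 0
        .map(|chunk| chunk.iter().collect::<String>())
        .collect()
}
```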
Also, str::chunks() is faster than Chars::array_chunks() (without even considering the cost of str::from_utf8().unwrap()).
Solution sketch
- Create a new str::Chunks in core/src/str/iter.rs and implement Iterator and DoubleEndedIterator on it
- Create a new method on str:
pub fn chunks(&self, chunk_size: usize) -> str::Chunks<'_> {
    str::Chunks::new(self, chunk_size)
}
Implementation at https://github.com/tkr-sh/rust/tree/str-chunks
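As a rough illustration of the solution sketch, here is a minimal approximation on stable Rust, written as an extension trait. It is not the linked implementation. The names CharChunks, CharChunksExt and char_chunks are hypothetical, and chunk_size is assumed to count chars. next_back recounts the remaining chars, so it is O(n) per call.

```rust
/// Hypothetical iterator over &str chunks of at most `chunk_size` chars.
pub struct CharChunks<'a> {
    rest: &'a str,
    chunk_size: usize,
}

/// Hypothetical extension trait standing in for the proposed str::chunks.
pub trait CharChunksExt {
    fn char_chunks(&self, chunk_size: usize) -> CharChunks<'_>;
}

impl CharChunksExt for str {
    fn char_chunks(&self, chunk_size: usize) -> CharChunks<'_> {
        // Mirrors slice::chunks, which panics on a zero chunk size.
        assert!(chunk_size != 0, "chunk size must be non-zero");
        CharChunks { rest: self, chunk_size }
    }
}

impl<'a> Iterator for CharChunks<'a> {
    type Item = &'a str;

    fn next(&mut self) -> Option<&'a str> {
        if self.rest.is_empty() {
            return None;
        }
        // Byte offset where the (chunk_size + 1)-th char starts, or the end.
        let end = self
            .rest
            .char_indices()
            .nth(self.chunk_size)
            .map_or(self.rest.len(), |(i, _)| i);
        let (chunk, rest) = self.rest.split_at(end);
        self.rest = rest;
        Some(chunk)
    }
}

impl<'a> DoubleEndedIterator for CharChunks<'a> {
    fn next_back(&mut self) -> Option<&'a str> {
        if self.rest.is_empty() {
            return None;
        }
        // As with slice::chunks, the last chunk holds the remainder
        // (len % chunk_size chars), or a full chunk if there is no remainder.
        let len = self.rest.chars().count();
        let rem = len % self.chunk_size;
        let take = if rem == 0 { self.chunk_size } else { rem };
        // Byte offset of the take-th char counted from the end.
        let start = self
            .rest
            .char_indices()
            .rev()
            .nth(take - 1)
            .map_or(0, |(i, _)| i);
        let (rest, chunk) = self.rest.split_at(start);
        self.rest = rest;
        Some(chunk)
    }
}
```

The extension trait is only a stand-in. An inherent method as proposed would need no import, and the chunks keep the lifetime of the original string slice.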
Drawbacks
.chunks() on &str doesn't make it obvious whether it chunks by u8 or by char. However, since the chunks are &str, it makes sense that chunk_size counts chars.
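To illustrate the ambiguity, the sketch below contrasts chunking the UTF-8 bytes of a string with chunking its chars. Both helper names are hypothetical. by_bytes uses the stable slice::chunks on the bytes. by_chars computes char-based boundaries with char_indices and step_by.

```rust
/// Hypothetical: chunks the UTF-8 bytes; a chunk that cuts a char
/// in half is not valid UTF-8 and becomes None.
fn by_bytes(s: &str, n: usize) -> Vec<Option<&str>> {
    s.as_bytes()
        .chunks(n) // panics if n == 0
        .map(|bytes| std::str::from_utf8(bytes).ok())
        .collect()
}

/// Hypothetical: chunks by chars, so every chunk is a valid &str borrowed from `s`.
fn by_chars(s: &str, n: usize) -> Vec<&str> {
    // Byte offsets where every n-th char starts (step_by panics if n == 0)...
    let mut bounds: Vec<usize> = s.char_indices().map(|(i, _)| i).step_by(n).collect();
    // ...plus the end of the string.
    bounds.push(s.len());
    bounds.windows(2).map(|w| &s[w[0]..w[1]]).collect()
}
```

With n = 2, byte chunking of "héllo" cuts é in half, while char chunking yields "hé", "ll" and "o".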
Alternatives
- .chars().collect() then vec.as_slice().chunks(), but it's significantly longer and owns data that could be avoided. See motivation.
- .chars().array_chunks(), but it's unstable, slower, and doesn't behave the same way. See motivation.
Links and related work
- slice::chunks(usize)
- str::chars()
- Iterator::array_chunks::<N>()
- ACP: add str::chunks, str::chunks_exact, and str::windows #590
  - It was rejected in part because "Given the issues related to UTF-8 boundaries causing potential foot-guns [...]", which shouldn't affect this ACP, since chunking by char never splits a UTF-8 sequence.
- From rust-lang/rfcs#3818