use whitespace_sifter::WhitespaceSifter;

// This prints `1.. 2.. 3.. 4.. 5..`.
println!(
    "{}",
    "1.. \n2.. \n\r\n\n3.. \n\n\n4.. \n\n\r\n\n\n5.. \n\n\n\n\n".sift(),
);

// This prints `1..\n2..\n3..\n4..\r\n5..`.
println!(
    "{}",
    "1.. \n2.. \n\r\n3.. \n\n\n4.. \r\n\n\r\n\n5.. \n\n\n\n\n"
        .sift_preserve_newlines(),
);
This crate helps you remove duplicate whitespace within a UTF-8 encoded string.
It also removes the whitespace at the start and end of the string.
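To make "removing duplicate whitespace" concrete, here is a minimal standard-library sketch of the general idea. The `collapse_whitespace` helper is hypothetical and is not how whitespace-sifter is implemented; it simply replaces every run of whitespace with a single space and trims both ends.

```rust
/// Hypothetical helper for illustration only (not part of whitespace-sifter).
/// Splits on runs of Unicode whitespace, then joins the words with one space,
/// which also drops leading and trailing whitespace.
fn collapse_whitespace(input: &str) -> String {
    input.split_whitespace().collect::<Vec<_>>().join(" ")
}

fn main() {
    let collapsed = collapse_whitespace("  1.. \n2.. \n\r\n\n3..  ");
    // prints "1.. 2.. 3.."
    println!("{collapsed}");
}
```

This sketch always emits a plain space between words, so it may not match the crate's output for every input; prefer `sift()` when you use the crate.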
Crate | Implementation |
---|---|
whitespace-sifter | Any AsRef<str> as input, CR-LF compatibility, preserve_newlines |
collapse | &str input only |
fast_whitespace_collapse | &str input only, SIMD with fallback for any unsupported rustc target |
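The difference between "any `AsRef<str>`" and "`&str` input only" can be sketched with a blanket trait implementation. The `CollapseSpaces` trait below is hypothetical and only illustrates the pattern; it is an assumption that the crate's own `WhitespaceSifter` trait is structured similarly.

```rust
/// Hypothetical trait for illustration only; the crate's real trait is
/// `WhitespaceSifter`, and its internals may differ.
trait CollapseSpaces {
    fn collapse_spaces(&self) -> String;
}

// Blanket implementation: every type that can be viewed as a `&str`
// (for example `&str` and `String`) gets the method.
impl<T: AsRef<str>> CollapseSpaces for T {
    fn collapse_spaces(&self) -> String {
        self.as_ref()
            .split_whitespace()
            .collect::<Vec<_>>()
            .join(" ")
    }
}

fn main() {
    let borrowed: &str = "a  b";
    let owned: String = String::from("a \t b");
    // Both calls print "a b".
    println!("{}", borrowed.collapse_spaces());
    println!("{}", owned.collapse_spaces());
}
```

With this shape, callers do not need to convert a `String` with `.as_str()` before calling the method.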
Crate | Whitespace Dictionary | Time | Complete |
---|---|---|---|
whitespace-sifter | `'\t'`, `'\n'`, `'\x0C'`, `'\r'`, `' '`, `"\r\n"` | ~170 µs | ✅ |
collapse | `' '`, `'\x09'..='\x0d'`, `unicode::White_Space(c)` | ~270 µs | ✅ |
fast_whitespace_collapse | `' '`, `'\t'` | ~160 µs | ❌ |
- I do not know the crate maintainers, nor have I asked for permission to include their crates here.
- As far as I know, there are only three crates dedicated to whitespace sifting/collapsing.
- fast_whitespace_collapse was not able to collapse CR-LF sequences and line feeds.
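To show what collapsing CR-LF sequences and line feeds while keeping line breaks can involve, here is a rough standard-library sketch. The `collapse_keep_newlines` helper is hypothetical and is not the crate's algorithm; it assumes that blank lines are dropped, that each remaining line keeps its own ending (`\n` or `\r\n`), and that trailing whitespace is trimmed.

```rust
/// Hypothetical helper for illustration only (not part of whitespace-sifter).
/// Collapses whitespace inside each line, drops blank lines, and keeps the
/// original ending ("\n" or "\r\n") of every line that has content.
fn collapse_keep_newlines(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    // Each piece ends with '\n', except possibly the last one.
    for piece in input.split_inclusive('\n') {
        let (body, ending) = if let Some(body) = piece.strip_suffix("\r\n") {
            (body, "\r\n")
        } else if let Some(body) = piece.strip_suffix('\n') {
            (body, "\n")
        } else {
            (piece, "")
        };
        let words: Vec<&str> = body.split_whitespace().collect();
        if words.is_empty() {
            continue; // blank line: nothing to keep
        }
        out.push_str(&words.join(" "));
        out.push_str(ending);
    }
    // Drop the trailing line ending, if any.
    let kept = out.trim_end().len();
    out.truncate(kept);
    out
}

fn main() {
    let input = "1.. \n2.. \n\r\n3.. \n\n\n4.. \r\n\n\r\n\n5.. \n\n\n\n\n";
    assert_eq!(collapse_keep_newlines(input), "1..\n2..\n3..\n4..\r\n5..");
}
```

For this particular input the sketch produces the same string as the `sift_preserve_newlines()` example, but that agreement is not guaranteed for other inputs.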
Performance is a priority; most updates are performance improvements.
The benchmark uses a transcript of the Bee Movie.
Execute these commands to benchmark:
$ git clone https://github.com/JumperBot/whitespace-sifter.git
$ cd whitespace-sifter/bench
$ cargo bench
You should only look for results that look like the following:
Sift/Sift time: [178.69 µs 178.84 µs 179.03 µs]
Sift Preserved/Sift Preserved
time: [179.61 µs 179.75 µs 179.90 µs]
That is under 0.0002 seconds; pretty impressive, no?
Go try it on a better machine, I guess.
Benchmark specifications:
- Processor: Intel(R) Core(TM) i5-8350U CPU @ 1.70GHz 1.90 GHz
- Memory: RAM 16.0 GB (15.8 GB usable)
- System: GNU/Linux 5.15.153.1-microsoft-standard-WSL2 x86_64
- Modified: v2.3.4
- Improved Performance
- Minimum Supported Rust Version set to v1.79.0 (starting v2.3.3)
- Stricter Tests (starting v2.3.2)
  - Proper UTF-8/Unicode Encoding
  - Regular Sifting
  - Sifting With Leading Whitespaces
  - Documentation Assertion
  - MSRV Verification
- Crate Comparison (starting v2.3.4)
- Benchmark Separation (starting v2.3.5)
whitespace-sifter is licensed under the MIT LICENSE; this is the summarization.