whitespace-sifter



use whitespace_sifter::WhitespaceSifter;
// This prints `1.. 2.. 3.. 4.. 5..`.
println!(
    "{}",
    "1.. \n2..  \n\r\n\n3..   \n\n\n4..    \n\n\r\n\n\n5..     \n\n\n\n\n".sift(),
);

// This prints `1..\n2..\n3..\n4..\r\n5..`.
println!(
    "{}",
    "1.. \n2..  \n\r\n3..   \n\n\n4..    \r\n\n\r\n\n5..     \n\n\n\n\n"
        .sift_preserve_newlines(),
);

✨ Sift Duplicate Whitespaces In One Function Call

This crate collapses runs of duplicate whitespace within a UTF-8 encoded string.
It also trims the whitespace at the start and end of the string.
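
To make the behavior described above concrete, here is a minimal, std-only sketch of the idea. It is not the crate's implementation: `collapse_runs` and `collapse_lines` are hypothetical helpers written for illustration. They assume that the first character of each whitespace run is the one kept, treat CR-LF as two separate characters, and normalize line endings to `\n`, which the crate itself does not do.

```rust
/// Hypothetical helper, not the crate's code: keeps the first character of
/// every ASCII whitespace run and trims both ends of the input.
fn collapse_runs<S: AsRef<str>>(input: S) -> String {
    let trimmed = input
        .as_ref()
        .trim_matches(|c: char| c.is_ascii_whitespace());
    let mut out = String::with_capacity(trimmed.len());
    let mut in_run = false;
    for c in trimmed.chars() {
        let is_ws = c.is_ascii_whitespace();
        // Skip every whitespace character after the first one in a run.
        if !(is_ws && in_run) {
            out.push(c);
        }
        in_run = is_ws;
    }
    out
}

/// Hypothetical helper: collapses each line, drops blank lines, and joins
/// the survivors with '\n' (so "\r\n" endings are normalized, unlike the crate).
fn collapse_lines<S: AsRef<str>>(input: S) -> String {
    input
        .as_ref()
        .lines()
        .map(|line| collapse_runs(line))
        .filter(|line| !line.is_empty())
        .collect::<Vec<_>>()
        .join("\n")
}

fn main() {
    // Both &str and String are accepted because the parameter is `AsRef<str>`.
    assert_eq!(collapse_runs("  a  \t b\n\nc  "), "a b\nc");
    assert_eq!(collapse_runs(String::from("x   y")), "x y");
    assert_eq!(collapse_lines("1.. \n2..  \n\r\n3..   \n\n"), "1..\n2..\n3..");
    println!("{:?}", collapse_lines("1.. \n2..  \n\r\n3..   \n\n"));
}
```

The generic `AsRef<str>` parameter mirrors the input flexibility listed in the crate comparison; everything else in the sketch is a simplification.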


📈 Crate Comparison

Crate                     Implementation
------------------------  ----------------------------------------------------------------------
whitespace-sifter         Any AsRef<str> as input, CR-LF compatibility, preserve_newlines
collapse                  &str input only
fast_whitespace_collapse  &str input only, SIMD with fallback for any unsupported rustc target

Crate                     Whitespace Dictionary                             Time to Complete
------------------------  ------------------------------------------------  ----------------
whitespace-sifter         '\t' | '\n' | '\x0C' | '\r' | ' ' | "\r\n"        ~170 µs
collapse                  ' ' | '\x09'..='\x0d' | unicode::White_Space(c)   ~270 µs
fast_whitespace_collapse  ' ' | '\t'                                        ~160 µs

Disclaimers:

  1. I do not know the crate maintainers, nor have I asked for permission to include their crates here.

  2. As far as I know, there are only three crates dedicated to whitespace sifting/collapse.

  3. fast_whitespace_collapse was not able to collapse CR-LF sequences and line feeds.
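
The whitespace dictionaries in the comparison differ in ways that are easy to miss, so the sketch below restates each one as a plain predicate. The function names are hypothetical, and using `char::is_whitespace` in place of `unicode::White_Space(c)` is an assumption about equivalence. The `"\r\n"` entry is left out because it is a two-character sequence and cannot be tested one `char` at a time.

```rust
/// Single-character entries listed for whitespace-sifter.
fn in_sifter_dictionary(c: char) -> bool {
    matches!(c, '\t' | '\n' | '\x0C' | '\r' | ' ')
}

/// Entries listed for collapse; `char::is_whitespace` stands in for
/// `unicode::White_Space(c)` (an assumption, see the note above).
fn in_collapse_dictionary(c: char) -> bool {
    matches!(c, ' ' | '\x09'..='\x0d') || c.is_whitespace()
}

/// Entries listed for fast_whitespace_collapse.
fn in_fast_collapse_dictionary(c: char) -> bool {
    matches!(c, ' ' | '\t')
}

fn main() {
    // Tab, line feed, vertical tab, and no-break space.
    for c in ['\t', '\n', '\x0B', '\u{00A0}'] {
        println!(
            "{:?}: sifter={} collapse={} fast={}",
            c,
            in_sifter_dictionary(c),
            in_collapse_dictionary(c),
            in_fast_collapse_dictionary(c),
        );
    }
}
```

Under these predicates, the vertical tab and the no-break space count as whitespace only for the collapse dictionary, and the line feed is missing from the fast_whitespace_collapse dictionary, which is consistent with the third disclaimer.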


⚡️ Benchmarks

Performance is a priority; most updates are performance improvements.
The benchmark uses a transcript of the Bee Movie.

Execute these commands to benchmark:

$ git clone https://github.com/JumperBot/whitespace-sifter.git
$ cd whitespace-sifter/bench
$ cargo bench

Look only for the result lines that resemble the following:

Sift/Sift               time:   [178.69 µs 178.84 µs 179.03 µs]
Sift Preserved/Sift Preserved
                        time:   [179.61 µs 179.75 µs 179.90 µs]

That is under 0.0002 seconds; pretty impressive, no?

Go try it on a better machine, I guess. Benchmark specifications:
  • Processor: Intel(R) Core(TM) i5-8350U CPU @ 1.70GHz 1.90 GHz
  • Memory: RAM 16.0 GB (15.8 GB usable)
  • System: GNU/Linux 5.15.153.1-microsoft-standard-WSL2 x86_64
  • Modified: v2.3.4
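
For a quick sanity check on another machine without the bench harness, a rough std-only timing loop can be sketched as follows. This is not how the repository's benchmark is implemented and it is far less rigorous than `cargo bench`. The workload `collapse_stand_in`, the helper `mean_time`, and the synthetic input are all hypothetical stand-ins, so the numbers are not comparable to the results above.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Stand-in workload (hypothetical): collapses ASCII whitespace runs into
/// single spaces. Swap in the function you actually want to time.
fn collapse_stand_in(input: &str) -> String {
    input.split_ascii_whitespace().collect::<Vec<_>>().join(" ")
}

/// Runs `f` `iterations` times and returns the mean wall-clock time per call.
/// `iterations` must be greater than zero.
fn mean_time<F: FnMut()>(iterations: u32, mut f: F) -> Duration {
    let start = Instant::now();
    for _ in 0..iterations {
        f();
    }
    start.elapsed() / iterations
}

fn main() {
    // Synthetic input; the real benchmark uses a movie transcript instead.
    let input = "word  \n\n another\t\tword \r\n".repeat(1_000);
    let per_call = mean_time(100, || {
        // `black_box` discourages the optimizer from removing the work.
        black_box(collapse_stand_in(black_box(&input)));
    });
    println!("mean time per call: {per_call:?}");
}
```

Build with `--release` before reading anything into the output, since debug builds are much slower.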

🔊 Changelog

  • Improved Performance
  • Minimum Supported Rust Version set to v1.79.0 (starting v2.3.3)
  • Stricter Tests (starting v2.3.2)
    • Proper UTF-8/Unicode Encoding
    • Regular Sifting
    • Sifting With Leading Whitespaces
    • Documentation Assertion
    • MSRV Verification
  • Crate Comparison (starting v2.3.4)
  • Benchmark Separation (starting v2.3.5)

📄 Licensing

whitespace-sifter is licensed under the MIT License; this is a summary.