Commit 5d4b699

Merge pull request #3 from aldanor/feature/readme
Add READMEs, drop MSRV to 1.37
2 parents 31f9eb6 + 7da7d43 commit 5d4b699

File tree

13 files changed

+397
-166
lines changed

.github/workflows/ci.yml

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ jobs:
     strategy:
       fail-fast: false
       matrix:
-        rust: [1.47.0, stable, nightly]
+        rust: [1.37.0, stable, nightly]
     steps:
       - uses: actions/checkout@v2
         with:

Cargo.toml

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ std = []
 lexical-core = "0.7"
 criterion = "0.3"

-[dev-dependencies.hexf]
+[dev-dependencies.hexf-parse]
 version = "*"
 git = "https://github.com/lifthrasiir/hexf.git" # until the version on crates.io is updated
 rev = "0c95001574997847e1348c4f6dac5f434c772914"

README.md

Lines changed: 130 additions & 1 deletion
@@ -6,7 +6,136 @@ fast-float
 [![Documentation](https://docs.rs/fast-float/badge.svg)](https://docs.rs/fast-float)
 [![Apache 2.0](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
 [![MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
-[![Rust 1.47+](https://img.shields.io/badge/rustc-1.47+-lightgray.svg)](https://blog.rust-lang.org/2020/10/08/Rust-1.47.html)
+[![Rustc 1.37+](https://img.shields.io/badge/rustc-1.37+-lightgray.svg)](https://blog.rust-lang.org/2019/08/15/Rust-1.37.0.html)
+
+This crate provides a super-fast decimal number parser from strings into floats.
+
+```toml
+[dependencies]
+fast-float = "0.1"
+```
+
+There are no dependencies and the crate can be used in a no_std context by disabling the "std" feature.
+
+*Compiler support: rustc 1.37+.*
+
+## Usage
+
+There are two top-level functions provided:
+[`parse()`](https://docs.rs/fast-float/latest/fast_float/fn.parse.html) and
+[`parse_partial()`](https://docs.rs/fast-float/latest/fast_float/fn.parse_partial.html), both taking
+either a string or a byte slice and parsing the input into either `f32` or `f64`:
+
+- `parse()` treats the whole string as a decimal number and returns an error if there are
+  invalid characters or if the string is empty.
+- `parse_partial()` tries to find the longest substring at the beginning of the given input
+  string that can be parsed as a decimal number and, in the case of success, returns the parsed
+  value along with the number of characters processed; an error is returned if the string doesn't
+  start with a decimal number or if it is empty. This function is most useful as a building
+  block when constructing more complex parsers, or when parsing streams of data.
+
+Example:
+
+```rust
+// Parse the entire string as a decimal number.
+let s = "1.23e-02";
+let x: f32 = fast_float::parse(s).unwrap();
+assert_eq!(x, 0.0123);
+
+// Parse as many characters as possible as a decimal number.
+let s = "1.23e-02foo";
+let (x, n) = fast_float::parse_partial::<f32, _>(s).unwrap();
+assert_eq!(x, 0.0123);
+assert_eq!(n, 8);
+assert_eq!(&s[n..], "foo");
+```
+
+## Details
+
+This crate is a direct port of Daniel Lemire's [`fast_float`](https://github.com/fastfloat/fast_float)
+C++ library (valuable discussions with Daniel while porting it helped shape the crate and get it to
+the performance level it's at now), with some Rust-specific tweaks. Please see the original
+repository for many useful details regarding the algorithm and the implementation.
+
+The parser is locale-independent. The resulting value is the closest floating-point value (using either
+`f32` or `f64`), using the "round to even" convention for values that would otherwise fall right in between
+two values. That is, we provide exact parsing according to the IEEE standard.
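To make the "round to even" rule concrete, here is a minimal sketch. It deliberately uses the standard library's `str::parse` rather than this crate, on the assumption that both parsers are correctly rounded and therefore agree on these inputs; `parse_f64` is a hypothetical helper, not part of either API.

```rust
/// Hypothetical helper: parse with the standard library, which is assumed
/// here to round the same way as a correctly rounded `fast_float::parse`.
fn parse_f64(s: &str) -> f64 {
    s.parse::<f64>().expect("input should be a valid decimal number")
}

fn main() {
    // Above 2^53 = 9007199254740992, consecutive f64 values are 2 apart.

    // 2^53 + 1 lies exactly halfway between 2^53 and 2^53 + 2.
    // The tie goes to 2^53, whose significand is even.
    assert_eq!(parse_f64("9007199254740993"), 9007199254740992.0);

    // 2^53 + 3 lies halfway between 2^53 + 2 and 2^53 + 4.
    // This time the neighbour with the even significand is the upper one.
    assert_eq!(parse_f64("9007199254740995"), 9007199254740996.0);

    // Anything past the halfway point is not a tie and rounds to the nearest value.
    assert_eq!(parse_f64("9007199254740993.5"), 9007199254740994.0);
}
```

The two tie cases round in opposite directions, which is what distinguishes round-to-even from always rounding halves up or down.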
+
+Infinity and NaN values can be parsed, along with scientific notation.
+
+Both little-endian and big-endian platforms are equally supported, with extra optimizations enabled
+on little-endian architectures.
+
+## Performance
+
+The presented parser seems to beat all of the existing C/C++/Rust float parsers known to us at the
+moment by a large margin, in all of the datasets we tested it on so far – see detailed benchmarks
+below (the only exception being the original fast_float C++ library, of course – performance of
+which is within noise bounds of this crate). On modern machines, parsing throughput can reach
+up to 1GB/s.
+
+In particular, it is faster than Rust standard library's `FromStr::from_str()` by a factor of 2-8x
+(larger factor for longer float strings).
+
+While various details regarding the algorithm can be found in the repository for the original
+C++ library, here are a few brief notes:
+
+- The parser is specialized to work lightning-fast on inputs with at most 19 significant digits
+  (which constitutes the so-called "fast-path"). We believe that most real-life inputs should
+  fall under this category, and we treat longer inputs as "degenerate" edge cases since they
+  inevitably cause overflows and loss of precision.
+- If the significand happens to be longer than 19 digits, the parser falls back to the "slow path",
+  in which case its performance roughly matches that of the top Rust/C++ libraries (and still
+  beats them most of the time, although not by a lot).
+- On little-endian systems, there are additional optimizations for numbers with more than 8 digits
+  after the decimal point.
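The 8-digit optimization mentioned in the last note can be illustrated with a well-known SWAR ("SIMD within a register") trick: load eight ASCII bytes into one `u64`, check them all at once, and convert them with a few multiplications. The sketch below is an illustration of that general idea only; the function names and the `[u8; 8]` signatures are assumptions and may not match the crate's internals.

```rust
/// Returns true if all eight bytes are ASCII digits ('0'..='9').
/// Adding 0x46 sets the high bit of any byte above '9'; subtracting 0x30
/// sets it (via wrap-around) for any byte below '0'.
fn is_8digits(chunk: [u8; 8]) -> bool {
    let v = u64::from_le_bytes(chunk);
    let a = v.wrapping_add(0x4646_4646_4646_4646);
    let b = v.wrapping_sub(0x3030_3030_3030_3030);
    (a | b) & 0x8080_8080_8080_8080 == 0
}

/// Converts eight ASCII digits into their numeric value.
/// The caller must check `is_8digits` first; other input gives a meaningless result.
fn parse_8digits(chunk: [u8; 8]) -> u32 {
    const MASK: u64 = 0x0000_00FF_0000_00FF;
    const MUL1: u64 = 0x000F_4240_0000_0064; // 100 + (1_000_000 << 32)
    const MUL2: u64 = 0x0000_2710_0000_0001; // 1 + (10_000 << 32)

    let mut v = u64::from_le_bytes(chunk);
    // Turn each ASCII byte into its digit value 0..=9.
    v = v.wrapping_sub(0x3030_3030_3030_3030);
    // Combine neighbouring digits: every even byte now holds a two-digit number.
    v = v.wrapping_mul(10).wrapping_add(v >> 8);
    // Scale the four two-digit groups and sum them in the upper 32 bits.
    let v1 = (v & MASK).wrapping_mul(MUL1);
    let v2 = ((v >> 16) & MASK).wrapping_mul(MUL2);
    (v1.wrapping_add(v2) >> 32) as u32
}

fn main() {
    assert!(is_8digits(*b"12345678"));
    assert!(!is_8digits(*b"1234.678"));
    assert!(!is_8digits(*b"1234a678"));
    assert_eq!(parse_8digits(*b"12345678"), 12_345_678);
    assert_eq!(parse_8digits(*b"00000042"), 42);
}
```

Using `u64::from_le_bytes` keeps the sketch portable; on a little-endian machine it compiles down to a plain 8-byte load, which is presumably why the optimization is tied to little-endian targets.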
+
+## Benchmarks
+
+Below is a table of average timings in nanoseconds for parsing a single number
+into a 64-bit float.
+
+|                  | `canada` | `mesh`   | `uniform` | `iidi` | `iei`  | `rec32` |
+| ---------------- | -------- | -------- | --------- | ------ | ------ | ------- |
+| fast-float       | 22.08    | 11.10    | 20.04     | 40.77  | 26.33  | 29.84   |
+| lexical          | 61.63    | 25.10    | 53.77     | 72.33  | 53.39  | 72.40   |
+| lexical/lossy    | 61.51    | 25.24    | 54.00     | 71.30  | 52.87  | 71.71   |
+| from_str         | 175.07   | 22.58    | 103.00    | 228.78 | 115.76 | 211.13  |
+| fast_float (C++) | 22.78    | 10.99    | 20.05     | 41.12  | 27.51  | 30.85   |
+| abseil (C++)     | 42.66    | 32.88    | 46.01     | 50.83  | 46.33  | 49.95   |
+| netlib (C++)     | 57.53    | 24.86    | 64.72     | 56.63  | 36.20  | 67.29   |
+| strtod (C)       | 286.10   | 31.15    | 258.73    | 295.73 | 205.72 | 315.95  |
+
+Parsers:
+
+- `fast-float` – this very crate
+- `lexical` – from `lexical_core` crate, v0.7
+- `lexical/lossy` – from `lexical_core` crate, v0.7 (lossy parser)
+- `from_str` – Rust standard library, `FromStr` trait
+- `fast_float (C++)` – original C++ implementation of the 'fast-float' method
+- `abseil (C++)` – Abseil C++ Common Libraries
+- `netlib (C++)` – C++ Network Library
+- `strtod (C)` – C standard library
+
+Datasets:
+
+- `canada` – numbers in `canada.txt` file
+- `mesh` – numbers in `mesh.txt` file
+- `uniform` – uniform random numbers from 0 to 1
+- `iidi` – random numbers of format `%d%d.%d`
+- `iei` – random numbers of format `%de%d`
+- `rec32` – reciprocals of random 32-bit integers
+
+Notes:
+
+- Test environment: macOS 10.14.6, clang 11.0, Rust 1.49, 3.5 GHz i7-4771 Haswell.
+- The two test files referred to above can be found in
+  [this](https://github.com/lemire/simple_fastfloat_benchmark) repository.
+- The Rust part of the table (along with a few other benchmarks) can be generated via
+  the benchmark tool that can be found under `extras/simple-bench` of this repo.
+- The C/C++ part of the table (along with a few other benchmarks and parsers) can be
+  generated via a C++ utility that can be found in [this](https://github.com/lemire/simple_fastfloat_benchmark)
+  repository.

 <br>

extras/data-tests/Cargo.toml

Lines changed: 2 additions & 0 deletions
@@ -3,6 +3,8 @@ name = "fast-float-data-tests"
 version = "0.1.0"
 authors = ["Ivan Smirnov <[email protected]>"]
 edition = "2018"
+readme = "README.md"
+license = "MIT OR Apache-2.0"
 publish = false

 [dependencies]

extras/data-tests/README.md

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+This crate allows running tests based on files with test cases stored in the
+standardized format (credit to Daniel Lemire and Nigel Tao for the test cases).
+The test data is sourced from [this](https://github.com/lemire/fast_float_supplemental_tests)
+repository, which is used for the original fast_float C++ library tests.
+
+Test data files can be found under `ext/data`.
+
+To run the tests:
+
+```sh
+cargo run --release
+```

extras/simple-bench/Cargo.toml

Lines changed: 2 additions & 0 deletions
@@ -3,6 +3,8 @@ name = "fast-float-simple-bench"
 version = "0.1.0"
 authors = ["Ivan Smirnov <[email protected]>"]
 edition = "2018"
+readme = "README.md"
+license = "MIT OR Apache-2.0"
 publish = false

 [dependencies]

extras/simple-bench/README.md

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+This crate provides a utility for benchmarking the `fast-float` crate against
+`lexical_core` and the standard library's `FromStr`.
+
+To run a file-based test:
+
+```sh
+cargo run --release -- file ext/canada.txt
+```
+
+There are two files used in benchmarking the original fast_float C++ library
+(canada.txt and mesh.txt); they are sourced from
+[this](https://github.com/lemire/simple_fastfloat_benchmark) repository. These
+files can be found under `ext/data`.
+
+To run a randomized test:
+
+```sh
+cargo run --release -- random uniform
+```
+
+For more details and options (choosing a different random generator, storing
+randomized inputs to a file, changing the number of runs, or switching between
+32-bit and 64-bit floats), refer to help:
+
+```
+cargo run --release -- --help
+```

extras/simple-bench/src/main.rs

Lines changed: 11 additions & 4 deletions
@@ -7,7 +7,7 @@ use std::str::FromStr;
 use std::time::Instant;

 use fastrand::Rng;
-use lexical::FromLexical;
+use lexical::{FromLexical, FromLexicalLossy};
 use structopt::StructOpt;

 use fast_float::FastFloat;
@@ -87,7 +87,7 @@ fn run_one_bench<T: FastFloat, F: Fn(&str) -> T>(
     BenchResult { name, times }
 }

-fn run_all_benches<T: FastFloat + FromLexical + FromStr>(
+fn run_all_benches<T: FastFloat + FromLexical + FromLexicalLossy + FromStr>(
     inputs: &[String],
     repeat: usize,
 ) -> Vec<BenchResult> {
@@ -99,12 +99,19 @@ fn run_all_benches<T: FastFloat + FromLexical + FromStr>(
             .unwrap_or_default()
             .0
     };
-    let lex_res = run_one_bench("lexical_core", inputs, repeat, lex_func);
+    let lex_res = run_one_bench("lexical", inputs, repeat, lex_func);
+
+    let lexl_func = |s: &str| {
+        lexical_core::parse_partial_lossy::<T>(s.as_bytes())
+            .unwrap_or_default()
+            .0
+    };
+    let lexl_res = run_one_bench("lexical/lossy", inputs, repeat, lexl_func);

     let std_func = |s: &str| s.parse::<T>().unwrap_or_default();
     let std_res = run_one_bench("from_str", inputs, repeat, std_func);

-    vec![ff_res, lex_res, std_res]
+    vec![ff_res, lex_res, lexl_res, std_res]
 }

 fn print_report(inputs: &[String], results: &[BenchResult], inputs_name: &str, ty: &str) {

src/common.rs

Lines changed: 1 addition & 1 deletion
@@ -99,7 +99,7 @@ impl<'a> AsciiStr<'a> {

     #[inline]
     pub fn offset_from(&self, other: &Self) -> isize {
-        unsafe { self.ptr.offset_from(other.ptr) } // assuming the same end
+        isize::wrapping_sub(self.ptr as _, other.ptr as _) // assuming the same end
     }
 }
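For context on this change: the raw-pointer method `offset_from` was only stabilized in Rust 1.47, so it is unavailable on a 1.37 toolchain. Below is a minimal sketch of the address-subtraction replacement, assuming the pointers are `*const u8` into the same buffer (for one-byte elements the address difference equals the element offset). The name `byte_offset` is hypothetical.

```rust
/// Hypothetical helper: distance in bytes from `b` to `a`, computed from the
/// raw addresses. Safe code suffices because no pointer is dereferenced.
fn byte_offset(a: *const u8, b: *const u8) -> isize {
    (a as isize).wrapping_sub(b as isize)
}

fn main() {
    let buf = *b"1.23e-02foo";
    let start = buf.as_ptr();
    let rest = buf[8..].as_ptr();

    // `rest` points 8 bytes past `start`.
    assert_eq!(byte_offset(rest, start), 8);
    // The result is signed, so swapping the arguments flips the sign.
    assert_eq!(byte_offset(start, rest), -8);
    assert_eq!(byte_offset(start, start), 0);
}
```

For element types wider than one byte, the difference would additionally have to be divided by the element size, which is what `offset_from` does internally.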

src/decimal.rs

Lines changed: 28 additions & 2 deletions
@@ -1,6 +1,8 @@
+use core::fmt::{self, Debug};
+
 use crate::common::{is_8digits_le, parse_digits, ByteSlice};

-#[derive(Debug, Clone, PartialEq, Eq)]
+#[derive(Clone)]
 pub struct Decimal {
     pub num_digits: usize,
     pub decimal_point: i32,
@@ -9,6 +11,30 @@ pub struct Decimal {
     pub digits: [u8; Self::MAX_DIGITS],
 }

+impl Debug for Decimal {
+    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
+        f.debug_struct("Decimal")
+            .field("num_digits", &self.num_digits)
+            .field("decimal_point", &self.decimal_point)
+            .field("negative", &self.negative)
+            .field("truncated", &self.truncated)
+            .field("digits", &(&self.digits[..self.num_digits]))
+            .finish()
+    }
+}
+
+impl PartialEq for Decimal {
+    fn eq(&self, other: &Self) -> bool {
+        self.num_digits == other.num_digits
+            && self.decimal_point == other.decimal_point
+            && self.negative == other.negative
+            && self.truncated == other.truncated
+            && &self.digits[..] == &other.digits[..]
+    }
+}
+
+impl Eq for Decimal {}
+
 impl Default for Decimal {
     fn default() -> Self {
         Self {
@@ -46,7 +72,7 @@ impl Decimal {
         if self.num_digits == 0 || self.decimal_point < 0 {
             return 0;
         } else if self.decimal_point > 18 {
-            return u64::MAX;
+            return 0xFFFF_FFFF_FFFF_FFFF_u64;
         }
         let dp = self.decimal_point as usize;
         let mut n = 0_u64;
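The hand-written `Debug` and `PartialEq` impls are presumably needed because, before Rust 1.47, the standard library implemented these traits only for arrays of up to 32 elements, so `#[derive(Debug, PartialEq)]` fails on a struct holding a larger array. Going through slices sidesteps the limit. The sketch below shows the pattern in isolation; `Digits`, `digits_from` and the buffer size of 768 are hypothetical stand-ins, not the crate's definitions.

```rust
use std::fmt;

/// Hypothetical buffer size; any length above 32 hits the old limitation.
const MAX_DIGITS: usize = 768;

#[derive(Clone)]
struct Digits {
    len: usize,
    buf: [u8; MAX_DIGITS],
}

impl fmt::Debug for Digits {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Slices implement `Debug` for any length; print only the used prefix.
        f.debug_struct("Digits")
            .field("len", &self.len)
            .field("buf", &&self.buf[..self.len])
            .finish()
    }
}

impl PartialEq for Digits {
    fn eq(&self, other: &Self) -> bool {
        // Compare as slices instead of arrays.
        self.len == other.len && self.buf[..] == other.buf[..]
    }
}

impl Eq for Digits {}

/// Hypothetical constructor; panics if `bytes` is longer than `MAX_DIGITS`.
fn digits_from(bytes: &[u8]) -> Digits {
    let mut d = Digits { len: bytes.len(), buf: [0; MAX_DIGITS] };
    d.buf[..bytes.len()].copy_from_slice(bytes);
    d
}

fn main() {
    let a = digits_from(&[1, 2, 3]);
    let b = a.clone();
    assert_eq!(a, b);
    assert_ne!(a, digits_from(&[1, 2, 4]));
    assert_eq!(format!("{:?}", a), "Digits { len: 3, buf: [1, 2, 3] }");
}
```

The `u64::MAX` to `0xFFFF_FFFF_FFFF_FFFF_u64` change in the last hunk follows the same motivation: the associated constant only became available in Rust 1.43.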

src/float.rs

Lines changed: 8 additions & 8 deletions
@@ -46,10 +46,10 @@ pub trait Float:
 impl private::Sealed for f32 {}

 impl Float for f32 {
-    const INFINITY: Self = Self::INFINITY;
-    const NEG_INFINITY: Self = Self::NEG_INFINITY;
-    const NAN: Self = Self::NAN;
-    const NEG_NAN: Self = -Self::NAN;
+    const INFINITY: Self = core::f32::INFINITY;
+    const NEG_INFINITY: Self = core::f32::NEG_INFINITY;
+    const NAN: Self = core::f32::NAN;
+    const NEG_NAN: Self = -core::f32::NAN;

     const MANTISSA_EXPLICIT_BITS: usize = 23;
     const MIN_EXPONENT_ROUND_TO_EVEN: i32 = -17;
@@ -78,10 +78,10 @@ impl Float for f32 {
 impl private::Sealed for f64 {}

 impl Float for f64 {
-    const INFINITY: Self = Self::INFINITY;
-    const NEG_INFINITY: Self = Self::NEG_INFINITY;
-    const NAN: Self = Self::NAN;
-    const NEG_NAN: Self = -Self::NAN;
+    const INFINITY: Self = core::f64::INFINITY;
+    const NEG_INFINITY: Self = core::f64::NEG_INFINITY;
+    const NAN: Self = core::f64::NAN;
+    const NEG_NAN: Self = -core::f64::NAN;

     const MANTISSA_EXPLICIT_BITS: usize = 52;
     const MIN_EXPONENT_ROUND_TO_EVEN: i32 = -4;
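The switch from `Self::INFINITY` to `core::f32::INFINITY` fits the MSRV drop: the associated constants on the primitive float types were stabilized in Rust 1.43, whereas the module-level constants have existed since 1.0. A minimal sketch of the pattern, with a hypothetical trimmed-down `Float` trait and `std` paths in place of `core`:

```rust
/// Hypothetical, trimmed-down version of a float trait with associated constants.
trait Float: Copy {
    const INFINITY: Self;
    const NAN: Self;
}

// The module-level constants are the pre-1.43 spelling; newer toolchains may
// flag them as legacy, hence the `allow`.
#[allow(deprecated)]
impl Float for f32 {
    const INFINITY: Self = std::f32::INFINITY;
    const NAN: Self = std::f32::NAN;
}

#[allow(deprecated)]
impl Float for f64 {
    const INFINITY: Self = std::f64::INFINITY;
    const NAN: Self = std::f64::NAN;
}

/// Generic code reads the constants through the trait.
fn infinity<T: Float>() -> T {
    T::INFINITY
}

fn main() {
    assert!(infinity::<f32>().is_infinite());
    assert!(infinity::<f64>().is_infinite());
    assert!(<f64 as Float>::NAN.is_nan());
}
```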
