While this crate is below version 1.0, the API may change significantly. I don't expect it to stay in this state for long and plan to release 1.0 soon after gathering some initial feedback.
eventson is a simple event-based JSON parser with excellent performance. It operates
over any std::io::Read instance and returns borrowed values from a single
reusable buffer, pulling new bytes from the reader as needed.
The event-based design means that parsing can happen incrementally over any stream
of bytes, without needing to convert the input to a str or buffer all of it
ahead of time.
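The single reusable buffer described above can be sketched as follows. This is not eventson's code: `RefillBuffer` and `count_words` are hypothetical names, and the sketch only illustrates the general technique of keeping a partially consumed token at the front of a fixed-size buffer and reading more bytes from a `std::io::Read` behind it. A token that cannot fit in the buffer becomes an error, which mirrors the maximum-token-size constraint.

```rust
use std::io::{self, Read};

/// Hypothetical fixed-capacity buffer refilled from any `Read` (not eventson's type).
struct RefillBuffer<R: Read> {
    reader: R,
    buf: Vec<u8>, // allocated once in `new`, never resized
    start: usize, // index of the first unconsumed byte
    end: usize,   // one past the last valid byte
}

impl<R: Read> RefillBuffer<R> {
    fn new(reader: R, capacity: usize) -> Self {
        RefillBuffer { reader, buf: vec![0; capacity], start: 0, end: 0 }
    }

    /// Bytes read from the reader but not yet consumed.
    fn available(&self) -> &[u8] {
        &self.buf[self.start..self.end]
    }

    /// Mark the first `n` available bytes as consumed.
    fn consume(&mut self, n: usize) {
        assert!(n <= self.end - self.start);
        self.start += n;
    }

    /// Shift unconsumed bytes to the front, then read more into the free tail.
    /// Returns the number of new bytes (0 at end of input). If unconsumed bytes
    /// already fill the buffer there is no room to read, so this returns an error
    /// (the pending token is at least as large as the buffer).
    fn fill(&mut self) -> io::Result<usize> {
        self.buf.copy_within(self.start..self.end, 0);
        self.end -= self.start;
        self.start = 0;
        if self.end == self.buf.len() {
            return Err(io::Error::new(io::ErrorKind::Other, "token larger than buffer"));
        }
        let n = self.reader.read(&mut self.buf[self.end..])?;
        self.end += n;
        Ok(n)
    }
}

/// Example use: count whitespace-separated words, carrying a partial word
/// across refills. Errors if a word cannot fit in the buffer.
fn count_words<R: Read>(reader: R, capacity: usize) -> io::Result<usize> {
    let mut rb = RefillBuffer::new(reader, capacity);
    let mut words = 0;
    loop {
        let at_eof = rb.fill()? == 0;
        let data = rb.available();
        let mut consumed = 0;
        // Consume every complete word, i.e. every run ending in whitespace.
        while let Some(pos) = data[consumed..].iter().position(|b| b.is_ascii_whitespace()) {
            if pos > 0 {
                words += 1;
            }
            consumed += pos + 1;
        }
        if at_eof {
            // Anything left over is a final word with no trailing whitespace.
            if consumed < data.len() {
                words += 1;
            }
            return Ok(words);
        }
        rb.consume(consumed);
    }
}
```

For example, `count_words("hello big  world\nfoo".as_bytes(), 8)` reads the input eight bytes at a time, yet still sees words that straddle refills, because the unconsumed tail is moved to the front before each read.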
Other than allocating a buffer for the data and a stack for keeping parse state, this crate does not allocate. The sizes for both are configurable and allocation happens on construction.
Here are some reasons you might want to use this crate:
- You want a JSON parser that allows for tight control over memory usage
- You do not want to buffer all of your input in memory before parsing
- Your input is relatively simple to process
- You want a parser that can handle regular JSON or JSONL
- You know a lot about your input
And some reasons you might not want to use it:
- Working with JSON events can add a lot of complexity compared to more traditional parsers
- You don't know much about your input, including what a reasonable maximum token size or maximum stack depth would be
Users have to worry about two things up front:
- The maximum size of a token in their JSON
- The maximum depth of the state stack
Currently neither the stack nor the buffer can grow, so both limits must be set ahead of time. Once constructed, eventson will parse anything that fits within these constraints.
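A fixed-depth state stack can be sketched like this, assuming (for illustration only) that the parse state is just which kind of container is open. `Frame`, `StateStack`, and `check_depth` are hypothetical names, not eventson's API. The stack allocates once in `new` and reports an error instead of growing, and the bracket checker skips string contents but does not otherwise validate JSON.

```rust
/// Which kind of container is open (a hypothetical, minimal parse state).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Frame {
    Object,
    Array,
}

#[derive(Debug, PartialEq)]
enum DepthError {
    TooDeep,
    Unbalanced,
}

/// Stack whose only allocation happens in `new`; it refuses to grow past `max_depth`.
struct StateStack {
    frames: Vec<Frame>,
    max_depth: usize,
}

impl StateStack {
    fn new(max_depth: usize) -> Self {
        StateStack { frames: Vec::with_capacity(max_depth), max_depth }
    }

    fn push(&mut self, frame: Frame) -> Result<(), DepthError> {
        if self.frames.len() == self.max_depth {
            return Err(DepthError::TooDeep);
        }
        self.frames.push(frame); // len < max_depth <= capacity, so no reallocation
        Ok(())
    }

    fn pop(&mut self, expected: Frame) -> Result<(), DepthError> {
        match self.frames.pop() {
            Some(frame) if frame == expected => Ok(()),
            _ => Err(DepthError::Unbalanced),
        }
    }

    fn depth(&self) -> usize {
        self.frames.len()
    }
}

/// Track container nesting against a depth limit and return the deepest level.
/// Skips string contents (including escaped quotes) but is not a JSON validator.
fn check_depth(input: &[u8], max_depth: usize) -> Result<usize, DepthError> {
    let mut stack = StateStack::new(max_depth);
    let mut deepest: usize = 0;
    let mut in_string = false;
    let mut escaped = false;
    for &b in input {
        if in_string {
            if escaped {
                escaped = false;
            } else if b == b'\\' {
                escaped = true;
            } else if b == b'"' {
                in_string = false;
            }
            continue;
        }
        match b {
            b'"' => in_string = true,
            b'{' => stack.push(Frame::Object)?,
            b'[' => stack.push(Frame::Array)?,
            b'}' => stack.pop(Frame::Object)?,
            b']' => stack.pop(Frame::Array)?,
            _ => {}
        }
        deepest = deepest.max(stack.depth());
    }
    if stack.depth() == 0 {
        Ok(deepest)
    } else {
        Err(DepthError::Unbalanced)
    }
}
```

With this sketch, `{"a":[1,[2]]}` needs a depth of 3: it succeeds with `max_depth = 3` and fails with `DepthError::TooDeep` at 2, which is the kind of up-front decision described above.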
See the examples folder for some sample code.
Benchmarking is hard, and I'm not an expert. I'm sure I made a mistake somewhere, and I recognize that it's hard to do a true apples-to-apples comparison. One major difference is that this crate does not validate UTF-8 in strings, while all of the other crates do.
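That said, I think this crate is pretty fast. It outperformed every other library
I tested, including serde_json, simd_json, and serde_json_borrow, on all four
datasets: by median time, the next-fastest library took roughly 1.2 to 1.3 times
as long on catalan_events.json, citm_catalog.json, and twitter.json, and about
3 times as long on canada.json.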
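Because eventson skips UTF-8 validation, a caller that does need validated text can opt in per value. A minimal sketch, assuming (for illustration only) that a string value is available as a borrowed `&[u8]`; `strict` and `lenient` are hypothetical helper names, while `std::str::from_utf8` and `String::from_utf8_lossy` are the standard library functions doing the work.

```rust
use std::borrow::Cow;
use std::str::{self, Utf8Error};

/// Strict: reject the value if it is not valid UTF-8.
fn strict(raw: &[u8]) -> Result<&str, Utf8Error> {
    str::from_utf8(raw)
}

/// Lenient: replace invalid sequences with U+FFFD; borrows (no copy) when already valid.
fn lenient(raw: &[u8]) -> Cow<'_, str> {
    String::from_utf8_lossy(raw)
}
```

Validating only the values you actually read as text keeps the cost proportional to what you use, which is one reason comparisons against always-validating parsers are not quite apples to apples.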
I used four test datasets with different characteristics and simply parsed all of
the values. Full details on the datasets used can be found in the code in the
benches folder.
All benchmarks were performed on a desktop with an AMD Ryzen 9 7950X3D 16-Core Processor
under WSL2 running Ubuntu 22.04.5 LTS. This processor supports SIMD instructions,
including AVX-512.
$ cargo bench parser
parser fastest | slowest | median | mean | samples | iters
|- bench_eventson | | | | |
| |- data/other/canada.json 2.676 ms | 2.918 ms | 2.74 ms | 2.742 ms | 100 | 100
| | 841 MB/s | 771.4 MB/s | 821.5 MB/s | 820.6 MB/s | |
| |- data/other/catalan_events.json 1.403 ms | 1.525 ms | 1.456 ms | 1.457 ms | 100 | 100
| | 1.159 GB/s | 1.067 GB/s | 1.117 GB/s | 1.116 GB/s | |
| |- data/other/citm_catalog.json 1.159 ms | 1.773 ms | 1.239 ms | 1.25 ms | 100 | 100
| | 1.489 GB/s | 974.1 MB/s | 1.393 GB/s | 1.38 GB/s | |
| \- data/other/twitter.json 371.7 us | 531.2 us | 416.9 us | 417.7 us | 100 | 100
| 1.698 GB/s | 1.188 GB/s | 1.514 GB/s | 1.511 GB/s | |
|- bench_serde_json_borrow | | | | |
| |- data/other/canada.json 9.998 ms | 11.95 ms | 10.37 ms | 10.44 ms | 100 | 100
| | 225.1 MB/s | 188.2 MB/s | 217 MB/s | 215.6 MB/s | |
| |- data/other/catalan_events.json 1.81 ms | 3.122 ms | 1.851 ms | 1.873 ms | 100 | 100
| | 898.8 MB/s | 521.3 MB/s | 879.1 MB/s | 868.9 MB/s | |
| |- data/other/citm_catalog.json 1.564 ms | 1.757 ms | 1.6 ms | 1.605 ms | 100 | 100
| | 1.103 GB/s | 983 MB/s | 1.079 GB/s | 1.075 GB/s | |
| \- data/other/twitter.json 551.1 us | 760.3 us | 588 us | 592.5 us | 100 | 100
| 1.145 GB/s | 830.5 MB/s | 1.073 GB/s | 1.065 GB/s | |
|- bench_serde_json_borrow_from_string | | | | |
| |- data/other/canada.json 7.985 ms | 9.092 ms | 8.319 ms | 8.35 ms | 100 | 100
| | 281.8 MB/s | 247.5 MB/s | 270.5 MB/s | 269.5 MB/s | |
| |- data/other/catalan_events.json 1.742 ms | 1.882 ms | 1.785 ms | 1.793 ms | 100 | 100
| | 934.2 MB/s | 864.8 MB/s | 911.7 MB/s | 907.3 MB/s | |
| |- data/other/citm_catalog.json 1.538 ms | 1.684 ms | 1.576 ms | 1.578 ms | 100 | 100
| | 1.122 GB/s | 1.025 GB/s | 1.095 GB/s | 1.094 GB/s | |
| \- data/other/twitter.json 486.3 us | 595.9 us | 517.5 us | 518.6 us | 100 | 100
| 1.298 GB/s | 1.059 GB/s | 1.22 GB/s | 1.217 GB/s | |
|- bench_serde_value | | | | |
| |- data/other/canada.json 8.938 ms | 10.06 ms | 9.289 ms | 9.331 ms | 100 | 100
| | 251.8 MB/s | 223.6 MB/s | 242.3 MB/s | 241.2 MB/s | |
| |- data/other/catalan_events.json 3.438 ms | 4.059 ms | 3.738 ms | 3.702 ms | 100 | 100
| | 473.3 MB/s | 400.9 MB/s | 435.3 MB/s | 439.5 MB/s | |
| |- data/other/citm_catalog.json 4.92 ms | 5.939 ms | 5.157 ms | 5.205 ms | 100 | 100
| | 351 MB/s | 290.7 MB/s | 334.8 MB/s | 331.8 MB/s | |
| \- data/other/twitter.json 1.52 ms | 1.758 ms | 1.605 ms | 1.611 ms | 100 | 100
| 415.2 MB/s | 359 MB/s | 393.2 MB/s | 391.7 MB/s | |
\- bench_simd_json | | | | |
|- data/other/canada.json 8.717 ms | 12 ms | 9.017 ms | 9.06 ms | 100 | 100
| 258.2 MB/s | 187.5 MB/s | 249.6 MB/s | 248.4 MB/s | |
|- data/other/catalan_events.json 2.638 ms | 2.971 ms | 2.713 ms | 2.724 ms | 100 | 100
| 616.8 MB/s | 547.7 MB/s | 599.9 MB/s | 597.4 MB/s | |
|- data/other/citm_catalog.json 1.773 ms | 2.393 ms | 1.84 ms | 1.914 ms | 100 | 100
| 973.9 MB/s | 721.5 MB/s | 938.3 MB/s | 902.3 MB/s | |
\- data/other/twitter.json 579.6 us | 722.4 us | 597.9 us | 603.2 us | 100 | 100
1.089 GB/s | 874.1 MB/s | 1.056 GB/s | 1.046 GB/s | |
TODO: Write about testing methods.
I just want to say thank you to the authors of these resources!
These resources were invaluable when writing this crate, whether that be for code snippets or learning things about JSON parsing, rust, or just programming in general.
- serde-rs/serde_json for two snippets of code and for being the standard I test against
- AlexHuszagh/rust-lexical, which I use for parsing numbers and which has a very handy parse_partial feature
- BurntSushi/ripgrep, where I learned a lot about how to write a reusable buffer by studying the code
- miloyip/nativejson-benchmark for JSON test data
- Stanford bithacks reference on Mycroft's algorithm and its generalizations
- legoktm/jsonchecker for JSON test case data
- nst/JSONTestSuite for teaching me a lot about JSON
- jdorfman/awesome-json-datasets for links to lots of JSON datasets
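The bithacks technique credited above tests a whole machine word for a zero byte in a few operations: `(v - 0x01..01) & !v & 0x80..80` is nonzero exactly when some byte of `v` is zero, and XOR-ing `v` with a byte repeated across the word generalizes this to "does any byte equal `n`". A minimal Rust sketch, applied to the kind of question a JSON scanner might ask (does this 8-byte chunk contain a `"` or `\`?). Whether and how eventson uses this is not stated here; `has_zero_byte`, `has_byte`, and `needs_attention` are hypothetical names.

```rust
const LO: u64 = 0x0101_0101_0101_0101; // 0x01 in every byte
const HI: u64 = 0x8080_8080_8080_8080; // 0x80 in every byte

/// True iff some byte of `v` is 0x00 (exact for the yes/no answer).
fn has_zero_byte(v: u64) -> bool {
    (v.wrapping_sub(LO) & !v & HI) != 0
}

/// True iff some byte of `v` equals `n`: XOR turns matching bytes into zero bytes.
fn has_byte(v: u64, n: u8) -> bool {
    has_zero_byte(v ^ (LO * n as u64))
}

/// Hypothetical scanner helper: does this chunk contain a quote or a backslash?
fn needs_attention(chunk: [u8; 8]) -> bool {
    let v = u64::from_le_bytes(chunk);
    has_byte(v, b'"') || has_byte(v, b'\\')
}
```

Byte order does not matter for the yes/no answer, so `from_le_bytes` and `from_be_bytes` would give the same result; finding the position of the match is where endianness starts to matter.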