JakeDern/eventson

While this crate is below version 1.0, the API may change frequently. I don't anticipate being in this state for long and plan to get to a 1.0 quickly after some initial feedback.

eventson

eventson is a simple event-based JSON parser with excellent performance. It operates over any std::io::Read instance and returns values borrowed from a single reusable buffer, pulling more bytes from the reader as needed.

The event-based nature means that parsing can happen incrementally over any stream of bytes without a requirement to convert to a str or buffer all of the input ahead of time.
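To make the event-based style concrete, here is a minimal sketch of consuming a stream of events without building a document tree. The `Event` enum and `sum_for_key` function are hypothetical names invented for this illustration; eventson's actual event types and method names may differ, so check the examples folder for the real API.

```rust
// Hypothetical event type for illustration only; not eventson's real API.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Event<'a> {
    StartObject,
    EndObject,
    StartArray,
    EndArray,
    Key(&'a str),
    String(&'a str),
    Number(f64),
    Bool(bool),
    Null,
}

/// Sums every number that directly follows `key`, one event at a time.
/// Only a running total and one flag are kept; no tree is built.
fn sum_for_key<'a>(events: impl IntoIterator<Item = Event<'a>>, key: &str) -> f64 {
    let mut total = 0.0;
    let mut pending = false; // true when the previous event was the wanted key
    for event in events {
        match event {
            Event::Key(k) => pending = k == key,
            Event::Number(n) if pending => {
                total += n;
                pending = false;
            }
            _ => pending = false,
        }
    }
    total
}

fn main() {
    // Hand-written events for: [{"name":"a","price":1.5},{"name":"b","price":2.0}]
    let events = vec![
        Event::StartArray,
        Event::StartObject,
        Event::Key("name"),
        Event::String("a"),
        Event::Key("price"),
        Event::Number(1.5),
        Event::EndObject,
        Event::StartObject,
        Event::Key("name"),
        Event::String("b"),
        Event::Key("price"),
        Event::Number(2.0),
        Event::EndObject,
        Event::EndArray,
    ];
    println!("total price: {}", sum_for_key(events, "price"));
}
```

The consumer has to track its own context (here, the `pending` flag), which is the extra complexity that event-based parsing trades for low memory use.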

Other than allocating a buffer for the data and a stack for keeping parse state, this crate does not allocate. The sizes for both are configurable and allocation happens on construction.
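The general technique of handing out values borrowed from one preallocated buffer can be sketched as follows. This is a toy whitespace tokenizer, not eventson's implementation: `TokenReader`, `next_token`, and `max_token` are hypothetical names, and the sketch assumes tokens are separated by ASCII whitespace.

```rust
use std::io::{self, Read};

/// Toy reader (hypothetical, not eventson's API): one buffer, allocated once.
struct TokenReader<R: Read> {
    src: R,
    buf: Vec<u8>, // fixed size; never grows after construction
    start: usize, // first unread byte
    end: usize,   // one past the last valid byte
    eof: bool,
}

impl<R: Read> TokenReader<R> {
    fn new(src: R, max_token: usize) -> Self {
        // One spare byte so a token of exactly `max_token` bytes still fits.
        Self { src, buf: vec![0; max_token + 1], start: 0, end: 0, eof: false }
    }

    /// Returns the next token as a slice borrowed from the internal buffer.
    /// The slice is only valid until the next call.
    fn next_token(&mut self) -> io::Result<Option<&[u8]>> {
        loop {
            // Skip leading whitespace.
            while self.start < self.end && self.buf[self.start].is_ascii_whitespace() {
                self.start += 1;
            }
            // Look for the end of the token in the bytes we already have.
            let found = self.buf[self.start..self.end]
                .iter()
                .position(|b| b.is_ascii_whitespace());
            if let Some(len) = found {
                let from = self.start;
                self.start += len;
                return Ok(Some(&self.buf[from..from + len]));
            }
            if self.eof {
                if self.start == self.end {
                    return Ok(None);
                }
                let from = self.start;
                self.start = self.end;
                return Ok(Some(&self.buf[from..self.end]));
            }
            // Need more input: move the partial token to the front, then refill.
            self.buf.copy_within(self.start..self.end, 0);
            self.end -= self.start;
            self.start = 0;
            if self.end == self.buf.len() {
                return Err(io::Error::new(
                    io::ErrorKind::InvalidData,
                    "token exceeds buffer size",
                ));
            }
            let n = self.src.read(&mut self.buf[self.end..])?;
            if n == 0 {
                self.eof = true;
            }
            self.end += n;
        }
    }
}

fn main() -> io::Result<()> {
    let mut reader = TokenReader::new(&b"alpha beta  gamma"[..], 5);
    while let Some(token) = reader.next_token()? {
        println!("{}", String::from_utf8_lossy(token));
    }
    Ok(())
}
```

Because each returned slice points into the shared buffer, the caller must finish with (or copy) a token before asking for the next one. The borrow checker enforces this.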

Why use this crate?

Here are some reasons you might want to use this crate:

  • You want a JSON parser that allows for tight control over memory usage
  • You do not want to buffer all of your input in memory before parsing
  • Your input is relatively simple to process
  • You want a parser that can handle regular JSON or JSONL
  • You know a lot about your input

Why not use this crate?

  • Working with JSON events can be a lot of extra complexity vs more traditional parsers
  • You don't know much about your input, including what a reasonable maximum token size or maximum stack depth would be

Usage

Users have to worry about two things up front:

  1. The maximum size of a token in their JSON
  2. The maximum depth of the state stack

Currently, neither the stack nor the buffer can grow, so both sizes must be set ahead of time. Once constructed, eventson will parse anything that fits within these constraints.
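As a rough illustration of what a fixed stack depth means in practice, the sketch below scans JSON-like bytes with a preallocated stack and fails once nesting exceeds the configured limit. `scan_depth` and `limit` are hypothetical names made up for this example. It shows the general technique only, not eventson's internals, and it checks bracket nesting rather than full JSON validity.

```rust
use std::io::{self, Read};

#[derive(Clone, Copy, PartialEq)]
enum Container {
    Object,
    Array,
}

fn invalid(msg: &str) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, msg)
}

/// Returns the deepest nesting level seen, or an error if nesting exceeds
/// `limit`. The stack is allocated once up front and never grows.
fn scan_depth<R: Read>(src: R, limit: usize) -> io::Result<usize> {
    let mut stack: Vec<Container> = Vec::with_capacity(limit);
    let mut deepest: usize = 0;
    let mut in_string = false;
    let mut escaped = false;

    // Reading byte by byte keeps the sketch short; a real parser would buffer.
    for byte in src.bytes() {
        let b = byte?;
        if in_string {
            // Brackets inside strings must not affect the stack.
            if escaped {
                escaped = false;
            } else if b == b'\\' {
                escaped = true;
            } else if b == b'"' {
                in_string = false;
            }
            continue;
        }
        match b {
            b'"' => in_string = true,
            b'{' | b'[' => {
                if stack.len() == limit {
                    return Err(invalid("nesting exceeds configured stack depth"));
                }
                stack.push(if b == b'{' { Container::Object } else { Container::Array });
                deepest = deepest.max(stack.len());
            }
            b'}' | b']' => {
                let want = if b == b'}' { Container::Object } else { Container::Array };
                if stack.pop() != Some(want) {
                    return Err(invalid("mismatched closing bracket"));
                }
            }
            _ => {}
        }
    }
    Ok(deepest)
}

fn main() {
    let input = &br#"{"a":[1,{"b":"]}"}]}"#[..];
    println!("limit 3: {:?}", scan_depth(input, 3).map_err(|e| e.to_string()));
    println!("limit 2: {:?}", scan_depth(input, 2).map_err(|e| e.to_string()));
}
```

If you don't know the deepest nesting your input can reach, a scan like this over representative samples is one way to pick a limit with some headroom.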

See the examples folder for some sample code.

Benchmarks

Benchmarking is hard, and I'm not an expert. I'm sure I made a mistake somewhere, and I recognize that it's hard to do a true apples-to-apples comparison. One major difference is that this crate does not validate UTF-8 for strings, while all of the other crates do.

That said, I think this crate is pretty fast. It outperformed every other library that I tested including serde_json, simd_json, and serde_json_borrow by a good margin.

I used four test datasets with different characteristics and simply parsed all of the values. Full details on the datasets used can be found in the code in the benches folder.

All benchmarks were performed on a desktop with an AMD Ryzen 9 7950X3D 16-Core Processor on WSL2 running Ubuntu 22.04.5 LTS. This processor supports SIMD instructions, including AVX-512.

$ cargo bench parser
parser                                  fastest       | slowest       | median        | mean          | samples | iters
|- bench_eventson                                     |               |               |               |         |
|  |- data/other/canada.json            2.676 ms      | 2.918 ms      | 2.74 ms       | 2.742 ms      | 100     | 100
|  |                                    841 MB/s      | 771.4 MB/s    | 821.5 MB/s    | 820.6 MB/s    |         |
|  |- data/other/catalan_events.json    1.403 ms      | 1.525 ms      | 1.456 ms      | 1.457 ms      | 100     | 100
|  |                                    1.159 GB/s    | 1.067 GB/s    | 1.117 GB/s    | 1.116 GB/s    |         |
|  |- data/other/citm_catalog.json      1.159 ms      | 1.773 ms      | 1.239 ms      | 1.25 ms       | 100     | 100
|  |                                    1.489 GB/s    | 974.1 MB/s    | 1.393 GB/s    | 1.38 GB/s     |         |
|  \- data/other/twitter.json           371.7 us      | 531.2 us      | 416.9 us      | 417.7 us      | 100     | 100
|                                       1.698 GB/s    | 1.188 GB/s    | 1.514 GB/s    | 1.511 GB/s    |         |
|- bench_serde_json_borrow                            |               |               |               |         |
|  |- data/other/canada.json            9.998 ms      | 11.95 ms      | 10.37 ms      | 10.44 ms      | 100     | 100
|  |                                    225.1 MB/s    | 188.2 MB/s    | 217 MB/s      | 215.6 MB/s    |         |
|  |- data/other/catalan_events.json    1.81 ms       | 3.122 ms      | 1.851 ms      | 1.873 ms      | 100     | 100
|  |                                    898.8 MB/s    | 521.3 MB/s    | 879.1 MB/s    | 868.9 MB/s    |         |
|  |- data/other/citm_catalog.json      1.564 ms      | 1.757 ms      | 1.6 ms        | 1.605 ms      | 100     | 100
|  |                                    1.103 GB/s    | 983 MB/s      | 1.079 GB/s    | 1.075 GB/s    |         |
|  \- data/other/twitter.json           551.1 us      | 760.3 us      | 588 us        | 592.5 us      | 100     | 100
|                                       1.145 GB/s    | 830.5 MB/s    | 1.073 GB/s    | 1.065 GB/s    |         |
|- bench_serde_json_borrow_from_string                |               |               |               |         |
|  |- data/other/canada.json            7.985 ms      | 9.092 ms      | 8.319 ms      | 8.35 ms       | 100     | 100
|  |                                    281.8 MB/s    | 247.5 MB/s    | 270.5 MB/s    | 269.5 MB/s    |         |
|  |- data/other/catalan_events.json    1.742 ms      | 1.882 ms      | 1.785 ms      | 1.793 ms      | 100     | 100
|  |                                    934.2 MB/s    | 864.8 MB/s    | 911.7 MB/s    | 907.3 MB/s    |         |
|  |- data/other/citm_catalog.json      1.538 ms      | 1.684 ms      | 1.576 ms      | 1.578 ms      | 100     | 100
|  |                                    1.122 GB/s    | 1.025 GB/s    | 1.095 GB/s    | 1.094 GB/s    |         |
|  \- data/other/twitter.json           486.3 us      | 595.9 us      | 517.5 us      | 518.6 us      | 100     | 100
|                                       1.298 GB/s    | 1.059 GB/s    | 1.22 GB/s     | 1.217 GB/s    |         |
|- bench_serde_value                                  |               |               |               |         |
|  |- data/other/canada.json            8.938 ms      | 10.06 ms      | 9.289 ms      | 9.331 ms      | 100     | 100
|  |                                    251.8 MB/s    | 223.6 MB/s    | 242.3 MB/s    | 241.2 MB/s    |         |
|  |- data/other/catalan_events.json    3.438 ms      | 4.059 ms      | 3.738 ms      | 3.702 ms      | 100     | 100
|  |                                    473.3 MB/s    | 400.9 MB/s    | 435.3 MB/s    | 439.5 MB/s    |         |
|  |- data/other/citm_catalog.json      4.92 ms       | 5.939 ms      | 5.157 ms      | 5.205 ms      | 100     | 100
|  |                                    351 MB/s      | 290.7 MB/s    | 334.8 MB/s    | 331.8 MB/s    |         |
|  \- data/other/twitter.json           1.52 ms       | 1.758 ms      | 1.605 ms      | 1.611 ms      | 100     | 100
|                                       415.2 MB/s    | 359 MB/s      | 393.2 MB/s    | 391.7 MB/s    |         |
\- bench_simd_json                                    |               |               |               |         |
   |- data/other/canada.json            8.717 ms      | 12 ms         | 9.017 ms      | 9.06 ms       | 100     | 100
   |                                    258.2 MB/s    | 187.5 MB/s    | 249.6 MB/s    | 248.4 MB/s    |         |
   |- data/other/catalan_events.json    2.638 ms      | 2.971 ms      | 2.713 ms      | 2.724 ms      | 100     | 100
   |                                    616.8 MB/s    | 547.7 MB/s    | 599.9 MB/s    | 597.4 MB/s    |         |
   |- data/other/citm_catalog.json      1.773 ms      | 2.393 ms      | 1.84 ms       | 1.914 ms      | 100     | 100
   |                                    973.9 MB/s    | 721.5 MB/s    | 938.3 MB/s    | 902.3 MB/s    |         |
   \- data/other/twitter.json           579.6 us      | 722.4 us      | 597.9 us      | 603.2 us      | 100     | 100
                                        1.089 GB/s    | 874.1 MB/s    | 1.056 GB/s    | 1.046 GB/s    |         |
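To read the table above as relative speedups, divide a baseline's median time by eventson's median time on the same dataset. The snippet below does that arithmetic with medians copied from the table (converted to milliseconds). The `speedup` helper is just a name chosen for this example.

```rust
/// How many times faster `candidate` is than `baseline`,
/// given two durations in the same unit.
fn speedup(candidate_ms: f64, baseline_ms: f64) -> f64 {
    baseline_ms / candidate_ms
}

fn main() {
    // Median times in milliseconds, copied from the benchmark table.
    // Columns: dataset, eventson, serde_json_borrow, simd_json
    let medians = [
        ("canada.json", 2.74, 10.37, 9.017),
        ("catalan_events.json", 1.456, 1.851, 2.713),
        ("citm_catalog.json", 1.239, 1.6, 1.84),
        ("twitter.json", 0.4169, 0.588, 0.5979),
    ];
    for (name, eventson, serde_borrow, simd) in medians {
        println!(
            "{}: {:.2}x vs serde_json_borrow, {:.2}x vs simd_json",
            name,
            speedup(eventson, serde_borrow),
            speedup(eventson, simd)
        );
    }
}
```

These ratios carry the same caveats as the raw numbers, in particular the missing UTF-8 validation mentioned earlier.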

Tests

TODO: Write about testing methods.

Thank You!

I just want to say thank you to the authors of these resources!

These resources were invaluable when writing this crate, whether that be for code snippets or learning things about JSON parsing, rust, or just programming in general.

License

This repository contains two license files: Apache-2.0 (LICENSE-APACHE) and MIT (LICENSE-MIT).