This is a Rust command line tool that calculates a histogram of the separate types of JSON records in an input JSON log file (one JSON object per line).
A sample input file would be:

```
{"type":"B","foo":"bar","items":["one","two"]}
{"type": "A","foo": 4.0 }
{"type": "B","bar": "abcd"}
```

For this input, the output histogram would report a count of 2 for type B and 1 for type A. It would also report a total of 73 bytes for type B and 26 bytes for type A.
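The core aggregation can be sketched in plain Rust. Note that the actual tool parses each line with `serde_json`; the naive string search for the `"type"` field below is only to keep this sketch dependency-free, and the byte totals here count the record text only:

```rust
use std::collections::HashMap;

// Sketch of the aggregation: per "type", accumulate (record count,
// total byte size). The real tool parses each line with serde_json;
// the naive string search below is illustrative only.
fn histogram<'a>(lines: impl IntoIterator<Item = &'a str>) -> HashMap<String, (usize, usize)> {
    let mut hist: HashMap<String, (usize, usize)> = HashMap::new();
    for line in lines {
        // Naively extract the value of the "type" field, e.g. `"type": "B"`.
        let ty = line
            .split("\"type\"")
            .nth(1)
            .and_then(|rest| rest.splitn(3, '"').nth(1));
        if let Some(ty) = ty {
            let entry = hist.entry(ty.to_string()).or_insert((0, 0));
            entry.0 += 1;          // one more record of this type
            entry.1 += line.len(); // bytes occupied by this record's text
        }
    }
    hist
}
```

Feeding the three sample records above into this sketch yields a count of 2 and 73 bytes for type B.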
Git clone:

```
git clone https://github.com/dimitarvp/json-log-histogram-rust.git
cd json-log-histogram-rust
```

Compile:

```
RUSTFLAGS="-C target-cpu=native" cargo build --release
```

To test, generate a JSON log file and supply it as a command-line parameter:

```
./target/release/jlh -f /path/to/json/log/file
```

The tool prints an aligned text table and the total runtime at the bottom.
| CPU | File size | Time in seconds |
|---|---|---|
| Xeon W-2150B @ 3.00GHz | 1MB | 0.11091947 |
| Xeon W-2150B @ 3.00GHz | 10MB | 0.62043929 |
| Xeon W-2150B @ 3.00GHz | 100MB | 0.643637170 |
| Xeon W-2150B @ 3.00GHz | 1000MB | 5.175781744 |
| i7-4870HQ @ 2.50GHz | 1MB | 0.07234297 |
| i7-4870HQ @ 2.50GHz | 10MB | 0.68889124 |
| i7-4870HQ @ 2.50GHz | 100MB | 0.670027735 |
| i7-4870HQ @ 2.50GHz | 1000MB | 6.659739416 |
| i3-3217U @ 1.80GHz | 1MB | 0.14369994 |
| i3-3217U @ 1.80GHz | 10MB | 0.49248859 |
| i3-3217U @ 1.80GHz | 100MB | 0.535957719 |
| i3-3217U @ 1.80GHz | 1000MB | 3.773678079 |
- Using Rust 1.43.1.
- Using the rayon crate for transparent parallelization of the histogram calculation.
- Using the clap crate to parse the command line options (only one, which is the input JSON log file).
- Using the prettytable-rs crate to produce a pretty command line table with the results.
- Using `serde_json` to read each JSON record into a struct.
- Skipped the ability to pipe files to the tool so it can read from stdin. The motivation was that `rayon` does not provide its `.par_bridge` function for polymorphic `Box<dyn BufRead>` objects (the common denominator of `std::io::stdin().lock()` and `std::fs::File::open(path)`). I could probably have made it work, but after 2 hours of attempts I realized it might take a long time, so I cut it short.
- Used the `.lines()` function on the `BufReader` even though it allocates a new `String` per line. I am aware of the better `BufReader.read_line` idiom with a single `String` buffer (which is cleared after every line is consumed), and my initial non-parallel version even used it -- see this commit. But I couldn't find a quick way to translate that idiom into something exposing a `.lines()` function (`rayon` expects an `Iterator`). I could have implemented `Iterator` for a wrapping struct or enum but, same as above, I was not sure it wouldn't take me very long. IMO, even with that caveat the tool is very fast (see the performance results table above).
- The commit history got slightly botched because I had to use bfg to remove the 1MB / 10MB / 100MB / 1000MB JSON files that I added earlier (which I later replaced with gzipped variants).
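For reference, the single-buffer `read_line` idiom mentioned above can be sketched with only the standard library. The function and parameter names here are illustrative, not the tool's actual code:

```rust
use std::io::{BufRead, Cursor};

// Sketch of the single-buffer read_line idiom: one String is reused for
// every line instead of allocating a fresh String per line, which is
// what BufReader::lines() does. Names here are illustrative only.
fn for_each_line<R: BufRead>(mut reader: R, mut f: impl FnMut(&str)) -> std::io::Result<usize> {
    let mut buf = String::new();
    let mut n = 0;
    loop {
        buf.clear(); // reuse the same allocation for the next line
        if reader.read_line(&mut buf)? == 0 {
            break; // EOF reached
        }
        f(buf.trim_end_matches('\n')); // hand the record to the caller
        n += 1;
    }
    Ok(n)
}
```

For example, `for_each_line(Cursor::new("a\nb\n"), |l| println!("{l}"))` processes two lines with a single allocation for the buffer. The catch described above is that this shape is a callback, not an `Iterator`, so it does not plug directly into `rayon`'s `par_bridge` without a wrapping type.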