
PartialCsvParser benchmark

The following benchmarks were run.

Evaluation settings

Each benchmark parses a generated CSV file and simply counts the total number of columns; a minimal sketch of this counting loop follows.
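For reference, the core of each benchmark is just a parse-and-count loop. Below is a minimal single-threaded sketch; the header name and the API identifiers (PCP::CsvConfig, PCP::PartialCsvParser, get_row()) are written from memory of PartialCsvParser's documented usage, so check the actual header for the exact signatures.

#include <partial_csv_parser.hpp>
#include <iostream>
#include <string>
#include <vector>

int main() {
  PCP::CsvConfig csv_config("csv/20480000col.csv");  // opens and memory-maps the CSV file
  PCP::PartialCsvParser parser(csv_config);           // no range given: parse the whole body

  size_t n_columns = 0;
  std::vector<std::string> row;
  while (!(row = parser.get_row()).empty())  // an empty row signals the end of the range
    n_columns += row.size();

  std::cout << "Parsed " << n_columns << " columns." << std::endl;
  return 0;
}

The actual benchmark splits the file into one range per thread (the -p option below), but the per-thread loop is the same idea.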

Two environments were used:

  • MBA

    • MacBook Air, a commodity laptop.
  • clokoap100

    • A mid-range server with both an SSD and an HDD.

More detailed specifications follow.

MBA

CPU clock: 1.3 GHz
# CPU sockets: 1
# Cores per socket: 2
# Logical cores per physical core: 2
Memory size: 8 GB
Memory sequential read: 2.142 GB/sec
Memory random read: 1.467 GB/sec
Memory file system: hfs
SSD sequential read: 492.289 MB/sec
SSD random read: 11.374 MB/sec
SSD file system: hfs

clokoap100

CPU clock: 2.393 GHz
# CPU sockets: 2
# Cores per socket: 4
# Logical cores per physical core: 2
Memory size: 24 GB
Memory sequential read: 2.194 GB/sec
Memory random read: 1.759 GB/sec
Memory file system: tmpfs
SSD sequential read: 470.114 MB/sec
SSD random read: 30.350 MB/sec
SSD file system: ext3
HDD sequential read: 261.251 MB/sec
HDD random read: 4.191 MB/sec
HDD file system: ext3

Tips: How to check read speed

fio was used to measure random and sequential read speeds.

$ vim random-read-mem.fio
[random-read]
rw=randread
size=512m
directory=/dev/shm

$ vim sequential-read-mem.fio
[sequential-read]
rw=read
size=512m
directory=/dev/shm

$ fio random-read-mem.fio

Change the directory parameter to measure a different device; an example for an SSD is shown below.
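For example, a job file pointed at a directory on the SSD could look like the following (the mount point path is only a placeholder):

$ vim random-read-ssd.fio
[random-read]
rw=randread
size=512m
# placeholder path: use any directory on the SSD under test
directory=/mnt/ssd

$ fio random-read-ssd.fio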

Tips: Memory file system on Mac OSX

Unlike Linux, Mac OS X does not have tmpfs mounted at /dev/shm.

You can create and destroy a memory file system as follows.

$ hdid -nomount ram://2097152  # 512 bytes * 2097152 sectors = 1024 Mbytes
/dev/disk2
$ newfs_hfs /dev/disk2
Initialized /dev/rdisk2 as a 1024 MB case-insensitive HFS Plus volume
$ mkdir /tmp/mnt
$ mount -t hfs /dev/disk2 /tmp/mnt

$ hdiutil detach /dev/disk2
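The sector count after ram:// is simply the desired size in bytes divided by 512. For example, for a 2 GB RAM disk:

$ hdid -nomount ram://4194304  # 512 bytes * 4194304 sectors = 2048 Mbytes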

Benchmark results

Comparison with other libraries

(Figure: comparison of CSV parsers' performance)

Scalability

  • (Figure: scalability on clokoap100)

  • (Figure: scalability on MBA)

Raw data is available in a Google Spreadsheet.

You can also run the benchmarks in your own environment by following the instructions below.

Generate benchmark data

$ ../script/generate-benchmark-data.sh 12  # you can use bigger or smaller data
$ ls csv/
20480000col.csv 5000col.csv

Build benchmark executables

An Internet connection is necessary because the Makefile internally invokes wget to download the other libraries used for comparison.

$ cmake . && make

Run PartialCsvParser benchmark

For example, with 4 threads:

$ time ./PartialCsvParser_bench -p 4 -c 20480000 -f csv/20480000col.csv
/Users/nakatani.sho/git/PartialCsvParser/benchmark/PartialCsvParser_bench.cpp:50 - 0.0186529 seconds - mmap(2) file
/Users/nakatani.sho/git/PartialCsvParser/benchmark/PartialCsvParser_bench.cpp:73 - 3.97924 seconds - join parsing threads
OK. Parsed 20480000 columns.

real    0m4.010s
user    0m13.796s
sys     0m0.172s

Check the wall-clock (real) time: 4.010 seconds in this run.
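As a rough sanity check on parallelism, the user time divided by the wall-clock time is 13.796 / 4.010 ≈ 3.4, so the 4 parsing threads were busy for most of the run.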

Run csv-parser-cplusplus benchmark

$ time ./csv_parser_cplusplus_bench -c 20480000 -f csv/20480000col.csv
/Users/nakatani.sho/git/PartialCsvParser/benchmark/csv_parser_cplusplus_bench.cpp:42 - 34.0444 seconds - parse
OK. Parsed 20480000 columns.

real    0m34.049s
user    0m33.022s
sys     0m0.498s
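For comparison, on the same 20480000-column file the 4-thread PartialCsvParser run above took 4.010 seconds of wall-clock time against 34.049 seconds here, roughly an 8.5x difference.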