Benchmark


Description

Data

The data source is traffic data from NDW (Nationaal Dataportaal Wegverkeer). We use the Data Stream Generator to generate input streams on two Kafka topics: a flow stream and a speed stream.
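
To make the shape of the data concrete, here is a minimal sketch of the two event types. The field names below are assumptions for illustration; they may not match the generator's exact JSON schema.

```scala
// Hypothetical models of the events on the two input topics.
// Field names are illustrative, not the generator's exact output.
final case class FlowEvent(
  measurementId: String, // sensor location on the road
  internalId: String,    // lane identifier at that location
  timestamp: Long,       // measurement time (epoch millis)
  flow: Int              // number of cars counted in the period
)

final case class SpeedEvent(
  measurementId: String,
  internalId: String,
  timestamp: Long,
  speed: Double          // average speed on the lane (km/h)
)
```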

Processing Flow

The complete processing pipeline looks as follows:

Figure: processing flow (adapted from [1])

The benchmark performs the following sequence of operations:

  1. Ingest: Read JSON event data from two Kafka input streams: the speed stream and the flow stream.

  2. Parse: Parse the JSON messages of both streams and extract the relevant fields.

  3. Join: Join the flow stream and the speed stream on the composite key: measurement ID + internal ID + timestamp. The internal ID identifies the lane of the road, so after the join we know how many cars passed and how fast they drove on a certain lane at a certain point in time. (The sketch after this list illustrates steps 3 to 5.)

  4. Tumbling window: Aggregate the speed and the flow over all lanes belonging to the same measurement ID, i.e. group the data by measurement ID + timestamp. For the speed we compute the average over all lanes; for the flow we compute the sum over all lanes. We then know how fast cars drove past the measurement point on average and how many cars passed in the last time period.

  5. Sliding window: Compute the relative change in flow and speed over two lookback periods: a short one and a long one. The lengths of the lookback periods are defined in the benchmarkConfig.conf file. The relative change of, for example, the speed is calculated as:

                       (speed(t) - speed(t-1)) / speed(t-1)
    
  6. Publish: Write the output back to Kafka.
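
As a rough, non-streaming sketch of steps 3 to 5, here is the keying and aggregation logic in plain Scala, assuming the hypothetical FlowEvent/SpeedEvent models above. The function names and structure are illustrative, not the benchmark's actual code.

```scala
// Illustrative sketch of the join and window logic over in-memory
// collections; the real job runs this continuously over Kafka streams.

// 3. Join: key both streams on (measurementId, internalId, timestamp).
def join(flows: Seq[FlowEvent], speeds: Seq[SpeedEvent]): Seq[(FlowEvent, SpeedEvent)] = {
  val speedByKey = speeds.map(s => ((s.measurementId, s.internalId, s.timestamp), s)).toMap
  flows.flatMap { f =>
    speedByKey.get((f.measurementId, f.internalId, f.timestamp)).map(s => (f, s))
  }
}

// 4. Tumbling window: per (measurementId, timestamp), sum the flow
//    over all lanes and average the speed over all lanes.
final case class Aggregate(measurementId: String, timestamp: Long,
                           totalFlow: Int, avgSpeed: Double)

def aggregate(joined: Seq[(FlowEvent, SpeedEvent)]): Seq[Aggregate] =
  joined.groupBy { case (f, _) => (f.measurementId, f.timestamp) }
    .map { case ((id, ts), group) =>
      Aggregate(id, ts,
        totalFlow = group.map(_._1.flow).sum,
        avgSpeed  = group.map(_._2.speed).sum / group.size)
    }.toSeq

// 5. Sliding window: relative change between the current value and
//    the value one lookback period earlier.
def relativeChange(current: Double, previous: Double): Double =
  (current - previous) / previous
```

In the actual benchmark, the same keying and aggregation run continuously over the Kafka streams inside the stream processing framework under test.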

The job can be executed up to any phase of the flow to make the job more or less complex.

Architecture

Figure: architecture

References

[1] van Dongen, G., & Van den Poel, D. (2020). Evaluation of Stream Processing Frameworks. IEEE Transactions on Parallel and Distributed Systems, 31(8), 1845-1858.