Workloads
This benchmark allows studying different performance aspects of distributed stream processing systems. For each of these we developed separate workloads, which we describe here.
In the following, we give a step-by-step runbook for each of the workloads to clarify what a run involves. These steps mirror what is executed by the deployment scripts for these workloads.
The first workload runs every pipeline complexity on every framework; a sketch of one such run follows the runbook below.
For each of the frameworks [Spark Streaming, Flink, Kafka Streams and Structured Streaming]:
For each of the pipeline complexities [ingest, parse, join, tumbling window, sliding window]:
- Start the framework's cluster, if it requires one, and wait a few minutes for startup to complete.
- Create a new Kafka output topic for the results and a separate topic for the JMX metrics.
- Start up the metrics exporter.
- Start up the processing job.
- Start the data stream generator.
- Wait for 40 minutes: 30 minutes to process the data and 10 minutes to catch up with possible lags.
- Stop the data stream generator, metrics exporter and streaming job.
Start a job to consume the output and metrics from Kafka and write them to S3 (output-consumer).
Start a job to evaluate the output, JMX metrics and cAdvisor metrics (evaluator).
The deployment scripts for this workload can be found here:
- Flink
- Kafka Streams
- Spark Streaming
- Structured Streaming
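As an illustration, the Python sketch below shows how a single run of this runbook could be orchestrated. The broker address, topic names, partition count and helper script names are placeholders, not the actual values used by the deployment scripts linked above.

```python
import subprocess
import time

KAFKA = "kafka-broker:9092"        # placeholder broker address
RUN_ID = "flink-tumbling-window"   # framework + pipeline complexity under test (example)

def create_topic(name: str) -> None:
    """Create a Kafka topic for this run (output or JMX metrics)."""
    subprocess.run(
        ["kafka-topics.sh", "--create", "--bootstrap-server", KAFKA,
         "--topic", name, "--partitions", "20", "--replication-factor", "1"],
        check=True,
    )

create_topic(f"{RUN_ID}-output")       # topic for the results
create_topic(f"{RUN_ID}-jmx-metrics")  # topic for the JMX metrics

# Start the metrics exporter, the processing job and the data stream generator.
# The script names are stand-ins for the framework-specific deployment scripts.
exporter = subprocess.Popen(["./start-metrics-exporter.sh", RUN_ID])
job = subprocess.Popen(["./start-processing-job.sh", RUN_ID])
generator = subprocess.Popen(["./start-data-stream-generator.sh", RUN_ID])

# 30 minutes to process the data plus 10 minutes to catch up with possible lag.
time.sleep(40 * 60)

for proc in (generator, exporter, job):
    proc.terminate()

# Collect the output and metrics from Kafka into S3, then evaluate them.
subprocess.run(["./run-output-consumer.sh", RUN_ID], check=True)
subprocess.run(["./run-evaluator.sh", RUN_ID], check=True)
```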
The second workload repeats these runs over a range of throughput levels; the loop structure is sketched after the runbook.
For each of the frameworks [Spark Streaming (with 3-second and 5-second micro-batch intervals), Flink, Kafka Streams and Structured Streaming]:
For each of the base pipeline complexities [ingest, parse, join, tumbling window, sliding window]:
For a list of different throughput levels:
- Start the framework's cluster, if it requires one, and wait a few minutes for startup to complete.
- Create a new Kafka output topic for the results and a separate topic for the JMX metrics.
- Start up the metrics exporter.
- Start up the processing job.
- Start the input stream producer.
- Wait for 40 minutes: 30 minutes to process the data and 10 minutes to catch up with possible lags.
- Stop the input stream producer, metrics exporter and streaming job.
Start a job to consume the output and metrics from Kafka and write them to S3 (output-consumer).
Start a job to evaluate the output, JMX metrics and cAdvisor metrics (evaluator).
The deployment scripts for this workload can be found here:
- Flink
- Kafka Streams
- Spark Streaming
- Structured Streaming
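The only addition compared to the previous runbook is the extra loop over throughput levels. A minimal sketch of that nesting, with made-up throughput values and a `run_workload` placeholder standing in for the single-run procedure sketched above:

```python
# Made-up throughput levels (messages per second); the benchmark's own levels differ.
THROUGHPUT_LEVELS = [50_000, 100_000, 200_000, 400_000]

FRAMEWORKS = ["spark-3s", "spark-5s", "flink", "kafka-streams", "structured-streaming"]
PIPELINES = ["ingest", "parse", "join", "tumbling-window", "sliding-window"]

def run_workload(framework: str, pipeline: str, throughput: int) -> None:
    """Placeholder for one 40-minute run as sketched above, at the given input rate."""
    ...

for framework in FRAMEWORKS:
    for pipeline in PIPELINES:
        for throughput in THROUGHPUT_LEVELS:
            run_workload(framework, pipeline, throughput)
```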
The third workload gives the input stream producer a five-minute head start, so the processing job has to catch up with a backlog; the timing is sketched after the runbook.
For each of the frameworks [Spark Streaming (with 3-second and 5-second micro-batch intervals), Flink, Kafka Streams and Structured Streaming]:
For each of the base pipeline complexities [ingest, parse, join, tumbling window, sliding window]:
- Start the framework's cluster, if it requires one, and wait a few minutes for startup to complete.
- Create a new Kafka output topic for the results and a separate topic for the JMX metrics.
- Start the input stream producer and let it publish for 5 minutes.
- Start up the metrics exporter.
- Start up the processing job.
- Wait for 10 minutes. The processing job will catch up with the five-minute delay and then continue processing the newly incoming data.
- Stop the input stream producer, metrics exporter and streaming job.
Start a job to consume the output and metrics from Kafka and write them to S3 (output-consumer).
Start a job to evaluate the output, JMX metrics and cAdvisor metrics (evaluator).
The deployment scripts for this workload can be found here:
- Flink
- Kafka Streams
- Spark Streaming
- Structured Streaming
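A sketch of the distinguishing timing in this workload, with placeholder script names: the producer runs alone for five minutes before the job starts, so the job begins with a backlog.

```python
import subprocess
import time

RUN_ID = "flink-join-catch-up"   # example run identifier

# Topics are created as in the first sketch; omitted here.

# Let the input stream producer publish for 5 minutes before the job exists,
# so the processing job starts with a backlog it has to catch up on.
producer = subprocess.Popen(["./start-input-stream-producer.sh", RUN_ID])
time.sleep(5 * 60)

exporter = subprocess.Popen(["./start-metrics-exporter.sh", RUN_ID])
job = subprocess.Popen(["./start-processing-job.sh", RUN_ID])

# 10 minutes: the job works through the five-minute backlog, then processes live data.
time.sleep(10 * 60)

for proc in (producer, exporter, job):
    proc.terminate()
```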
The fourth workload feeds the pipelines an input stream with periodic bursts; an example burst pattern is sketched after the runbook.
For each of the frameworks [Spark Streaming, Flink, Kafka Streams and Structured Streaming]:
For each of the base pipeline complexities [ingest, parse, join, tumbling window, sliding window]:
- Start the framework's cluster, if it requires one, and wait a few minutes for startup to complete.
- Create a new Kafka output topic for the results and a separate topic for the JMX metrics.
- Start up the metrics exporter.
- Start up the processing job.
- Start the input stream producer with periodic bursts.
- Wait for 40 minutes: 30 minutes to process the data and 10 minutes to catch up with possible lags.
- Stop the input stream producer, metrics exporter and streaming job.
Start a job to consume the output and metrics from Kafka and write them to S3 (output-consumer).
Start a job to evaluate the output, JMX metrics and cAdvisor metrics (evaluator).
The deployment scripts for this workload can be found here:
- Flink
- Kafka Streams
- Spark Streaming
- Structured Streaming
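A periodic-burst input stream alternates between a base rate and a much higher burst rate. The rates, burst length and period below are made up; the benchmark's data stream generator uses its own pattern.

```python
import time

BASE_RATE = 10_000      # messages/second between bursts (made-up value)
BURST_RATE = 100_000    # messages/second during a burst (made-up value)
BURST_PERIOD_S = 120    # seconds between the start of two bursts
BURST_LENGTH_S = 10     # duration of one burst
RUN_SECONDS = 30 * 60   # 30 minutes of data

def target_rate(elapsed_s: float) -> int:
    """Publish rate for the current moment: burst rate at the start of every period."""
    return BURST_RATE if (elapsed_s % BURST_PERIOD_S) < BURST_LENGTH_S else BASE_RATE

def publish_batch(messages_per_second: int) -> None:
    """Placeholder for the actual Kafka producer call; here we only log the target rate."""
    print(f"publishing {messages_per_second} messages this second")

start = time.monotonic()
while time.monotonic() - start < RUN_SECONDS:
    publish_batch(target_rate(time.monotonic() - start))
    time.sleep(1)
```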
The fifth workload varies the pipeline, the throughput level and the cluster size; the parameter grid is sketched after the runbook.
For each of the frameworks [Spark Streaming, Flink, Kafka Streams and Structured Streaming]:
For a list of different pipelines, throughput levels, and cluster sizes:
- Start the framework's cluster, if it requires one, and wait a few minutes for startup to complete.
- Create a new Kafka output topic for the results and a separate topic for the JMX metrics.
- Start up the metrics exporter.
- Start up the processing job.
- Start the input stream producer.
- Wait for 40 minutes: 30 minutes to process the data and 10 minutes to catch up with possible lags.
- Stop the input stream producer, metrics exporter and streaming job.
Start a job to consume the output and metrics from Kafka and write them to S3 (output-consumer).
Start a job to evaluate the output, JMX metrics and cAdvisor metrics (evaluator).
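These runs sweep a grid of pipelines, throughput levels and cluster sizes. The grid values and helpers below are illustrative only:

```python
from itertools import product

# Illustrative grid; the pipelines, rates and cluster sizes used by the benchmark differ.
PIPELINES = ["parse", "join", "sliding-window"]
THROUGHPUTS = [100_000, 400_000]     # messages/second
CLUSTER_SIZES = [2, 5, 10]           # number of worker instances

def resize_cluster(framework: str, workers: int) -> None:
    """Placeholder: scale the framework's cluster to the requested number of workers."""
    ...

def run_workload(framework: str, pipeline: str, throughput: int) -> None:
    """Placeholder for one 40-minute run as sketched for the first workload."""
    ...

for framework in ["spark", "flink", "kafka-streams", "structured-streaming"]:
    for pipeline, throughput, workers in product(PIPELINES, THROUGHPUTS, CLUSTER_SIZES):
        resize_cluster(framework, workers)
        run_workload(framework, pipeline, throughput)
```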
The sixth workload kills the master partway through a run and compares a highly available setup with a single-master setup; the failure injection is sketched after the runbook.
For each of the frameworks [Spark Streaming, Flink, Kafka Streams and Structured Streaming]:
For both a highly available and a single-master setup:
- Start the framework's cluster, if it requires one, and wait a few minutes for startup to complete.
- Create a new Kafka output topic for the results and a separate topic for the JMX metrics.
- Start up the metrics exporter.
- Start up the processing job.
- Start the input stream producer.
- Wait for 10 minutes.
- Kill the master.
- Wait for 5 minutes.
- Stop the input stream producer, metrics exporter and streaming job.
Start a job to consume the output and metrics from Kafka and write them to S3 (output-consumer).
Start a job to evaluate the output, JMX metrics and cAdvisor metrics (evaluator).
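The distinguishing step here is killing the master ten minutes into the run. A sketch of that failure injection, assuming a Docker-based deployment and a made-up container name:

```python
import subprocess
import time

MASTER_CONTAINER = "flink-jobmanager"   # made-up container name for the master

# Topics, metrics exporter and processing job are started as in the first sketch; omitted here.
producer = subprocess.Popen(["./start-input-stream-producer.sh"])

time.sleep(10 * 60)                                                 # 10 minutes of normal processing
subprocess.run(["docker", "kill", MASTER_CONTAINER], check=True)    # kill the master
time.sleep(5 * 60)                                                  # observe the aftermath for 5 minutes

producer.terminate()
```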
The seventh workload kills one of the workers partway through a run, under different processing semantics; this is sketched after the runbook.
For each of the frameworks [Spark Streaming, Flink, Kafka Streams and Structured Streaming]:
For a list of different processing semantics:
- Start the framework's cluster, if it requires one, and wait a few minutes for startup to complete.
- Create a new Kafka output topic for the results and a separate topic for the JMX metrics.
- Start up the metrics exporter.
- Start up the processing job.
- Start the input stream producer.
- Wait for 7 minutes.
- Kill one of the workers.
- Wait for 7 minutes.
- Stop the input stream producer, metrics exporter and streaming job.
Start a job to consume the output and metrics from Kafka and write them to S3 (output-consumer).
Start a job to evaluate the output, JMX metrics and cAdvisor metrics (evaluator).
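Here one worker is killed seven minutes into the run, and the experiment is repeated for each processing guarantee. The guarantee names, script arguments and container name below are assumptions:

```python
import subprocess
import time

SEMANTICS = ["at-least-once", "exactly-once"]   # assumed processing guarantees
WORKER_CONTAINER = "worker-1"                   # made-up container name for one worker

for guarantee in SEMANTICS:
    # Topics and metrics exporter are set up as in the first sketch; omitted here.
    job = subprocess.Popen(["./start-processing-job.sh", "--semantics", guarantee])
    producer = subprocess.Popen(["./start-input-stream-producer.sh"])

    time.sleep(7 * 60)                                                 # 7 minutes of normal processing
    subprocess.run(["docker", "kill", WORKER_CONTAINER], check=True)   # kill one of the workers
    time.sleep(7 * 60)                                                 # 7 minutes to observe recovery

    for proc in (producer, job):
        proc.terminate()
```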
The last workload sends a single faulty event in the middle of the run; the injection is sketched after the runbook.
For each of the frameworks [Spark Streaming, Flink, Kafka Streams and Structured Streaming]:
- Start the framework's cluster, if it requires one, and wait a few minutes for startup to complete.
- Create a new Kafka output topic for the results and a separate topic for the JMX metrics.
- Start up the metrics exporter.
- Start up the processing job.
- Start the input stream producer.
- Wait for 15 minutes: the faulty event is sent in the middle of execution.
- Stop the input stream producer, metrics exporter and streaming job.
Start a job to consume the output and metrics from Kafka and write them to S3 (output-consumer).
Start a job to evaluate the output, JMX metrics and cAdvisor metrics (evaluator).
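The producer sends one malformed event halfway through the run, in between otherwise well-formed messages. The topic name, message format and client library below are assumptions, not the benchmark's actual input format:

```python
import json
import time
from kafka import KafkaProducer   # kafka-python; the benchmark's generator may use another client

producer = KafkaProducer(bootstrap_servers="kafka-broker:9092")   # placeholder broker address
TOPIC = "input-events"                                            # placeholder input topic

RUN_SECONDS = 15 * 60
faulty_sent = False
start = time.monotonic()

while time.monotonic() - start < RUN_SECONDS:
    if not faulty_sent and time.monotonic() - start >= RUN_SECONDS / 2:
        # The single faulty event: not valid JSON, so parsing in the pipeline fails on it.
        producer.send(TOPIC, value=b"not-a-valid-measurement")
        faulty_sent = True
    else:
        # Well-formed (made-up) measurement.
        producer.send(TOPIC, value=json.dumps({"lane": "A1", "flow": 420}).encode())
    time.sleep(1)

producer.flush()
```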
This work has been made possible by Klarrio.