Workloads


This benchmark allows studying different performance aspects of distributed stream processing systems. For each of these aspects we developed a separate workload, which we describe here.

Runbooks

In the following, we give a step-by-step runbook for each of the workloads. These steps mirror what the deployment scripts for these workloads execute.

Runbook for the latency measurement workload

For each of the frameworks [Spark Streaming, Flink, Kafka Streams and Structured Streaming]:

  For each of the pipeline complexities [ingest, parse, join, tumbling window, sliding window]:

		- Start the cluster of the framework, if it requires one, and wait a few minutes for startup to complete.
		- Create a new Kafka output topic for the results to be published on and a topic for the JMX metrics to be published on (see the topic-creation sketch after this runbook).
		- Start up the metrics exporter.
		- Start up the processing job.
		- Start the data stream generator.
		- Wait for 40 minutes: 30 minutes to process the data and 10 minutes to catch up with possible lags.
		- Stop the data stream generator, metrics exporter and streaming job.

  Start a job to consume the output and metrics from Kafka and write it to S3 (output-consumer).
  Start a job to evaluate the output, JMX metrics and cAdvisor metrics (evaluator).
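
The topic-creation step above is common to all runbooks. The snippet below is a minimal sketch of it using the kafka-python admin client; the broker address, topic names, partition counts and replication factor are illustrative assumptions, the real values come from the deployment scripts.

```python
# Minimal sketch of the topic-creation step, assuming the kafka-python client
# and a broker at localhost:9092. Topic names and partition counts are
# illustrative; the actual values are set in the deployment scripts.
from kafka.admin import KafkaAdminClient, NewTopic

admin = KafkaAdminClient(bootstrap_servers="localhost:9092")

topics = [
    NewTopic(name="flink-ingest-output", num_partitions=20, replication_factor=1),
    NewTopic(name="flink-ingest-jmx-metrics", num_partitions=1, replication_factor=1),
]

admin.create_topics(new_topics=topics)
admin.close()
```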

The deployment script for this workload can be found here:

Runbook for the burst-at-startup workload


For each of the frameworks [Spark Streaming (3 seconds and 5 seconds micro-batch intervals), Flink, Kafka Streams and Structured Streaming]:

	For each of the base pipeline complexities [ingest, parse, join, tumbling window, sliding window]:

		- Start the cluster of the framework, if it requires one, and wait a few minutes for startup to complete.
		- Create a new Kafka output topic for the results to be published on and a topic for the JMX metrics to be published on.
		- Start the input stream producer and let it publish for 5 minutes (see the pre-loading sketch after this runbook).
		- Start up the metrics exporter.
		- Start up the processing job.
		- Wait for 10 minutes. The processing job will catch up on the five-minute backlog and then continue processing the newly incoming data.
		- Stop the input stream producer, metrics exporter and streaming job.

  Start a job to consume the output and metrics from Kafka and write it to S3 (output-consumer).
  Start a job to evaluate the output, JMX metrics and cAdvisor metrics (evaluator).
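
The distinctive step in this runbook is that the producer publishes for five minutes before the processing job starts, so the job faces a backlog at startup. Below is a minimal sketch of that pre-loading phase, assuming kafka-python, a broker at localhost:9092 and an illustrative input topic and message format; the real generator controls the contents and the rate.

```python
# Sketch of the five-minute pre-loading phase. Broker address, topic name,
# message contents and rate are illustrative assumptions.
import json
import time
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

end = time.time() + 5 * 60  # publish for five minutes before the job starts
while time.time() < end:
    event = {"measurement_id": "lane-42", "timestamp": int(time.time() * 1000), "flow": 7}
    producer.send("ndw-flow-input", value=event)
    time.sleep(0.01)  # illustrative pacing; the real generator controls throughput precisely

producer.flush()
producer.close()
```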

The deployment script for this workload can be found here:

Runbook for the periodic bursts workload

For each of the frameworks [Spark Streaming, Flink, Kafka Streams and Structured Streaming]:

  For each of the base pipeline complexities [ingest, parse, join, tumbling window, sliding window]:

		- Start the cluster of the framework, if it requires one, and wait a few minutes for startup to complete.
		- Create a new Kafka output topic for the results to be published on and a topic for the JMX metrics to be published on.
		- Start up the metrics exporter.
		- Start up the processing job.
		- Start the input stream producer with periodic bursts (see the burst schedule sketch after this runbook).
		- Wait for 40 minutes: 30 minutes to process the data and 10 minutes to catch up with possible lags.
		- Stop the input stream producer, metrics exporter and streaming job.

  Start a job to consume the output and metrics from Kafka and write it to S3 (output-consumer).
  Start a job to evaluate the output, JMX metrics and cAdvisor metrics (evaluator).
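
The producer in this runbook differs only in its rate pattern: on top of a constant base rate, it periodically emits a short burst. The sketch below shows such a rate schedule; the base rate, burst rate, interval and burst length are illustrative assumptions, the real pattern is configured in the data stream generator.

```python
# Sketch of a periodic-burst rate schedule: a constant base rate with a short
# burst at a fixed interval. All numbers are illustrative.
import time

BASE_RATE = 200        # messages per second outside bursts
BURST_RATE = 2000      # messages per second during a burst
BURST_EVERY = 600      # seconds between the start of two bursts
BURST_LENGTH = 20      # seconds each burst lasts

def current_rate(elapsed_seconds: float) -> int:
    """Return the target rate for this point in the run."""
    in_burst = (elapsed_seconds % BURST_EVERY) < BURST_LENGTH
    return BURST_RATE if in_burst else BASE_RATE

start = time.time()
while time.time() - start < 40 * 60:   # 40-minute run as in the runbook
    rate = current_rate(time.time() - start)
    # publish `rate` messages spread over the next second (producer code omitted)
    time.sleep(1)
```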

The deployment script for this workload can be found here:

Runbook for sustainable throughput measurement and scalability workload

For each of the frameworks [Spark Streaming, Flink, Kafka Streams and Structured Streaming]:

  For a list of different pipelines, throughput levels, and cluster sizes (see the parameter sweep sketch after this runbook):

   - Start the cluster of the framework, if it requires one, and wait a few minutes for startup to complete.
   - Create a new Kafka output topic for the results to be published on and a topic for the JMX metrics to be published on.
   - Start up the metrics exporter.
   - Start up the processing job.
   - Start the input stream producer.
   - Wait for 40 minutes: 30 minutes to process the data and 10 minutes to catch up with possible lags.  
   - Stop the input stream producer, metrics exporter and streaming job.

  Start a job to consume the output and metrics from Kafka and write it to S3 (output-consumer).
  Start a job to evaluate the output, JMX metrics and cAdvisor metrics (evaluator).
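
The outer loop over pipelines, throughput levels and cluster sizes could be driven by a small parameter sweep like the one below. The concrete throughput levels, cluster sizes and the run_benchmark helper are illustrative assumptions; the actual combinations are listed in the deployment scripts.

```python
# Sketch of the sustainable-throughput sweep: every combination of pipeline,
# throughput level and cluster size becomes one benchmark run. Values and the
# run_benchmark helper are illustrative, not the actual deployment script.
from itertools import product

PIPELINES = ["ingest", "parse", "join", "tumbling-window", "sliding-window"]
THROUGHPUT_LEVELS = [50_000, 100_000, 200_000]   # messages per second (illustrative)
CLUSTER_SIZES = [1, 2, 4]                        # number of workers (illustrative)

def run_benchmark(framework: str, pipeline: str, throughput: int, workers: int) -> None:
    # Placeholder for: start cluster, create topics, start exporter and job,
    # start producer at `throughput`, wait 40 minutes, stop everything.
    print(f"{framework}: {pipeline} at {throughput} msg/s on {workers} workers")

for pipeline, throughput, workers in product(PIPELINES, THROUGHPUT_LEVELS, CLUSTER_SIZES):
    run_benchmark("flink", pipeline, throughput, workers)
```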

The deployment script for this workload can be found here:

Runbook for master failure workload

For each of the frameworks [Spark Streaming, Flink, Kafka Streams and Structured Streaming]:

  For a highly available and a single-master setup:

   - Start the cluster of the framework, if it requires one, and wait a few minutes for startup to complete.
   - Create a new Kafka output topic for the results to be published on and a topic for the JMX metrics to be published on.
   - Start up the metrics exporter.
   - Start up the processing job.
   - Start the input stream producer.
   - Wait for 10 minutes.
   - Kill the master (see the failure injection sketch after this runbook).
   - Wait for 5 minutes.
   - Stop the input stream producer, metrics exporter and streaming job.

  Start a job to consume the output and metrics from Kafka and write it to S3 (output-consumer).
  Start a job to evaluate the output, JMX metrics and cAdvisor metrics (evaluator).
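
Killing the master after ten minutes is done from the orchestration side. The sketch below is one way to do it, under the assumption that the master runs in a Docker container with a known name (the name "flink-jobmanager" is purely illustrative); in other deployments the kill step would terminate the corresponding process or node instead.

```python
# Sketch of the master-failure injection, assuming the master runs in a Docker
# container named "flink-jobmanager"; the container name is an assumption.
import subprocess
import time

time.sleep(10 * 60)                                    # let the job run for 10 minutes
subprocess.run(["docker", "kill", "flink-jobmanager"], check=True)
time.sleep(5 * 60)                                     # observe recovery for 5 minutes
```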

The deployment script for this workload can be found here:

Runbook for worker failure workload

For each of the frameworks [Spark Streaming, Flink, Kafka Streams and Structured Streaming]:

  For a list of different processing semantics:

   - Start the cluster of the framework, if it requires one, and wait a few minutes for startup to complete.
   - Create a new Kafka output topic for the results to be published on and a topic for the JMX metrics to be published on.
   - Start up the metrics exporter.
   - Start up the processing job.
   - Start the input stream producer.
   - Wait for 7 minutes.
   - Kill one of the workers.
   - Wait for 7 minutes.
   - Stop the input stream producer, metrics exporter and streaming job.

  Start a job to consume the output and metrics from Kafka and write it to S3 (output-consumer; see the sketch after this runbook).
  Start a job to evaluate the output, JMX metrics and cAdvisor metrics (evaluator).
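
The output-consumer step, which closes every runbook, drains the output and metrics topics and writes the data to S3. Below is a minimal sketch of that idea using kafka-python and boto3; the topic name, bucket and key are illustrative assumptions, and the real output-consumer is a separate job in the benchmark repository.

```python
# Sketch of the output-consumer step: drain a topic and upload the records to
# S3. Topic name, bucket and key are illustrative.
import boto3
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "flink-ingest-output",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    consumer_timeout_ms=30_000,      # stop iterating once the topic has been drained
)

records = [message.value.decode("utf-8") for message in consumer]
consumer.close()

s3 = boto3.client("s3")
s3.put_object(
    Bucket="benchmark-results",
    Key="worker-failure/flink-ingest-output.jsonl",
    Body="\n".join(records).encode("utf-8"),
)
```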

The deployment script for this workload can be found here:

Runbook for job failure workload

For each of the frameworks [Spark Streaming, Flink, Kafka Streams and Structured Streaming]:
   - Start the cluster of the framework, if it requires one, and wait a few minutes for startup to complete.
   - Create a new Kafka output topic for the results to be published on and a topic for the JMX metrics to be published on.
   - Start up the metrics exporter.
   - Start up the processing job.
   - Start the input stream producer.
   - Wait for 15 minutes; the faulty event is sent in the middle of the execution (see the fault injection sketch after this runbook).
   - Stop the input stream producer, metrics exporter and streaming job.
   Start a job to consume the output and metrics from Kafka and write it to S3 (output-consumer).
   Start a job to evaluate the output, JMX metrics and cAdvisor metrics (evaluator).
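
The faulty event mentioned above is a message the pipeline cannot handle, injected halfway through the run to make the job fail. Below is a minimal sketch of that injection, assuming kafka-python, a broker at localhost:9092 and an illustrative input topic; the actual faulty message is defined by the data stream generator.

```python
# Sketch of the fault injection for the job-failure workload: after half of the
# 15-minute run, one unparsable message is published on the input topic. The
# topic name and message contents are illustrative.
import time
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers="localhost:9092")

time.sleep(7.5 * 60)                                        # middle of the 15-minute run
producer.send("ndw-flow-input", value=b"not-valid-json")    # event the parser cannot handle
producer.flush()
producer.close()
```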

The deployment script for this workload can be found here: