|
| 1 | +# Cassandra StormForge Example |
| 2 | +Optimizing Cassandra for Cost and Performance using cassandra-stress |
| 3 | + |
| 4 | +## Overview |
| 5 | +As companies adopt containerized versions of Cassandra, it can be challenging to tune the environment Cassandra operates in |
| 6 | +for settings such as heap size, CPU, and memory. As a result, companies often scale out their infrastructure just to keep |
| 7 | +Cassandra stable as its resource demand grows. In this example we show how to use cassandra-stress, the |
| 8 | +Apache Cassandra load testing utility, to run all three stress tests consecutively: write, read, and mixed. |
| 9 | + |
| 10 | +The official documentation for cassandra-stress can be found [here](https://cassandra.apache.org/doc/latest/tools/cassandra_stress.html). |
| 11 | + |
| 12 | + |
| 13 | +### Technical Process |
| 14 | +To get cassandra-stress to run all three load tests within a single experiment trial, we needed to create a container for that task. |
| 15 | +You can find the Dockerfile and related artifacts [here](https://www.github.com/thecrudge/cstress) or in the Docker folder. Essentially, it is |
| 16 | +an image whose entrypoint runs a very basic script that executes all three load tests consecutively. You can customize the load test parameters |
| 17 | +in the entrypoint.sh file. |
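
As a rough illustration of what such an entrypoint does, the sketch below builds the three cassandra-stress invocations and runs them in order. It is written in Python rather than shell, and the node name `cassandra`, the operation count, and the 1:3 mixed ratio are assumptions made for illustration; the actual entrypoint.sh in the image may differ.

```python
import subprocess


def build_stress_commands(node="cassandra", ops=100000):
    """Return the write, read, and mixed cassandra-stress commands in run order.

    The node name and operation count are hypothetical defaults.
    """
    count = f"n={ops}"
    return [
        ["cassandra-stress", "write", count, "-node", node],
        ["cassandra-stress", "read", count, "-node", node],
        ["cassandra-stress", "mixed", "ratio(write=1,read=3)", count, "-node", node],
    ]


def run_all(commands, runner=subprocess.run):
    """Run each command in turn; check=True stops at the first failing test."""
    for cmd in commands:
        runner(cmd, check=True)


# To execute for real (requires cassandra-stress on PATH and a reachable node):
# run_all(build_stress_commands())
```

Running the tests sequentially in one process means the trial's job only completes once all three workloads have finished, so a single duration measurement covers the whole sequence.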
| 18 | + |
| 19 | +In the experiment spec, you can see the parameters we are tuning and the experiment budget (the number of trials we want to run): |
| 20 | + |
| 21 | +``` |
| 22 | +spec: |
| 23 | +  optimization: |
| 24 | +  - name: "experimentBudget" |
| 25 | +    value: "120" # number of trials |
| 26 | +  parameters: |
| 27 | +  - name: memory |
| 28 | +    min: 500 |
| 29 | +    max: 12000 |
| 30 | +  - name: cpu |
| 31 | +    min: 500 |
| 32 | +    max: 3000 |
| 33 | +  - name: MAX_HEAP_SIZE |
| 34 | +    min: 1000 |
| 35 | +    max: 8000 |
| 36 | +``` |
| 37 | +Remember to leave some headroom below the node's capacity when choosing the maximum values, so that trials do not run into OOM kills or other resource issues. Here |
| 38 | +I am running Cassandra in AWS on EC2 t2.xlarge nodes (4 vCPUs and 16 GiB of memory each). |
| 39 | + |
| 40 | +Because we never want our heap size to be greater than our memory setting, we can enforce this in our experiment file by declaring a constraint |
| 41 | +like so: |
| 42 | + |
| 43 | +``` |
| 44 | +  constraints: |
| 45 | +  - order: |
| 46 | +      lowerParameter: MAX_HEAP_SIZE |
| 47 | +      upperParameter: memory |
| 48 | +``` |
| 49 | + |
| 50 | +The constraint below expresses the same relationship in a different form, as a weighted sum, so that MAX_HEAP_SIZE stays at least 1500M below memory. |
| 51 | + |
| 52 | +You can find documentation on constraints [here](https://docs.stormforge.io/experiment/parameters/#parameter-constraints). |
| 53 | + |
| 54 | +``` |
| 55 | +  constraints: |
| 56 | +  - name: heap_memory |
| 57 | +    isUpperBound: true |
| 58 | +    bound: "-1500" |
| 59 | +    constraintType: sum |
| 60 | +    parameters: |
| 61 | +    - parameterName: memory |
| 62 | +      weight: "-1.0" |
| 63 | +    - parameterName: MAX_HEAP_SIZE |
| 64 | +      weight: "1.0" |
| 65 | +``` |
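
To make the arithmetic behind the sum constraint concrete, here is a minimal Python sketch. It assumes that `isUpperBound: true` means the weighted sum of the parameters must be less than or equal to `bound`; the helper name `satisfies_sum_constraint` is hypothetical and is not part of any StormForge API. The first assignment uses the values memory=5049 and MAX_HEAP_SIZE=1413 that appear in the trial listing later in this README; the second is a made-up counterexample.

```python
def satisfies_sum_constraint(values, weights, bound, is_upper_bound=True):
    """Check a weighted-sum constraint over parameter values.

    Assumed semantics: with an upper bound the weighted sum must be <= bound,
    otherwise it must be >= bound.
    """
    total = sum(weights[name] * values[name] for name in weights)
    return total <= bound if is_upper_bound else total >= bound


# Weights and bound copied from the heap_memory constraint above.
HEAP_MEMORY_WEIGHTS = {"memory": -1.0, "MAX_HEAP_SIZE": 1.0}
HEAP_MEMORY_BOUND = -1500

# -5049 + 1413 = -3636, which is <= -1500, so this assignment is allowed.
allowed = satisfies_sum_constraint(
    {"memory": 5049, "MAX_HEAP_SIZE": 1413}, HEAP_MEMORY_WEIGHTS, HEAP_MEMORY_BOUND
)

# Hypothetical assignment: -6000 + 5000 = -1000, which is > -1500, so it is rejected.
rejected = satisfies_sum_constraint(
    {"memory": 6000, "MAX_HEAP_SIZE": 5000}, HEAP_MEMORY_WEIGHTS, HEAP_MEMORY_BOUND
)

print(allowed, rejected)
```

Rearranged, the inequality reads MAX_HEAP_SIZE <= memory - 1500, which is the headroom the constraint is meant to guarantee.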
| 66 | + |
| 67 | +Next, we define the metrics, or objectives, that we are optimizing for: |
| 68 | + |
| 69 | +``` |
| 70 | +  metrics: |
| 71 | +  - name: duration |
| 72 | +    minimize: true |
| 73 | +    query: "{{duration .StartTime .CompletionTime}}" |
| 74 | +  - name: cost |
| 75 | +    minimize: true |
| 76 | +    query: "{{div (add (mul .Values.cpu 22) (mul .Values.memory 3)) 1000}}" |
| 77 | +``` |
| 78 | + |
| 79 | +In this example, duration is the time it takes for the cassandra-stress job to complete, and cost is computed from the |
| 80 | +CPU and memory values assigned to Cassandra in that trial. |
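
The cost query can be reproduced by hand. The sketch below mirrors the template expression in Python, assuming that `div` performs integer division; the function name `trial_cost` is hypothetical. The inputs cpu=2309 and memory=6622 are the assignments of the completed trial in the listing later in this README, whose reported cost is 70.

```python
def trial_cost(cpu_millicores, memory_mib):
    """Mirror the cost query: (cpu * 22 + memory * 3) / 1000 with integer division."""
    return (cpu_millicores * 22 + memory_mib * 3) // 1000


# 2309 * 22 = 50798 and 6622 * 3 = 19866; their sum is 70664, which divides down to 70.
print(trial_cost(2309, 6622))
```

The weights 22 and 3 make one millicore of CPU roughly seven times as expensive as one Mi of memory, so the optimizer is pushed harder to trim CPU than memory.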
| 81 | + |
| 82 | +Finally, we define our patches and our trial template: |
| 83 | + |
| 84 | +``` |
| 85 | +  patch: | |
| 86 | +    spec: |
| 87 | +      template: |
| 88 | +        spec: |
| 89 | +          containers: |
| 90 | +          - name: cassandra |
| 91 | +            resources: |
| 92 | +              limits: |
| 93 | +                cpu: "{{ .Values.cpu }}m" |
| 94 | +                memory: "{{ .Values.memory }}Mi" |
| 95 | +              requests: |
| 96 | +                cpu: "{{ .Values.cpu }}m" |
| 97 | +                memory: "{{ .Values.memory }}Mi" |
| 98 | +            env: |
| 99 | +            - name: MAX_HEAP_SIZE |
| 100 | +              value: "{{ .Values.MAX_HEAP_SIZE }}M" |
| 101 | + |
| 102 | +  template: # trial |
| 103 | +    spec: |
| 104 | +      initialDelaySeconds: 15 |
| 105 | +      template: # job |
| 106 | +        spec: |
| 107 | +          template: # pod |
| 108 | +            spec: |
| 109 | +              containers: |
| 110 | +              - image: thecrudge/cstress:latest |
| 111 | +                name: cassandra-stress |
| 112 | +``` |
| 113 | + |
| 114 | +Here we patch the cassandra container's resource requests and limits, along with the environment variable that sets the heap size. The trial template |
| 115 | +uses the custom cassandra-stress image discussed at the beginning of this file. We can validate the trial patch by describing a Cassandra pod |
| 116 | +and comparing its settings against the assignments listed for the running trial: |
| 117 | + |
| 118 | +``` |
| 119 | +kubectl describe pod cassandra-0 |
| 120 | +Name: cassandra-0 |
| 121 | +... |
| 122 | +Containers: |
| 123 | + cassandra: |
| 124 | + Container ID: docker://835392cb704e7a01c8011c4d69f7b014159a2b3847809f9074689b905f44596e |
| 125 | + Image: gcr.io/google-samples/cassandra:v13 |
| 126 | + Image ID: docker-pullable://gcr.io/google-samples/cassandra@sha256:7a3d20afa0a46ed073a5c587b4f37e21fa860e83c60b9c42fec1e1e739d64007 |
| 127 | + Ports: 7000/TCP, 7001/TCP, 7199/TCP, 9042/TCP |
| 128 | + Host Ports: 0/TCP, 0/TCP, 0/TCP, 0/TCP |
| 129 | + State: Running |
| 130 | + Started: Wed, 02 Jun 2021 10:54:42 -0500 |
| 131 | + Ready: True |
| 132 | + Restart Count: 0 |
| 133 | + Limits: |
| 134 | + cpu: 618m |
| 135 | + memory: 5049Mi |
| 136 | + Requests: |
| 137 | + cpu: 618m |
| 138 | + memory: 5049Mi |
| 139 | + Readiness: exec [/bin/bash -c /ready-probe.sh] delay=15s timeout=5s period=10s #success=1 #failure=3 |
| 140 | + Environment: |
| 141 | + MAX_HEAP_SIZE: 1413M |
| 142 | + HEAP_NEW_SIZE: 7514M |
| 143 | + CASSANDRA_SEEDS: cassandra-0.cassandra.default.svc.cluster.local |
| 144 | + CASSANDRA_CLUSTER_NAME: K8Demo |
| 145 | + CASSANDRA_DC: DC1-K8Demo |
| 146 | + CASSANDRA_RACK: Rack1-K8Demo |
| 147 | + POD_IP: (v1:status.podIP) |
| 148 | +... |
| 149 | +``` |
| 150 | +``` |
| 151 | +kubectl get trials -w |
| 152 | + |
| 153 | +NAME STATUS ASSIGNMENTS VALUES |
| 154 | +cassandra-write-read-mixed-example-000 Completed MAX_HEAP_SIZE=5186, cpu=2309, memory=6622 duration=3411, cost=70 |
| 155 | +cassandra-write-read-mixed-example-001 Running MAX_HEAP_SIZE=1413, cpu=618, memory=5049 |
| 156 | +``` |
| 157 | + |
| 158 | +## Results |
| 159 | +The image below shows that the machine learning recommended trial #98. This trial yields a cost savings of 34.29% |
| 160 | +compared to our baseline in trial #1. |
| 161 | + |
| 162 | +<img src="img/results1.png" width="400"> |
| 163 | + |
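
For reference, a savings percentage like the one quoted above is the relative reduction in the cost metric. The sketch below shows the calculation under stated assumptions: the baseline cost of 70 matches the cost listed for the first completed trial earlier in this README, while the recommended trial's cost of 46 is a hypothetical value chosen for illustration and is not taken from the results.

```python
def cost_savings_percent(baseline_cost, candidate_cost):
    """Relative cost reduction of a candidate trial versus the baseline, in percent."""
    return round((baseline_cost - candidate_cost) / baseline_cost * 100, 2)


# Hypothetical inputs: (70 - 46) / 70 = 0.342857..., i.e. about 34.29 percent.
print(cost_savings_percent(70, 46))
```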
| 164 | +In this image, we can see all of our trials, with the recommended trial highlighted. |
| 165 | + |
| 166 | +<img src="img/results2.png" width="400"> |
| 167 | + |
| 168 | +And finally, we can view the parameter settings or export the config itself: |
| 169 | + |
| 170 | +<img src="img/results3.png" width="400"> |