This repository was archived by the owner on Jul 16, 2024. It is now read-only.

Commit f35c459

Merge pull request #64 from thestormforge/a-new-cassandra-branch
A new cassandra branch
2 parents 9582ff1 + 0695b81 commit f35c459

11 files changed: 424 additions, 3 deletions

cassandra/Docker/Dockerfile

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
FROM ubuntu:bionic

# Install curl, wget, gnupg2
#RUN apt-get update && apt-get --assume-yes install curl gnupg2 wget
RUN apt-get update && apt-get --assume-yes install curl gnupg2 wget

# Add Cassandra repo 3.11
#RUN echo "deb http://www.apache.org/dist/cassandra/debian 311x main" | tee -a /etc/apt/sources.list.d/cassandra.sources.list
#RUN apt-key adv --keyserver pool.sks-keyservers.net --recv-key A278B781FE4B2BDA
#RUN wget https://www.apache.org/dist/cassandra/KEYS && apt-key add KEYS

# Add Cassandra repo from downloads.apache.org (311x series, i.e. Cassandra 3.11)
RUN echo "deb http://downloads.apache.org/cassandra/debian 311x main" | tee -a /etc/apt/sources.list.d/cassandra.sources.list
RUN curl https://downloads.apache.org/cassandra/KEYS | apt-key add -

# Install Cassandra package
RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get --assume-yes install cassandra cassandra-tools

COPY entrypoint.sh /usr/local/bin/
ENTRYPOINT ["entrypoint.sh"]

cassandra/Docker/entrypoint.sh

Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
#!/bin/bash
set -e

#cassandra-stress write n=1000000 -rate threads=100 -node cassandra &
#wait $!

#cassandra-stress mixed n=100000 -rate threads=100 -node cassandra &
#wait

#cqlsh --request-timeout=60 -e "DROP KEYSPACE keyspace1;" cassandra || true
#wait

>&1
cqlsh --request-timeout=60 -e "DESCRIBE KEYSPACE keyspace1;" cassandra || true

cqlsh --request-timeout=60 -e "DROP KEYSPACE keyspace1;" cassandra || true

cassandra-stress write n=100000 -rate threads=10 -node cassandra

cassandra-stress mixed n=100000 -rate threads=10 -node cassandra

#if [ $? -eq 0 ]
#then
# echo "Successfully completed stress test"
#else
# echo "Could not finish stress test successfully" >&2
#fi

#cqlsh --request-timeout=60 -e "DROP KEYSPACE keyspace1;" cassandra || true

#if [ $? -eq 0 ]
#then
# echo "Successfully dropped keyspace keyspace1"
#else
# echo "Could not drop keyspace keyspace1" >&2
#fi

#cqlsh --request-timeout=60 -e "DROP KEYSPACE system;" cassandra || true

#if [ $? -eq 0 ]
#then
# echo "Successfully dropped system keyspace"
#else
# echo "Could not drop system keyspace" >&2
#fi

## NOTES - rm -rf /cassandra_data/data/system/peers*/*
## DO AS INIT CONTAINER ON STS
## $env:JVM_OPTS="$JVM_OPTS -Dcassandra.load_ring_state=false"

cassandra/README.md

Lines changed: 170 additions & 0 deletions
@@ -0,0 +1,170 @@
# Cassandra StormForge Example
Optimizing Cassandra for Cost and Performance using cassandra-stress

## Overview
As companies start using containerized versions of Cassandra, it can be challenging to tune the environment Cassandra operates in
for things like heap size, CPU, and memory. Because of this, companies scale out their infrastructure to keep up with
Cassandra's growing resource demand so that it remains stable. In this example we show how to use cassandra-stress, the
Apache Cassandra load testing utility, to run all three stress tests consecutively: Write, Read, Mixed.

The official documentation for cassandra-stress can be found [here](https://cassandra.apache.org/doc/latest/tools/cassandra_stress.html).


### Technical Process
To get cassandra-stress to run all three load tests within one experiment trial, we needed to create a container for that task.
You can find the Dockerfile and related artifacts [here](https://www.github.com/thecrudge/cstress) or in the Docker folder. Essentially, it's
an image that runs an entrypoint with a very basic script that runs the load tests consecutively. You can customize your load test parameters
in the entrypoint.sh file.
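
One way to make those load test parameters adjustable without editing the script is to read them from environment variables. The sketch below is only an illustration and not the script shipped in the image: the variable names `STRESS_OPS`, `STRESS_THREADS`, and `STRESS_NODE` and the helper `stress_cmd` are hypothetical, and the defaults mirror the values hard-coded in entrypoint.sh. It prints the commands rather than running them.

```bash
#!/bin/bash
# Hypothetical environment variables (not defined by the original image);
# the defaults mirror the values hard-coded in entrypoint.sh.
STRESS_OPS="${STRESS_OPS:-100000}"
STRESS_THREADS="${STRESS_THREADS:-10}"
STRESS_NODE="${STRESS_NODE:-cassandra}"

# Print the cassandra-stress command line for a given mode (write, read, mixed).
stress_cmd() {
  local mode="$1"
  echo "cassandra-stress ${mode} n=${STRESS_OPS} -rate threads=${STRESS_THREADS} -node ${STRESS_NODE}"
}

# Dry run: show the commands instead of executing them.
for mode in write mixed; do
  stress_cmd "$mode"
done
```

Inside the container, the printed command would be executed instead of echoed, and the variables could be supplied through the pod's `env` section.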

In the experiment spec, you can see the parameters we are using for our experiment and the experiment budget (how many trials we want to run):

```
spec:
  optimization:
  - name: "experimentBudget"
    value: "120" #number of trials
  parameters:
  - name: memory
    min: 500
    max: 12000
  - name: cpu
    min: 500
    max: 3000
  - name: MAX_HEAP_SIZE
    min: 1000
    max: 8000
```
It is important to leave some headroom between the maximum values and what the nodes can provide, so that trials do not run into OOM or
other resource issues. Here I am running Cassandra in AWS on EC2 t2.xlarge nodes.

Because we never want our heap size to be greater than our memory setting, we can configure this in our experiment file by declaring constraints
like so:

```
constraints:
- order:
    lowerParameter: MAX_HEAP_SIZE
    upperParameter: memory
```

The same relationship can also be expressed as a sum constraint, defined so that MAX_HEAP_SIZE remains 1500M below memory.

You can find documentation on constraints [here](https://docs.stormforge.io/experiment/parameters/#parameter-constraints).

```
constraints:
- name: heap_memory
  isUpperBound: true
  bound: "-1500"
  constraintType: sum
  parameters:
  - parameterName: memory
    weight: "-1.0"
  - parameterName: MAX_HEAP_SIZE
    weight: "1.0"
```
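
To make the arithmetic behind the sum constraint concrete, the following sketch evaluates it by hand. It assumes that a sum constraint compares the weighted sum of the listed parameters against `bound`, and that `isUpperBound: true` means the sum must not exceed the bound; the function name `heap_constraint_ok` is hypothetical.

```bash
#!/bin/bash
# Evaluate (-1.0 * memory) + (1.0 * MAX_HEAP_SIZE) <= -1500 with integer math.
# Assumption: isUpperBound: true makes "bound" an upper limit on the weighted sum.
heap_constraint_ok() {
  local memory="$1" max_heap="$2" bound=-1500
  local weighted_sum=$(( -1 * memory + 1 * max_heap ))
  [ "$weighted_sum" -le "$bound" ]
}

# memory=5049, MAX_HEAP_SIZE=1413: the sum is -3636, which is within the bound.
if heap_constraint_ok 5049 1413; then echo "5049/1413: ok"; fi

# memory=5000, MAX_HEAP_SIZE=4000: the sum is -1000, which exceeds the bound.
if ! heap_constraint_ok 5000 4000; then echo "5000/4000: rejected"; fi
```

Under that reading, the constraint is equivalent to requiring MAX_HEAP_SIZE to be at most memory minus 1500.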

Next, we need to define the metrics, or objectives, that we are optimizing for:

```
metrics:
- name: duration
  minimize: true
  query: "{{duration .StartTime .CompletionTime}}"
- name: cost
  minimize: true
  query: "{{div (add (mul .Values.cpu 22) (mul .Values.memory 3)) 1000}}"
```

In this example, duration is the amount of time it takes for the cassandra-stress job to complete, and cost is derived from the
amount of CPU and memory we allocate in that trial.
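
As a worked example of the cost query, the sketch below reproduces the same arithmetic in shell. It assumes the template's `div` performs integer division, which is also how shell arithmetic behaves; the function name `trial_cost` is hypothetical. The inputs are the assignments of trial `cassandra-write-read-mixed-example-000` from the trial listing later in this README.

```bash
#!/bin/bash
# cost = (cpu * 22 + memory * 3) / 1000, truncated to an integer.
trial_cost() {
  local cpu="$1" memory="$2"
  echo $(( (cpu * 22 + memory * 3) / 1000 ))
}

# cpu=2309, memory=6622 gives (50798 + 19866) / 1000.
trial_cost 2309 6622   # prints 70
```

The result agrees with the `cost=70` reported for that trial.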

Finally, we define our patches and our trial template:

```
patch: |
  spec:
    template:
      spec:
        containers:
        - name: cassandra
          resources:
            limits:
              cpu: "{{ .Values.cpu }}m"
              memory: "{{ .Values.memory }}Mi"
            requests:
              cpu: "{{ .Values.cpu }}m"
              memory: "{{ .Values.memory }}Mi"
          env:
          - name: MAX_HEAP_SIZE
            value: "{{ .Values.MAX_HEAP_SIZE }}M"

template: # trial
  spec:
    initialDelaySeconds: 15
    template: # job
      spec:
        template: # pod
          spec:
            containers:
            - image: thecrudge/cstress:latest
              name: cassandra-stress
```

You can see here how we patch the cassandra containers with resource limits and the env variable for heap sizing. You can also see that we are
using the custom cassandra-stress image we discussed at the beginning of this file. We can validate our trial patch by describing a cassandra pod
and comparing its settings against the trial's assignments:

```
kubectl describe pod cassandra-0
Name:         cassandra-0
...
Containers:
  cassandra:
    Container ID:   docker://835392cb704e7a01c8011c4d69f7b014159a2b3847809f9074689b905f44596e
    Image:          gcr.io/google-samples/cassandra:v13
    Image ID:       docker-pullable://gcr.io/google-samples/cassandra@sha256:7a3d20afa0a46ed073a5c587b4f37e21fa860e83c60b9c42fec1e1e739d64007
    Ports:          7000/TCP, 7001/TCP, 7199/TCP, 9042/TCP
    Host Ports:     0/TCP, 0/TCP, 0/TCP, 0/TCP
    State:          Running
      Started:      Wed, 02 Jun 2021 10:54:42 -0500
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     618m
      memory:  5049Mi
    Requests:
      cpu:      618m
      memory:   5049Mi
    Readiness:  exec [/bin/bash -c /ready-probe.sh] delay=15s timeout=5s period=10s #success=1 #failure=3
    Environment:
      MAX_HEAP_SIZE:           1413M
      HEAP_NEW_SIZE:           7514M
      CASSANDRA_SEEDS:         cassandra-0.cassandra.default.svc.cluster.local
      CASSANDRA_CLUSTER_NAME:  K8Demo
      CASSANDRA_DC:            DC1-K8Demo
      CASSANDRA_RACK:          Rack1-K8Demo
      POD_IP:                  (v1:status.podIP)
...
```
```
kubectl get trials -w

NAME                                     STATUS      ASSIGNMENTS                                 VALUES
cassandra-write-read-mixed-example-000   Completed   MAX_HEAP_SIZE=5186, cpu=2309, memory=6622   duration=3411, cost=70
cassandra-write-read-mixed-example-001   Running     MAX_HEAP_SIZE=1413, cpu=618, memory=5049
```

## Results
The image below shows that the machine learning has recommended trial #98. With this trial we see a cost savings of 34.29%
compared to our baseline in trial #1.

<img src="img/results1.png" width="400">

In this image, we can see all of our trials, with the recommended trial highlighted.

<img src="img/results2.png" width="400">

And finally, we can get the parameter settings or export the config itself.

<img src="img/results3.png" width="400">
Lines changed: 74 additions & 0 deletions
@@ -0,0 +1,74 @@
apiVersion: redskyops.dev/v1beta1
kind: Experiment
metadata:
  name: cassandra-rwx
spec:
  optimization:
  - name: "experimentBudget"
    value: "120" #number of trials
  parameters:
  - name: memory
    min: 500
    max: 12000
  - name: cpu
    min: 500
    max: 3000
  - name: MAX_HEAP_SIZE
    min: 1000
    max: 8000
#  - name: HEAP_NEWSIZE
#    min: 1000
#    max: 8000
  constraints:
  - name: heap_memory
    sum:
      bound: "-1500"
      isUpperBound: false
      parameters:
      - name: memory
        weight: "-1.0"
      - name: MAX_HEAP_SIZE
        weight: "1.0"
#  - order:
#      lowerParameter: MAX_HEAP_SIZE
#      upperParameter: memory
  metrics:
  - name: duration
    minimize: true
    query: "{{duration .StartTime .CompletionTime}}"
  - name: cost
    minimize: true
    query: "{{div (add (mul .Values.cpu 22) (mul .Values.memory 3)) 1000}}"
  patches:
  - targetRef:
      kind: StatefulSet
      apiVersion: apps/v1
      name: cassandra
    patch: |
      spec:
        template:
          spec:
            containers:
            - name: cassandra
              resources:
                limits:
                  cpu: "{{ .Values.cpu }}m"
                  memory: "{{ .Values.memory }}Mi"
                requests:
                  cpu: "{{ .Values.cpu }}m"
                  memory: "{{ .Values.memory }}Mi"
              env:
              - name: MAX_HEAP_SIZE
                value: "{{ .Values.MAX_HEAP_SIZE }}M"
#              - name: HEAP_NEW_SIZE
#                value: "{{ .Values.HEAP_NEWSIZE }}M"
  template: # trial
    spec:
      initialDelaySeconds: 15
      template: # job
        spec:
          template: # pod
            spec:
              containers:
              - image: thecrudge/cstress:latest
                name: cassandra-stress
