chaos-days/blog/2024-12-12-News-from-Camunda-Exporter-project/index.md (+17, -17)
@@ -12,36 +12,36 @@ authors: zell
# Chaos Day Summary
- In this Chaos day we want to verify the current state of the exporter project, and run benchmarks with it. Comparing
+ In this Chaos day, we want to verify the current state of the exporter project and run benchmarks with it. Comparing
with a previous version (v8.6.6) should give us a good hint on the current state and potential improvements.
- **TL;DR;** The performance that the user sees data has been improved due to the architecture change, but there are still some bugs that we have to fix until the release.
+ **TL;DR;** The latency of user data availability has improved due to our architecture change, but we still need to fix some bugs before our planned release of the Camunda Exporter. This experiment allowed us to detect three new bugs; fixing them should make the system more stable.
<!--truncate-->
## Chaos Experiment
### Benchmarks
- We have seen in previous experiments and benchmarks that the realistic benchmarks are not yet totally reliable, as they seem to overload at some-point the system. This can happen if there is a hiccup and jobs take longer to process. Jobs in the queue are getting delayed, and time out, they are sent out to different workers, but we will reach at some point again the jobs, and we will publish also for this job a message. This in general increases the load of the system as we have to timeout jobs, we have to handle additional message publish, etc.
+ We have seen in previous experiments and benchmarks that the realistic benchmarks are not yet fully reliable, as they seem to overload the system at some point. This can happen if there is a hiccup and jobs take longer to process. Jobs in the queue get delayed and time out, so they are sent out to different workers, but at some point we reach the original jobs again and also publish a message for them. In general, this increases the load on the system, as we have to time out jobs, handle additional message publishes, etc.
- Additionally, message publish can be rejected, when this happens we wait for another timeout adding again load on the system, more and more retries happen etc. this breaks the benchmark performance.
+ Additionally, a message publish can be rejected; when this happens, we wait for another timeout, which again adds load on the system, more and more retries happen, etc. This breaks the benchmark performance.
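To make the mechanism above more concrete, here is a minimal, hypothetical sketch (not the actual benchmark worker code): a Zeebe job worker whose handler publishes a message before completing the job. The job type, message name, gateway address, and timings are assumptions for illustration; the point is that a slow handler that exceeds the job timeout leads to re-delivery to another worker and thus to additional, possibly rejected, message publishes.

```java
import io.camunda.zeebe.client.ZeebeClient;
import java.time.Duration;

public class BenchmarkWorkerSketch {
  public static void main(String[] args) throws InterruptedException {
    try (ZeebeClient client = ZeebeClient.newClientBuilder()
        .gatewayAddress("localhost:26500") // assumed local gateway
        .usePlaintext()
        .build()) {

      client.newWorker()
          .jobType("benchmark-task") // hypothetical job type
          .handler((jobClient, job) -> {
            // Simulated work; a hiccup here can push the handler past the job timeout,
            // in which case Zeebe re-delivers the job to another worker.
            Thread.sleep(100);

            // Publish a message for this job; with an explicit messageId, a duplicate
            // publish from a re-delivered job is rejected, causing further retries.
            client.newPublishMessageCommand()
                .messageName("benchmark-message") // hypothetical message name
                .correlationKey(String.valueOf(job.getProcessInstanceKey()))
                .messageId("msg-" + job.getKey())
                .timeToLive(Duration.ofMinutes(1))
                .send()
                .join();

            jobClient.newCompleteCommand(job.getKey()).send().join();
          })
          .timeout(Duration.ofSeconds(10)) // activation timeout before re-delivery
          .open();

      Thread.sleep(Duration.ofMinutes(5).toMillis()); // keep the worker alive for the sketch
    }
  }
}
```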
- To avoid this, we reduce the benchmark payload for now, which is in charge of creating multi instances and call activities etc. To be specific, the reduced the items from 50 to 5,
- but scaled the starter to start more instances. With this payload we can scale more fine granular. Each instance can create 5 sub-instances, when creating three process instances we create effectively 15 instances/token.
+ To avoid this, we reduce the benchmark payload for now, which is in charge of creating multiple instances and call activities, etc. To be specific, we reduced the items from 50 to 5,
+ but scaled the starter to start more instances. With this payload, we can scale in a more fine-granular way. Each instance can create 5 sub-instances, so when creating three process instances we effectively create 15 instances/tokens.
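As a hedged illustration of the reduced payload (the process ID, the exact shape of the `items` variable, and the gateway address are assumptions, not the actual benchmark starter code), starting three instances that each carry five items yields roughly 3 * 5 = 15 instances/tokens:

```java
import io.camunda.zeebe.client.ZeebeClient;
import java.util.List;
import java.util.Map;

public class BenchmarkStarterSketch {
  public static void main(String[] args) {
    try (ZeebeClient client = ZeebeClient.newClientBuilder()
        .gatewayAddress("localhost:26500") // assumed local gateway
        .usePlaintext()
        .build()) {

      // Reduced payload: 5 items instead of 50; each item drives one sub-instance.
      Map<String, Object> payload = Map.of("items", List.of(1, 2, 3, 4, 5));

      for (int i = 0; i < 3; i++) { // three process instances -> ~15 tokens overall
        client.newCreateInstanceCommand()
            .bpmnProcessId("benchmark") // hypothetical process ID
            .latestVersion()
            .variables(payload)
            .send()
            .join();
      }
    }
  }
}
```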
- As this the benchmark runs quite stable, it allows us to better compare the latency between based and main.
+ As this benchmark runs quite stable, it allows us to better compare the latency between base and main.
### Details Experiment
We will run two benchmarks: one against 8.6.6, called base, and one against the current main branch (commit a1609130).
### Expected
- When running the base and the main and comparing each other we expect that the general throughput, should be similar.
- Furthermore, we expect that the latency until the user sees data (or data is written into ES and searchable) should be lowered on main than base.
+ When running base and main and comparing them with each other, we expect the general throughput to be similar.
+ Furthermore, we expect that the latency until the user sees data (or until data is written into ES and searchable) should be lower on the main branch than on base.
- Note: Right now we don't have a good metric to measure that data is available for the user, we plan to implement this in the starter benchmark application at some-point via querying the REST API. For now, we calculate different average latencies together, whereas we take as elasticsearch flush a constant of 2 seconds.
+ Note: Right now we don't have a good metric to measure when data is available for the user; we plan to implement this in the starter benchmark application at some point via querying the REST API. For now, we add up different average latencies, where we assume a constant of 2 seconds for the Elasticsearch flush.
We expect a reduction of latency as we remove one additional hop/usage of ES as intermediate storage before aggregation.
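As a rough sketch of the approximation mentioned in the note above (the component latencies are placeholders for whatever averages the dashboard provides, not actual metric names), the user-visible latency is estimated by adding the measured averages to an assumed constant 2-second Elasticsearch flush:

```java
public class LatencyApproximationSketch {

  // Constant flush interval assumed in the post for Elasticsearch.
  static final double ASSUMED_ES_FLUSH_SECONDS = 2.0;

  // Adds the measured average latencies to the assumed flush interval.
  static double approximateUserVisibleLatencySeconds(
      double avgProcessingLatencySeconds, double avgExportLatencySeconds) {
    return avgProcessingLatencySeconds + avgExportLatencySeconds + ASSUMED_ES_FLUSH_SECONDS;
  }

  public static void main(String[] args) {
    // Purely illustrative numbers.
    double approx = approximateUserVisibleLatencySeconds(0.2, 0.5);
    System.out.printf("approximate user-visible latency: %.1f s%n", approx);
  }
}
```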
@@ -70,7 +70,7 @@ The general throughput performance looks similar. The resource consumption looks
#### Latency
- This experiment targets to show difference in the data availability for the user.
+ This experiment aims to show the difference in data availability for the user.
In order to better visualize this, the dashboard has been adjusted for this experiment.
@@ -92,22 +92,22 @@ We were able to show that the latency has been reduced under normal load.
#### Found Bugs
- Within the experiment we run into several other issues. Especially after running for a while, when pods got restarted and importer have been enabled, the Camunda Exporter broke.
+ Within the experiment, we ran into several other issues. Especially after running for a while, when pods got restarted and the importer had been enabled, the Camunda Exporter broke.