Commit 634184b

Authored Dec 13, 2024
fix: apply grammarly
1 parent 582fdd1 commit 634184b

File tree

1 file changed: +17 -17 lines changed
  • chaos-days/blog/2024-12-12-News-from-Camunda-Exporter-project

‎chaos-days/blog/2024-12-12-News-from-Camunda-Exporter-project/index.md

@@ -12,36 +12,36 @@ authors: zell

# Chaos Day Summary

-In this Chaos day we want to verify the current state of the exporter project, and run benchmarks with it. Comparing
+In this Chaos day, we want to verify the current state of the exporter project and run benchmarks with it. Comparing
with a previous version (v8.6.6) should give us a good hint on the current state and potential improvements.

-**TL;DR;** The performance that the user sees data has been improved due to the architecture change, but there are still some bugs that we have to fix until the release.
+**TL;DR;** The latency of user data availability has improved due to our architecture change, but we still need to fix some bugs before our planned release of the Camunda Exporter. This experiment allowed us to detect three new bugs; fixing them should make the system more stable.

<!--truncate-->

## Chaos Experiment

### Benchmarks

-We have seen in previous experiments and benchmarks that the realistic benchmarks are not yet totally reliable, as they seem to overload at some-point the system. This can happen if there is a hiccup and jobs take longer to process. Jobs in the queue are getting delayed, and time out, they are sent out to different workers, but we will reach at some point again the jobs, and we will publish also for this job a message. This in general increases the load of the system as we have to timeout jobs, we have to handle additional message publish, etc.
+We have seen in previous experiments and benchmarks that the realistic benchmarks are not yet totally reliable, as they seem to overload the system at some point. This can happen if there is a hiccup and jobs take longer to process. Jobs in the queue get delayed and time out, and they are sent out to different workers; at some point we reach those jobs again and also publish a message for each of them. In general, this increases the load on the system, as we have to time out jobs, handle additional message publishes, etc.

-Additionally, message publish can be rejected, when this happens we wait for another timeout adding again load on the system, more and more retries happen etc. this breaks the benchmark performance.
+Additionally, a message publish can be rejected; when this happens, we wait for another timeout, which again adds load on the system, more and more retries happen, etc. This breaks the benchmark performance.

-To avoid this, we reduce the benchmark payload for now, which is in charge of creating multi instances and call activities etc. To be specific, the reduced the items from 50 to 5,
-but scaled the starter to start more instances. With this payload we can scale more fine granular. Each instance can create 5 sub-instances, when creating three process instances we create effectively 15 instances/token.
+To avoid this, we reduce the benchmark payload for now, which is in charge of creating multi-instances and call activities, etc. To be specific, we reduced the items from 50 to 5
+but scaled the starter to start more instances. With this payload, we can scale in a more fine-granular way. Each instance can create 5 sub-instances; when creating three process instances, we effectively create 15 instances/tokens.

-As this the benchmark runs quite stable, it allows us to better compare the latency between based and main.
+As this benchmark runs quite stably, it allows us to better compare the latency between base and main.

### Details Experiment

We will run two benchmarks: one against 8.6.6, called base, and one against the current main branch (commit a1609130).

### Expected

-When running the base and the main and comparing each other we expect that the general throughput, should be similar.
-Furthermore, we expect that the latency until the user sees data (or data is written into ES and searchable) should be lowered on main than base.
+When running base and main and comparing them with each other, we expect the general throughput to be similar.
+Furthermore, we expect the latency until the user sees data (or data is written into ES and searchable) to be lower on main than on base.

-Note: Right now we don't have a good metric to measure that data is available for the user, we plan to implement this in the starter benchmark application at some-point via querying the REST API. For now, we calculate different average latencies together, whereas we take as elasticsearch flush a constant of 2 seconds.
+Note: Right now we don't have a good metric to measure when data is available for the user; we plan to implement this in the starter benchmark application at some point by querying the REST API. For now, we add up different average latencies and assume a constant of 2 seconds for the Elasticsearch flush.

We expect a reduction in latency, as we remove one additional hop (the usage of ES as intermediate storage before aggregation).

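A side note on the payload reduction in the hunk above: the "three process instances create effectively 15 instances/tokens" statement is simply the number of started root instances multiplied by the items in the payload collection. A minimal sketch of that arithmetic (variable names are illustrative, not taken from the benchmark code):

```java
public class PayloadScaling {
    public static void main(String[] args) {
        int itemsPerInstance = 5; // reduced from 50 to 5 in the benchmark payload
        int rootInstances = 3;    // process instances started by the benchmark starter

        // Each root instance spawns one sub-instance per item (multi-instance / call activity),
        // so the effective token count scales linearly with both knobs.
        int effectiveTokens = rootInstances * itemsPerInstance;
        System.out.printf("%d root instances x %d items = %d effective instances/tokens%n",
                rootInstances, itemsPerInstance, effectiveTokens);
    }
}
```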

@@ -70,7 +70,7 @@ The general throughput performance looks similar. The resource consumption looks

#### Latency

-This experiment targets to show difference in the data availability for the user.
+This experiment aims to show the difference in data availability for the user.

In order to better visualize this, the dashboard has been adjusted for this experiment.

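Relatedly, the latency comparison relies on the approximation mentioned in the Expected section above: since there is no direct user-visible metric yet, several average latencies are summed and the Elasticsearch flush is assumed to be a constant 2 seconds. A minimal sketch of that kind of estimate with hypothetical component values (which latencies are summed is an assumption here, not taken from the dashboards):

```java
public class UserVisibleLatency {
    // Constant flush interval assumed for Elasticsearch, as stated in the post.
    private static final double ES_FLUSH_SECONDS = 2.0;

    // Approximate the time until data is searchable by the user by summing
    // average component latencies plus the assumed flush interval.
    static double approximate(double avgProcessingSeconds, double avgExportSeconds) {
        return avgProcessingSeconds + avgExportSeconds + ES_FLUSH_SECONDS;
    }

    public static void main(String[] args) {
        // Purely illustrative averages.
        System.out.printf("approx. user-visible latency: %.1f s%n", approximate(0.2, 0.5));
    }
}
```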

@@ -92,22 +92,22 @@ We were able to show that the latency has been reduced under normal load.

#### Found Bugs

-Within the experiment we run into several other issues. Especially after running for a while, when pods got restarted and importer have been enabled, the Camunda Exporter broke.
+Within the experiment, we ran into several other issues. Especially after running for a while, when pods got restarted and the importers were enabled, the Camunda Exporter broke.


![exporting-fail](exporting-fail.png)

-This caused to increase the latency.
+This caused an increase in latency.

![exporting-fail-latency](exporting-fail-latency.png)

-The exporter was not able to detect correctly anymore that the importing was done, but was still flushing periodically (which is as well wrong)
+The exporter was no longer able to correctly detect that the importing was done, but was still flushing periodically (which is also wrong).

-See related github issue(s)
+See related GitHub issue(s):

* [Importer(s) are not communicating import done correctly](https://github.com/camunda/camunda/issues/26046)
* [Exporter flushes periodically even when importer not completed](https://github.com/camunda/camunda/issues/26047)

-Furthermore, based on logs we saw that the treePath hasn't be published correctly in the Exporter.
+Furthermore, based on the logs, we saw that the treePath hadn't been published correctly in the Exporter.

-* [Camunda Exporter is not able to consume treePath](https://github.com/camunda/camunda/issues/26048)
+* [Camunda Exporter is not able to consume treePath](https://github.com/camunda/camunda/issues/26048)

0 commit comments
