ES-10037 Configurable metrics in data stream auto-sharding #125612

PeteGillinElastic · 2025-03-25T19:16:25Z

This adds cluster settings to allow for a choice of write load metrics in the data stream auto-sharding calculations. There are separate settings for the increasing and decreasing calculations. Both default to the existing 'all-time' metric for now.

The main two things done in this commit are:
- Split large test methods, which ran several independent tests in bare code blocks, into several smaller methods.
- Fix an unnecessarily complicated pattern where the code would create a `Function` in a local variable and then immediately `apply` it exactly once, rather than just executing the code normally.
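To make the settings idea concrete, here is a minimal sketch of choosing a write load metric separately for the increase and decrease calculations. Every name in it (`WriteLoadMetric`, `LoadSums`, `MetricChoice`, and the `"all_time"`-style value format) is a hypothetical stand-in and not taken from the PR's code; only the all-time default for both directions comes from the description.

```java
import java.util.Locale;

// All names here are hypothetical; they are not taken from the PR's code.
enum WriteLoadMetric {
    ALL_TIME,
    RECENT,
    PEAK;

    // Parses a setting value such as "all_time"; the value format is an assumption.
    static WriteLoadMetric parse(String value) {
        return valueOf(value.toUpperCase(Locale.ROOT));
    }
}

// The three summed write loads for a data stream's write index.
record LoadSums(double allTime, double recent, double peak) {
    double get(WriteLoadMetric metric) {
        return switch (metric) {
            case ALL_TIME -> allTime;
            case RECENT -> recent;
            case PEAK -> peak;
        };
    }
}

// One metric choice per direction, so that increases and decreases can differ.
record MetricChoice(WriteLoadMetric increaseMetric, WriteLoadMetric decreaseMetric) {
    // Both directions default to the all-time metric, as the description states.
    static MetricChoice defaults() {
        return new MetricChoice(WriteLoadMetric.ALL_TIME, WriteLoadMetric.ALL_TIME);
    }

    double loadForIncrease(LoadSums sums) {
        return sums.get(increaseMetric);
    }

    double loadForDecrease(LoadSums sums) {
        return sums.get(decreaseMetric);
    }
}
```

With `MetricChoice.defaults()`, both calculations read the all-time sum, so behaviour is unchanged until a different metric is selected.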

elasticsearchmachine · 2025-03-27T21:14:00Z

Pinging @elastic/es-data-management (Team:Data Management)

gmarouli · 2025-03-28T11:11:37Z

...r/src/main/java/org/elasticsearch/action/admin/indices/rollover/TransportRolloverAction.java

+                    rolloverAutoSharding = dataStreamAutoShardingService.calculate(
+                        projectState,
+                        dataStream,
+                        indexStats.map(stats -> sumLoadMetrics(stats, IndexingStats.Stats::getWriteLoad)).orElse(null),
+                        indexStats.map(stats -> sumLoadMetrics(stats, IndexingStats.Stats::getRecentWriteLoad)).orElse(null),
+                        indexStats.map(stats -> sumLoadMetrics(stats, IndexingStats.Stats::getPeakWriteLoad)).orElse(null)
+                    );


Thinking out loud: what if we moved the write load calculations into `dataStreamAutoShardingService.calculate(...)` and just passed the `indexStats`?

I think it fits the responsibility of the DataStreamAutoShardingService.java better and it can potentially allow us to do further improvements, if we deem that some write loads are not relevant.

What do you think?

Yeah, that's a good suggestion. I've pushed a commit to do this, see what you think.

I agree that it's better separation of responsibilities. It makes the tests a bit more complicated, because of all the stuff we have to construct to extract and sum those three values from. However it also increases test coverage, I think, because we didn't previously test the extraction and summation logic AFAICS (the tests for the rollover action never asserted that it was making the correct call to the auto-sharding service) and now we do. So the additional complication is in a good cause!
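The shape of that change can be sketched as follows. This is a simplified illustration under stated assumptions, not the code in the PR: `ShardLoad` stands in for the real per-shard `IndexingStats.Stats`, and `WriteIndexLoads`, `AutoShardingSketch`, and `extractLoads` are hypothetical names. Only the idea of a `sumLoadMetrics` helper that takes a metric extractor comes from the diff above.

```java
import java.util.List;
import java.util.function.ToDoubleFunction;

// Stand-in for per-shard indexing stats; the real IndexingStats.Stats is not reproduced here.
record ShardLoad(double writeLoad, double recentWriteLoad, double peakWriteLoad) {}

// Hypothetical holder for the three totals.
record WriteIndexLoads(double allTime, double recent, double peak) {}

final class AutoShardingSketch {
    // Sums one metric, selected by the extractor, across all shards.
    static double sumLoadMetrics(List<ShardLoad> shards, ToDoubleFunction<ShardLoad> metric) {
        return shards.stream().mapToDouble(metric).sum();
    }

    // The service takes the raw stats and derives the totals itself,
    // so a caller such as a rollover action no longer does the summing.
    static WriteIndexLoads extractLoads(List<ShardLoad> writeIndexStats) {
        return new WriteIndexLoads(
            sumLoadMetrics(writeIndexStats, ShardLoad::writeLoad),
            sumLoadMetrics(writeIndexStats, ShardLoad::recentWriteLoad),
            sumLoadMetrics(writeIndexStats, ShardLoad::peakWriteLoad)
        );
    }
}
```

Because the extraction now lives behind one entry point, a test that feeds in raw stats also exercises the summation, which is the coverage gain mentioned above.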

gmarouli

LGTM, added some nits but it looks great! Thank you @PeteGillinElastic

gmarouli · 2025-03-28T13:42:00Z

...in/java/org/elasticsearch/action/datastreams/autosharding/DataStreamAutoShardingService.java

+        double writeIndexLoad = sumLoadMetrics(writeIndexStats, IndexingStats.Stats::getWriteLoad);
+        double writeIndexRecentLoad = sumLoadMetrics(writeIndexStats, IndexingStats.Stats::getRecentWriteLoad);
+        double writeIndexPeakLoad = sumLoadMetrics(writeIndexStats, IndexingStats.Stats::getPeakWriteLoad);


Nit: This is nice and readable, but if performance becomes an issue we could consider calculating them in one loop. I do not think this is a critical path (executed all the time etc), so this might be ok.

Yeah, I think this runs once every five minutes for each data stream (or when there's a manual API call), so I'm not inclined to complicate the code to save a fraction of a microsecond, unless we discover that it's a problem.
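For reference, the single-loop alternative raised in the nit could look roughly like the sketch below. It was not adopted in this PR; `ShardWriteLoads` and `SinglePassSums` are hypothetical names, and the record is a stand-in for the real per-shard stats type.

```java
import java.util.List;

// Stand-in for per-shard stats; not the real IndexingStats.Stats type.
record ShardWriteLoads(double allTime, double recent, double peak) {}

final class SinglePassSums {
    // Returns {allTime, recent, peak} totals computed in one pass over the shards.
    static double[] sumAll(List<ShardWriteLoads> shards) {
        double allTime = 0.0;
        double recent = 0.0;
        double peak = 0.0;
        for (ShardWriteLoads shard : shards) {
            allTime += shard.allTime();
            recent += shard.recent();
            peak += shard.peak();
        }
        return new double[] { allTime, recent, peak };
    }
}
```

The trade-off is one traversal instead of three, at the cost of a less declarative method and a positional return value.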

PeteGillinElastic · 2025-03-28T13:53:44Z

Thanks Mary!

…25612) This adds cluster settings to allow for a choice of write load metrics in the data stream auto-sharding calculations. There are separate settings for the increasing and decreasing calculations. Both default to the existing 'all-time' metric for now.

This also refactors `DataStreamAutoShardingServiceTests`. The main two things done are:
- Split large test methods, which ran several independent tests in bare code blocks, into several smaller methods.
- Fix an unnecessarily complicated pattern where the code would create a `Function` in a local variable and then immediately `apply` it exactly once, rather than just executing the code normally.
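The `Function` pattern mentioned in the second bullet can be pictured with a toy example. The actual test code is not shown in this thread, so the class `FunctionPatternSketch` and the doubling logic are invented purely for illustration.

```java
import java.util.function.Function;

// Toy illustration only; the refactored test code itself is not reproduced here.
final class FunctionPatternSketch {
    // Before: a Function is stored in a local variable and applied exactly once.
    static int before(int shards) {
        Function<Integer, Integer> doubled = n -> n * 2;
        return doubled.apply(shards);
    }

    // After: the same logic executed directly, with no intermediate Function.
    static int after(int shards) {
        return shards * 2;
    }
}
```

Both methods return the same value; the second simply drops the indirection.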

elasticsearchmachine added the v9.1.0 label Mar 25, 2025

PeteGillinElastic added 2 commits March 27, 2025 17:43

PeteGillinElastic force-pushed the ES-10037-allow-recent-write-load-in-autosharding branch from 2d605e1 to fe87746 Compare March 27, 2025 17:48

[CI] Auto commit changes from spotless

7dc015d

PeteGillinElastic marked this pull request as ready for review March 27, 2025 20:15

PeteGillinElastic requested a review from a team as a code owner March 27, 2025 20:15

elasticsearchmachine added the needs:triage label Mar 27, 2025

PeteGillinElastic added the >non-issue and :Data Management/Stats labels and removed the needs:triage label Mar 27, 2025

elasticsearchmachine added the Team:Data Management label Mar 27, 2025

PeteGillinElastic added the :Data Management/Data streams label and removed the :Data Management/Stats label Mar 28, 2025

gmarouli self-requested a review March 28, 2025 10:52

gmarouli reviewed Mar 28, 2025

Respond to review comment: Make DSASS.calculate() take the stats obje…

422f44f

…ct instead of the three load values

PeteGillinElastic requested a review from gmarouli March 28, 2025 13:30

gmarouli approved these changes Mar 28, 2025

PeteGillinElastic and others added 2 commits March 28, 2025 13:51

Update server/src/main/java/org/elasticsearch/action/datastreams/auto…

f019344

…sharding/DataStreamAutoShardingService.java Co-authored-by: Mary Gouseti <[email protected]>

Merge remote-tracking branch 'upstream/main' into ES-10037-allow-rece…

b22ec78

…nt-write-load-in-autosharding

PeteGillinElastic merged commit f91f132 into elastic:main Mar 28, 2025
17 checks passed
