ES-10037 Configurable metrics in data stream auto-sharding #125612

Conversation

PeteGillinElastic
Member

This adds cluster settings to allow for a choice of write load metrics in the data stream auto-sharding calculations. There are separate settings for the increasing and decreasing calculations. Both default to the existing 'all-time' metric for now.
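For readers skimming the thread, here is a minimal sketch of what choosing a write load metric from a setting value could look like. It is not the code in this PR: the class `MetricSelectionSketch`, the enum `Metric`, the record `WriteLoads` and the setting value strings (`all_time`, `recent`, `peak`) are all hypothetical names picked for illustration, and the real setting names are not shown in this conversation.

```java
import java.util.Locale;

class MetricSelectionSketch {
    // Hypothetical holder for the three summed write load values of a write index.
    record WriteLoads(double allTime, double recent, double peak) {}

    // Hypothetical enum of selectable metrics; the value names are assumptions.
    enum Metric {
        ALL_TIME, RECENT, PEAK;

        // Parse a setting value such as "all_time", ignoring case.
        static Metric parse(String value) {
            return Metric.valueOf(value.toUpperCase(Locale.ROOT));
        }

        // Return the load value that this metric refers to.
        double select(WriteLoads loads) {
            return switch (this) {
                case ALL_TIME -> loads.allTime();
                case RECENT -> loads.recent();
                case PEAK -> loads.peak();
            };
        }
    }

    public static void main(String[] args) {
        WriteLoads loads = new WriteLoads(1.5, 3.0, 4.5);
        // Separate choices for the increase and decrease calculations,
        // both defaulting to the all-time metric as described above.
        Metric increasing = Metric.parse("all_time");
        Metric decreasing = Metric.parse("all_time");
        System.out.println(increasing.select(loads) + " " + decreasing.select(loads));
    }
}
```

Keeping the two choices as independent values means the increase calculation could later react to recent or peak load while the decrease calculation stays on a more conservative metric.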

This also refactors `DataStreamAutoShardingServiceTests`. The two main changes are:

 - Split large test methods which do several independent tests in
   blank code blocks into more, smaller methods.

 - Fix an unnecessarily complicated pattern where the code would
   create a `Function` in a local variable and then immediately
   `apply` it exactly once... rather than just executing the code
   normally.
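
The second bullet can be illustrated with a small before/after sketch. The class and method names here (`ApplyOnceSketch`, `before`, `after`) and the doubling logic are hypothetical and are not taken from the actual test class; the point is only the shape of the pattern.

```java
import java.util.function.Function;

class ApplyOnceSketch {
    // Before: the logic is wrapped in a Function held in a local variable
    // and then applied exactly once.
    static int before(int shards) {
        Function<Integer, Integer> doubled = n -> n * 2;
        return doubled.apply(shards);
    }

    // After: the same logic, executed directly.
    static int after(int shards) {
        return shards * 2;
    }

    public static void main(String[] args) {
        // Both variants compute the same value.
        System.out.println(before(3) == after(3));
    }
}
```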
PeteGillinElastic force-pushed the ES-10037-allow-recent-write-load-in-autosharding branch from 2d605e1 to fe87746 March 27, 2025 17:48
PeteGillinElastic marked this pull request as ready for review March 27, 2025 20:15
PeteGillinElastic requested a review from a team as a code owner March 27, 2025 20:15
elasticsearchmachine added the needs:triage label (Requires assignment of a team area label) Mar 27, 2025
PeteGillinElastic added the >non-issue and :Data Management/Stats (Statistics tracking and retrieval APIs) labels and removed the needs:triage label Mar 27, 2025
elasticsearchmachine added the Team:Data Management label (Meta label for data/management team) Mar 27, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

PeteGillinElastic added the :Data Management/Data streams label (Data streams and their lifecycles) and removed the :Data Management/Stats label Mar 28, 2025
gmarouli self-requested a review March 28, 2025 10:52
Comment on lines 285 to 291
rolloverAutoSharding = dataStreamAutoShardingService.calculate(
    projectState,
    dataStream,
    indexStats.map(stats -> sumLoadMetrics(stats, IndexingStats.Stats::getWriteLoad)).orElse(null),
    indexStats.map(stats -> sumLoadMetrics(stats, IndexingStats.Stats::getRecentWriteLoad)).orElse(null),
    indexStats.map(stats -> sumLoadMetrics(stats, IndexingStats.Stats::getPeakWriteLoad)).orElse(null)
);
Contributor

Thinking out loud: what if we moved the write load calculations into `dataStreamAutoShardingService.calculate(...)` and just passed the `indexStats`?

I think it fits the responsibility of `DataStreamAutoShardingService.java` better, and it could allow us to make further improvements if we decide that some write loads are not relevant.

What do you think?

Member Author

Yeah, that's a good suggestion. I've pushed a commit to do this, see what you think.

I agree that it's a better separation of responsibilities. It makes the tests a bit more complicated, because of everything we have to construct so that those three values can be extracted and summed. However, it also increases test coverage, I think, because we didn't previously test the extraction and summation logic AFAICS (the tests for the rollover action never asserted that it was making the correct call to the auto-sharding service) and now we do. So the additional complication is in a good cause!
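
As a rough illustration of the shape discussed in this thread, the sketch below has the service-side code receive the optional stats and do the extraction and summation itself. Everything in it is an assumption for illustration: `AutoShardingInputSketch`, `ShardLoad`, `SummedLoads` and `extractLoads` are hypothetical stand-ins, and this `sumLoadMetrics` only mimics the idea of the real helper, whose signature is not shown here.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.ToDoubleFunction;

class AutoShardingInputSketch {
    // Hypothetical stand-in for the per-shard indexing stats of the write index.
    record ShardLoad(double writeLoad, double recentWriteLoad, double peakWriteLoad) {}

    // Hypothetical holder for the three summed values.
    record SummedLoads(double allTime, double recent, double peak) {}

    // Sum one metric across all shards.
    static double sumLoadMetrics(List<ShardLoad> shards, ToDoubleFunction<ShardLoad> metric) {
        return shards.stream().mapToDouble(metric).sum();
    }

    // The caller passes the raw (possibly absent) stats; extraction and
    // summation happen here rather than at the call site.
    static Optional<SummedLoads> extractLoads(Optional<List<ShardLoad>> writeIndexStats) {
        return writeIndexStats.map(
            shards -> new SummedLoads(
                sumLoadMetrics(shards, ShardLoad::writeLoad),
                sumLoadMetrics(shards, ShardLoad::recentWriteLoad),
                sumLoadMetrics(shards, ShardLoad::peakWriteLoad)
            )
        );
    }

    public static void main(String[] args) {
        List<ShardLoad> shards = List.of(new ShardLoad(0.5, 1.0, 2.0), new ShardLoad(1.5, 2.0, 4.0));
        System.out.println(extractLoads(Optional.of(shards)));
        System.out.println(extractLoads(Optional.empty()));
    }
}
```

With this shape, a test of the service has to build the stats objects, which is the extra construction cost mentioned above, but it also exercises the summation directly.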

Contributor

@gmarouli gmarouli left a comment

LGTM, added some nits but it looks great! Thank you @PeteGillinElastic

Comment on lines +236 to +238
double writeIndexLoad = sumLoadMetrics(writeIndexStats, IndexingStats.Stats::getWriteLoad);
double writeIndexRecentLoad = sumLoadMetrics(writeIndexStats, IndexingStats.Stats::getRecentWriteLoad);
double writeIndexPeakLoad = sumLoadMetrics(writeIndexStats, IndexingStats.Stats::getPeakWriteLoad);
Contributor

Nit: This is nice and readable, but if performance becomes an issue we could consider calculating them in one loop. I do not think this is a critical path (executed all the time etc), so this might be ok.

Member Author

Yeah, I think this runs once every five minutes for each data stream (or when there's a manual API call), so I'm not inclined to complicate the code to save a fraction of a microsecond, unless we discover that it's a problem.
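
For reference, the single-loop alternative mentioned in the nit might look roughly like the sketch below. It is not proposed for this PR, and the names `SinglePassSumSketch`, `ShardLoad` and `sumAll` are hypothetical stand-ins for the real stats types.

```java
import java.util.List;

class SinglePassSumSketch {
    // Hypothetical stand-in for the per-shard indexing stats.
    record ShardLoad(double writeLoad, double recentWriteLoad, double peakWriteLoad) {}

    // Returns { allTime, recent, peak } sums computed in one pass over the shards.
    static double[] sumAll(List<ShardLoad> shards) {
        double allTime = 0.0;
        double recent = 0.0;
        double peak = 0.0;
        for (ShardLoad shard : shards) {
            allTime += shard.writeLoad();
            recent += shard.recentWriteLoad();
            peak += shard.peakWriteLoad();
        }
        return new double[] { allTime, recent, peak };
    }

    public static void main(String[] args) {
        double[] sums = sumAll(List.of(new ShardLoad(0.5, 1.0, 2.0), new ShardLoad(1.5, 2.0, 4.0)));
        System.out.println(sums[0] + " " + sums[1] + " " + sums[2]);
    }
}
```

The trade-off is the one described above: one traversal instead of three, at the cost of a less declarative method that returns a positional array.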

PeteGillinElastic and others added 2 commits March 28, 2025 13:51
…sharding/DataStreamAutoShardingService.java

Co-authored-by: Mary Gouseti <[email protected]>
@PeteGillinElastic
Member Author

Thanks Mary!

PeteGillinElastic merged commit f91f132 into elastic:main Mar 28, 2025
17 checks passed
omricohenn pushed a commit to omricohenn/elasticsearch that referenced this pull request Mar 28, 2025
…25612)

This adds cluster settings to allow for a choice of write load metrics
in the data stream auto-sharding calculations. There are separate
settings for the increasing and decreasing calculations. Both default
to the existing 'all-time' metric for now.

This also refactors `DataStreamAutoShardingServiceTests`. The main two things done are:

 - Split large test methods which do several independent tests in
   blank code blocks into more, smaller methods.

 - Fix an unnecessarily complicated pattern where the code would
   create a `Function` in a local variable and then immediately
   `apply` it exactly once... rather than just executing the code
   normally.
Labels
:Data Management/Data streams, >non-issue, Team:Data Management, v9.1.0