-
Notifications
You must be signed in to change notification settings - Fork 25.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ES-10037 Configurable metrics in data stream auto-sharding #125612
ES-10037 Configurable metrics in data stream auto-sharding #125612
Conversation
The main two things done in this commit are: - Split large test methods which do several independent tests in blank code blocks into more smaller methods. - Fix an unnecessarily complicated pattern where the code would create a `Function` in a local variable and then immediately `apply` it exactly once... rather than just executing the code normally.
This adds cluster settings to allow for a choice of write load metrics in the data stream auto-sharding calculations. There are separate settings for the increasing and decreasing calculations. Both default to the existing 'all-time' metric for now.
2d605e1
to
fe87746
Compare
Pinging @elastic/es-data-management (Team:Data Management) |
rolloverAutoSharding = dataStreamAutoShardingService.calculate( | ||
projectState, | ||
dataStream, | ||
indexStats.map(stats -> sumLoadMetrics(stats, IndexingStats.Stats::getWriteLoad)).orElse(null), | ||
indexStats.map(stats -> sumLoadMetrics(stats, IndexingStats.Stats::getRecentWriteLoad)).orElse(null), | ||
indexStats.map(stats -> sumLoadMetrics(stats, IndexingStats.Stats::getPeakWriteLoad)).orElse(null) | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking out loud: what if we moved the write load calculations in the dataStreamAutoShardingService.calculate(...)
and just pass the indexStats
?
I think it fits the responsibility of the DataStreamAutoShardingService.java
better and it can potentially allow us to do further improvements, if we deem that some write loads are not relevant.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's a good suggestion. I've pushed a commit to do this, see what you think.
I agree that it's better separation of responsibilities. It makes the tests a bit more complicated, because of all the stuff we have to construct to extract and sum those three values from. However it also increases test coverage, I think, because we didn't previously test the extraction and summation logic AFAICS (the tests for the rollover action never asserted that it was making the correct call to the auto-sharding service) and now we do. So the additional complication is in a good cause!
…ct instead of the three load values
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, added some nits but it looks great! Thank you @PeteGillinElastic
...in/java/org/elasticsearch/action/datastreams/autosharding/DataStreamAutoShardingService.java
Outdated
Show resolved
Hide resolved
double writeIndexLoad = sumLoadMetrics(writeIndexStats, IndexingStats.Stats::getWriteLoad); | ||
double writeIndexRecentLoad = sumLoadMetrics(writeIndexStats, IndexingStats.Stats::getRecentWriteLoad); | ||
double writeIndexPeakLoad = sumLoadMetrics(writeIndexStats, IndexingStats.Stats::getPeakWriteLoad); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: This is nice and readable, but if performance becomes an issue we could consider calculating them in one loop. I do not think this is a critical path (executed all the time etc), so this might be ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think this runs once every five minutes for each data stream (or when there's a manual API call), so I'm not inclined to complicate the code to save a fraction of a microsecond, unless we discover that it's a problem.
…sharding/DataStreamAutoShardingService.java Co-authored-by: Mary Gouseti <[email protected]>
…nt-write-load-in-autosharding
Thanks Mary! |
…25612) This adds cluster settings to allow for a choice of write load metrics in the data stream auto-sharding calculations. There are separate settings for the increasing and decreasing calculations. Both default to the existing 'all-time' metric for now. This also refactors `DataStreamAutoShardingServiceTests`. The main two things done are: - Split large test methods which do several independent tests in blank code blocks into more smaller methods. - Fix an unnecessarily complicated pattern where the code would create a `Function` in a local variable and then immediately `apply` it exactly once... rather than just executing the code normally.
This adds cluster settings to allow for a choice of write load metrics in the data stream auto-sharding calculations. There are separate settings for the increasing and decreasing calculations. Both default to the existing 'all-time' metric for now.