schedule: impl balance range scheduler #9005
base: master
Conversation
Signed-off-by: 童剑 <[email protected]>
Skipping CI for Draft Pull Request.
[APPROVALNOTIFIER] This PR is NOT APPROVED. It needs approval from an approver in each of the affected files; the full list of commands accepted by this bot can be found here.
Force-pushed from b5cb26c to 50d905f.
Force-pushed from 0125dd5 to 120a2bf.
/retest
Codecov Report: additional details and impacted files below.
@@ Coverage Diff @@
## master #9005 +/- ##
==========================================
- Coverage 76.28% 76.25% -0.03%
==========================================
Files 468 468
Lines 71422 71844 +422
==========================================
+ Hits 54484 54785 +301
- Misses 13538 13630 +92
- Partials 3400 3429 +29
Flags with carried forward coverage won't be shown.
/test pull-integration-realcluster-test
@bufferflies: The specified target(s) for /test were not found. The following commands are available to trigger optional jobs:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Force-pushed from 120a2bf to 2854acd.
func (s *balanceRangeScheduler) prepare(cluster sche.SchedulerCluster, opInfluence operator.OpInfluence, job *balanceRangeSchedulerJob) (*balanceRangeSchedulerPlan, error) {
	krs := core.NewKeyRanges(job.Ranges)
	scanRegions, err := cluster.BatchScanRegions(krs)
Do we need to separate krs into multiple batches if it is a big range?
We don't need to split the ranges into multiple batches, because the total time the lock is held would be the same either way.
If we acquire the lock multiple times and hold it only briefly each time, other callers can still acquire it in between. But if we hold the lock for a long time in one go, it may starve the other callers.
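To illustrate the trade-off above, here is a minimal, self-contained Go sketch. It is not the PD implementation; the regionIndex type, KeyRange, and scanInBatches are hypothetical names. The idea is to take the shared read lock once per batch instead of once for the whole range, so writers such as heartbeat updates can acquire the lock between batches:

package main

import (
	"fmt"
	"sync"
)

// KeyRange and Region are simplified stand-ins for the core types used by
// BatchScanRegions; they exist only to make this sketch self-contained.
type KeyRange struct{ Start, End string }
type Region struct{ ID uint64 }

type regionIndex struct {
	mu      sync.RWMutex
	regions map[string][]Region // keyed by range start key, purely illustrative
}

// scanInBatches takes the read lock once per batch rather than once for the
// whole scan, so a writer (e.g. a heartbeat update) can acquire the lock
// between batches instead of waiting for the entire range to be scanned.
func (idx *regionIndex) scanInBatches(ranges []KeyRange, batchSize int) []Region {
	var out []Region
	for i := 0; i < len(ranges); i += batchSize {
		end := i + batchSize
		if end > len(ranges) {
			end = len(ranges)
		}
		idx.mu.RLock()
		for _, kr := range ranges[i:end] {
			out = append(out, idx.regions[kr.Start]...)
		}
		idx.mu.RUnlock() // release between batches to avoid starving writers
	}
	return out
}

func main() {
	idx := &regionIndex{regions: map[string][]Region{
		"a": {{ID: 1}},
		"b": {{ID: 2}},
	}}
	got := idx.scanInBatches([]KeyRange{{Start: "a", End: "b"}, {Start: "b", End: "c"}}, 1)
	fmt.Println(len(got)) // 2
}

Whether batching is worth the extra bookkeeping depends on how long one full scan actually holds the lock compared with the latency budget of the writers.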
opInfluence := s.OpController.GetOpInfluence(cluster.GetBasicCluster(), operator.WithRangeOption(job.Ranges))
plan, err := s.prepare(cluster, opInfluence, job)
Does every scheduling round rescan the range from the beginning?
Yes, because the region and store information within the given ranges may have been updated.
Will it affect other requests, such as get-region or heartbeat, if the range is large?
Yes, scanning regions acquires the read lock that is shared with the region heartbeat path. We could reuse the previous distribution for an interval instead of rescanning on every schedule attempt.
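To make the "reuse the previous distribution for an interval" idea concrete, here is a small hypothetical Go sketch; the scanCache type and its fields are assumptions, not PD code. It caches the result of an expensive scan and rescans only after a TTL has elapsed:

package main

import (
	"sync"
	"time"
)

// scanCache reuses a previous scan of the ranges for a fixed interval
// instead of rescanning (and taking the shared read lock) on every
// schedule attempt.
type scanCache struct {
	mu       sync.Mutex
	ttl      time.Duration
	lastScan time.Time
	regions  []uint64 // cached region IDs from the last scan
}

func (c *scanCache) get(scan func() []uint64) []uint64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	if time.Since(c.lastScan) < c.ttl && c.regions != nil {
		return c.regions // fresh enough: skip the expensive scan
	}
	c.regions = scan() // rescan only when the cached result has expired
	c.lastScan = time.Now()
	return c.regions
}

func main() {
	cache := &scanCache{ttl: 30 * time.Second}
	scan := func() []uint64 { return []uint64{1, 2, 3} } // stands in for BatchScanRegions
	_ = cache.get(scan) // first call scans
	_ = cache.get(scan) // second call within the TTL reuses the cached result
}

The trade-off is staleness: a longer interval means fewer scans under the shared lock, but a higher chance of building operators from outdated region information.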
conf.RLock()
defer conf.RUnlock()
for _, job := range conf.jobs {
	if job.Status == finished {
Where do we change the status to finished?
good catch
If all ranges are balanced, do we still need to wait for the timeout?
Yes, in this PR. I will add additional conditions to check whether the distribution is balanced in the next PR.
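A rough sketch of the finishing logic discussed in this thread, with hypothetical types rather than the real balanceRangeSchedulerJob: a job is marked finished either when its timeout elapses (this PR) or when a caller-supplied balance check passes (the follow-up condition):

package main

import (
	"fmt"
	"time"
)

// jobStatus and balanceJob are stand-ins for the scheduler's job bookkeeping.
type jobStatus int

const (
	running jobStatus = iota
	finished
)

type balanceJob struct {
	Status  jobStatus
	Start   time.Time
	Timeout time.Duration
}

// maybeFinish marks the job finished when either the timeout has elapsed or
// the supplied balance check passes.
func (j *balanceJob) maybeFinish(isBalanced func() bool) {
	if j.Status == finished {
		return
	}
	if time.Since(j.Start) > j.Timeout || isBalanced() {
		j.Status = finished
	}
}

func main() {
	job := &balanceJob{Status: running, Start: time.Now(), Timeout: time.Hour}
	job.maybeFinish(func() bool { return true }) // already balanced: finish early
	fmt.Println(job.Status == finished)          // true
}

Injecting the balance check as a function lets the timeout-only behavior of this PR and the richer condition planned for the follow-up share the same path.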
func (s *balanceRangeScheduler) transferPeer(plan *balanceRangeSchedulerPlan, dstStores []*core.StoreInfo) *operator.Operator {
	excludeTargets := plan.region.GetStoreIDs()
	if plan.job.Role == leader {
		excludeTargets = make(map[uint64]struct{})
Do we need to exclude the current leader's store ID?
good idea
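A minimal sketch of the suggestion, using hypothetical types and an excludeTargetsFor helper rather than the actual transferPeer code: when balancing leaders, exclude only the current leader's store, since the leader can only move to a store that already holds a peer; when balancing peers, exclude every store that already has a replica:

package main

import "fmt"

// Minimal stand-ins for the region and peer types used in the snippet above;
// the real code works with core.RegionInfo and its peers.
type peer struct{ StoreID uint64 }
type region struct {
	leader peer
	peers  []peer
}

// excludeTargetsFor builds the exclude set: only the leader's store for
// leader balancing, every replica's store for peer balancing.
func excludeTargetsFor(r region, balanceLeader bool) map[uint64]struct{} {
	exclude := make(map[uint64]struct{})
	if balanceLeader {
		exclude[r.leader.StoreID] = struct{}{}
		return exclude
	}
	for _, p := range r.peers {
		exclude[p.StoreID] = struct{}{}
	}
	return exclude
}

func main() {
	r := region{leader: peer{StoreID: 1}, peers: []peer{{1}, {2}, {3}}}
	fmt.Println(excludeTargetsFor(r, true))  // map[1:{}]
	fmt.Println(excludeTargetsFor(r, false)) // map[1:{} 2:{} 3:{}]
}

The leader branch keeps the exclude set non-empty, so the candidate picker can never choose the store the leader is already on.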
Force-pushed from 1628d72 to 40f0f56.
Force-pushed from 40f0f56 to 246aaa6.
@bufferflies: The following test failed; say /retest to rerun it.
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
What problem does this PR solve?
Issue Number: Close #9006
What is changed and how does it work?
Check List
Tests
Code changes
Side effects
Related changes
Release note