schedule: impl balance range scheduler #9005

bufferflies · 2025-01-17T01:47:03Z

What problem does this PR solve?

Issue Number: Close #9006

What is changed and how does it work?

Check List

Tests

Unit test
Integration test
Manual test (add detailed scripts or steps below)

Code changes

Side effects

Related changes

Release note

None.

Signed-off-by: 童剑 <[email protected]>

ti-chi-bot · 2025-01-17T01:47:05Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

ti-chi-bot · 2025-01-17T01:47:08Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from bufferflies, ensuring that each of them provides their approval before proceeding. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

bufferflies · 2025-02-13T14:13:57Z

/retest

codecov · 2025-02-13T14:36:46Z

Codecov Report

Attention: Patch coverage is 73.71795% with 82 lines in your changes missing coverage. Please review.

Project coverage is 76.25%. Comparing base (4eb7235) to head (246aaa6).
Report is 7 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #9005      +/-   ##
==========================================
- Coverage   76.28%   76.25%   -0.03%     
==========================================
  Files         468      468              
  Lines       71422    71844     +422     
==========================================
+ Hits        54484    54785     +301     
- Misses      13538    13630      +92     
- Partials     3400     3429      +29

Flag	Coverage Δ
unittests	`76.25% <73.71%> (-0.03%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

bufferflies · 2025-02-17T08:12:31Z

/test pull-integration-realcluster-test

ti-chi-bot · 2025-02-17T08:12:33Z

@bufferflies: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

/test build

/test pull-integration-realcluster-test

The following commands are available to trigger optional jobs:

/debug pull-unit-test

/test pull-integration-copr-test

Use /test all to run the following jobs that were automatically triggered:

tikv/pd/ghpr_build

tikv/pd/pull_integration_realcluster_test

In response to this:

/test pull-integration-realcluster-test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

rleungx · 2025-02-17T10:04:21Z

pkg/schedule/schedulers/balance_range.go

+
+func (s *balanceRangeScheduler) prepare(cluster sche.SchedulerCluster, opInfluence operator.OpInfluence, job *balanceRangeSchedulerJob) (*balanceRangeSchedulerPlan, error) {
+	krs := core.NewKeyRanges(job.Ranges)
+	scanRegions, err := cluster.BatchScanRegions(krs)


Do we need to separate krs into multiple batches if it is a big range?

We don't need to split the ranges into multiple batches, because the total time spent holding the lock is the same either way.

If we acquire the lock several times and hold it only briefly each time, other callers can get the lock in between. If we hold it once for a long time, other callers may starve.
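To make the trade-off in this thread concrete, here is a minimal sketch of a batched scan that takes the read lock once per batch and releases it between batches. `regionTree` and `scanBatched` are hypothetical names invented for illustration; this is not PD's region tree or the `BatchScanRegions` implementation, and resuming by slice index is a simplification.

```go
package main

import (
	"fmt"
	"sync"
)

// regionTree is a hypothetical stand-in for the region storage; it is not PD's real type.
type regionTree struct {
	mu      sync.RWMutex
	regions []string // region start keys, assumed sorted
}

// scanBatched copies out all regions, taking the read lock once per batch
// of at most batchSize regions and releasing it between batches so that
// waiting writers get a chance to run. It returns the scanned regions and
// the number of lock acquisitions.
func (t *regionTree) scanBatched(batchSize int) ([]string, int) {
	if batchSize <= 0 {
		batchSize = 1
	}
	var result []string
	acquisitions := 0
	next := 0
	for {
		t.mu.RLock()
		acquisitions++
		// Simplification: resume by index. A real tree would resume from
		// the last scanned key, since regions can change between batches.
		if next > len(t.regions) {
			next = len(t.regions)
		}
		end := next + batchSize
		if end > len(t.regions) {
			end = len(t.regions)
		}
		result = append(result, t.regions[next:end]...)
		next = end
		done := next >= len(t.regions)
		t.mu.RUnlock()
		if done {
			return result, acquisitions
		}
	}
}

func main() {
	tree := &regionTree{regions: []string{"a", "b", "c", "d", "e"}}
	regions, acquisitions := tree.scanBatched(2)
	fmt.Println(len(regions), acquisitions) // prints "5 3"
}
```

The total work under the lock is unchanged, which matches the author's point; what changes is the longest single hold, which is the reviewer's concern about starvation.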

rleungx · 2025-02-17T10:10:52Z

pkg/schedule/schedulers/balance_range.go

+	}
+
+	opInfluence := s.OpController.GetOpInfluence(cluster.GetBasicCluster(), operator.WithRangeOption(job.Ranges))
+	plan, err := s.prepare(cluster, opInfluence, job)


Does every scheduling round scan the range from the beginning?

Yes, because the region and store information in the given ranges may have been updated.

Will it affect the other requests like get region or heartbeat if the range is large?

Yes, scanning regions acquires the read lock (RLock), which is shared with region heartbeat handling. We could reuse the previous distribution and refresh it at a fixed interval instead of rescanning on every scheduling round.
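The interval-based reuse suggested in this reply could look roughly like the sketch below. `distributionCache`, its fields, and the `scan` callback are all hypothetical names for illustration; nothing here is taken from the PR's code, and the sketch is not safe for concurrent use.

```go
package main

import (
	"fmt"
	"time"
)

// distributionCache is a hypothetical helper that remembers the last
// scanned distribution (store ID -> region count) and rescans only after
// interval has elapsed. It is not safe for concurrent use.
type distributionCache struct {
	interval time.Duration
	lastScan time.Time
	cached   map[uint64]int
	scan     func() map[uint64]int // the expensive scan that takes the read lock
}

// get returns the cached distribution, refreshing it first when nothing
// is cached yet or the cached copy is at least interval old.
func (c *distributionCache) get(now time.Time) map[uint64]int {
	if c.cached == nil || now.Sub(c.lastScan) >= c.interval {
		c.cached = c.scan()
		c.lastScan = now
	}
	return c.cached
}

func main() {
	scans := 0
	cache := &distributionCache{
		interval: 10 * time.Second,
		scan: func() map[uint64]int {
			scans++
			return map[uint64]int{1: 3, 2: 5}
		},
	}
	start := time.Now()
	cache.get(start)                       // first call scans
	cache.get(start.Add(time.Second))      // served from the cache
	cache.get(start.Add(10 * time.Second)) // interval elapsed, scans again
	fmt.Println("scans:", scans)           // prints "scans: 2"
}
```

The cost is staleness: within one interval the scheduler acts on a distribution that heartbeats may already have changed.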

rleungx · 2025-02-17T10:12:00Z

pkg/schedule/schedulers/balance_range.go

+	conf.RLock()
+	defer conf.RUnlock()
+	for _, job := range conf.jobs {
+		if job.Status == finished {


Where do we change the status to finished?

If all ranges are balanced, do we still need to wait for a timeout?

Yes, in this PR. I will add further conditions that check whether the distribution is balanced in the next PR.
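As a rough illustration of the follow-up described here, the sketch below finishes a job either on timeout or as soon as the distribution is reported balanced. Only the `finished` status and the `Status` field appear in the quoted diff; `rangeJob`, `Start`, `Timeout`, `update`, and the `balanced` flag are assumptions made for this example.

```go
package main

import (
	"fmt"
	"time"
)

type jobStatus int

const (
	pending jobStatus = iota
	running
	finished
)

// rangeJob is a hypothetical, simplified job; Start and Timeout are
// assumed fields, not taken from the PR.
type rangeJob struct {
	Status  jobStatus
	Start   time.Time
	Timeout time.Duration
}

// update marks a running job as finished when it has timed out or when
// the caller reports that the distribution is already balanced.
func (j *rangeJob) update(now time.Time, balanced bool) {
	if j.Status != running {
		return
	}
	if balanced || now.Sub(j.Start) >= j.Timeout {
		j.Status = finished
	}
}

func main() {
	start := time.Now()
	job := &rangeJob{Status: running, Start: start, Timeout: time.Hour}
	job.update(start.Add(time.Minute), false)
	fmt.Println(job.Status == finished) // prints "false"
	job.update(start.Add(2*time.Minute), true)
	fmt.Println(job.Status == finished) // prints "true"
}
```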

pkg/schedule/operator/operator_controller.go

rleungx · 2025-02-17T10:32:28Z

pkg/schedule/schedulers/balance_range.go

+func (s *balanceRangeScheduler) transferPeer(plan *balanceRangeSchedulerPlan, dstStores []*core.StoreInfo) *operator.Operator {
+	excludeTargets := plan.region.GetStoreIDs()
+	if plan.job.Role == leader {
+		excludeTargets = make(map[uint64]struct{})


Do we need to exclude the current leader store id?
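One possible reading of this question, sketched under assumptions: for the leader role, exclude only the store that already holds the leader, and for other roles exclude every store that already holds a peer. `excludedTargets`, `roleLeader`, and `rolePeer` are hypothetical names, and this is not necessarily what the PR ended up doing.

```go
package main

import "fmt"

// Hypothetical role names for illustration only.
const (
	roleLeader = "leader"
	rolePeer   = "peer"
)

// excludedTargets returns the store IDs that must not be picked as a
// target. For the leader role only the current leader store is excluded;
// for any other role every store already holding a peer is excluded.
func excludedTargets(peerStores []uint64, leaderStore uint64, role string) map[uint64]struct{} {
	excluded := make(map[uint64]struct{})
	if role == roleLeader {
		excluded[leaderStore] = struct{}{}
		return excluded
	}
	for _, id := range peerStores {
		excluded[id] = struct{}{}
	}
	return excluded
}

func main() {
	peers := []uint64{1, 2, 3}
	forLeader := excludedTargets(peers, 1, roleLeader)
	forPeer := excludedTargets(peers, 1, rolePeer)
	fmt.Println(len(forLeader), len(forPeer)) // prints "1 3"
}
```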

ti-chi-bot · 2025-02-19T02:43:22Z

@bufferflies: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
pull-integration-realcluster-test	`246aaa6`	link	true	`/test pull-integration-realcluster-test`

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

bufferflies added 8 commits January 7, 2025 18:24

add scheduler config

e693b3e

Signed-off-by: 童剑 <[email protected]>

add new scheduler for key range

23ff7d0

Signed-off-by: 童剑 <[email protected]>

pass ut

1e6d628

Signed-off-by: 童剑 <[email protected]>

lint

d1da5b5

Signed-off-by: 童剑 <[email protected]>

pass ut

d0cfc2d

Signed-off-by: 童剑 <[email protected]>

draft

fb723a0

Signed-off-by: 童剑 <[email protected]>

rename balance-key-range to balance-range

d86148f

Signed-off-by: 童剑 <[email protected]>

use hex encode

8bdb7bc

Signed-off-by: 童剑 <[email protected]>

ti-chi-bot bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed do-not-merge/needs-linked-issue labels Jan 17, 2025

Merge branch 'balance_key_range' into balance_key_range_scheduler

cb2bd58

ti-chi-bot bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jan 17, 2025

bufferflies added 10 commits January 21, 2025 15:05

rename

0696ba6

Signed-off-by: 童剑 <[email protected]>

resolve conflict

46bd44d

Signed-off-by: 童剑 <[email protected]>

impl

66f70c2

Signed-off-by: 童剑 <[email protected]>

add table configuration

5d5ee0f

Signed-off-by: 童剑 <[email protected]>

Merge branch 'balance_key_range' into balance_key_range_scheduler

5aaa79f

add test for getPeets

6a440ba

Signed-off-by: 童剑 <[email protected]>

Merge branch 'master' into balance_key_range_scheduler

97c4da2

add test

1e1d934

Signed-off-by: 童剑 <[email protected]>

sync origin

fdd2279

Signed-off-by: 童剑 <[email protected]>

add test

d0f2cae

Signed-off-by: 童剑 <[email protected]>

bufferflies force-pushed the balance_key_range_scheduler branch from b5cb26c to 50d905f Compare February 13, 2025 08:31

bufferflies force-pushed the balance_key_range_scheduler branch 7 times, most recently from 0125dd5 to 120a2bf Compare February 13, 2025 11:46

bufferflies marked this pull request as ready for review February 14, 2025 02:05

ti-chi-bot bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 14, 2025

add more test

2854acd

Signed-off-by: 童剑 <[email protected]>

bufferflies force-pushed the balance_key_range_scheduler branch from 120a2bf to 2854acd Compare February 17, 2025 08:54

rleungx reviewed Feb 17, 2025

bufferflies force-pushed the balance_key_range_scheduler branch from 1628d72 to 40f0f56 Compare February 18, 2025 09:44

address comment

246aaa6

Signed-off-by: 童剑 <[email protected]>

bufferflies force-pushed the balance_key_range_scheduler branch from 40f0f56 to 246aaa6 Compare February 19, 2025 02:20

bufferflies requested a review from nolouch February 19, 2025 02:49
