server/cluster: cancel ctx on Start failure#10310
server/cluster: cancel ctx on Start failure#10310lance6716 wants to merge 1 commit intotikv:masterfrom
Conversation
Signed-off-by: lance6716 <[email protected]>
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughThe changes add error-triggered cleanup to RaftCluster.Start() by deferring context cancellation if an error occurs after InitCluster succeeds, preventing goroutine leaks on startup failure. A comprehensive test validates this cleanup behavior under bootstrap failure scenarios. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs). Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
/check-issue-triage-complete |
There was a problem hiding this comment.
Pull request overview
Fixes a RaftCluster startup goroutine leak by ensuring the cluster context is canceled when RaftCluster.Start() fails before running is set to true (so Stop() would otherwise no-op).
Changes:
- Add a deferred cleanup in
RaftCluster.Start()to cancelc.ctxon startup error. - Add a regression test that forces
keyspaceGroupManager.Bootstrap()to fail and assertsc.ctxis canceled (preventing background goroutine leaks).
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
server/cluster/cluster.go |
Adds deferred context cancellation on Start() error to prevent leaked background goroutines during partial startup. |
server/cluster/cluster_test.go |
Adds a regression test that simulates a startup failure after goroutine-starting components are created. |
| // If a later step fails before `running` is set to true, `Stop` will return | ||
| // early and cannot reliably clean them up. | ||
| defer func() { | ||
| if err != nil { |
There was a problem hiding this comment.
The deferred cleanup only cancels c.ctx when err != nil, but Start can return early with err == nil while c.running is still false (e.g. when LoadClusterInfo() returns cluster == nil / not bootstrapped). Because InitCluster() already creates components that spawn goroutines off c.ctx (via newSchedulingController -> statistics.NewHotStat -> NewHotCache), this can still leave background goroutines running with no way for Stop() to cancel them (since Stop() returns early when running is false). Consider cancelling on any early exit where running was not set to true (e.g. defer cleanup conditioned on !c.running, or explicitly cancel before returning in the cluster == nil path).
| if err != nil { | |
| // Cancel the context on any failed start, or if we return without ever | |
| // marking the cluster as running, to avoid leaking background goroutines. | |
| if err != nil || !c.running { |
|
/retest |
|
@lance6716: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #10310 +/- ##
==========================================
+ Coverage 78.78% 78.85% +0.06%
==========================================
Files 527 527
Lines 70916 70925 +9
==========================================
+ Hits 55870 55926 +56
+ Misses 11026 11001 -25
+ Partials 4020 3998 -22
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
What problem does this PR solve?
Issue Number: Close #10309
What is changed and how does it work?
Check List
Tests
Code changes
Side effects
Related changes
Release note
Summary by CodeRabbit
Release Notes