Conversation

@twz123 twz123 commented Oct 21, 2025

Three goroutines could outlive a call to ClientConn.close(). Add mechanics to cancel them and wait for them to complete when closing a client connection.

Fixes #8655.

RELEASE NOTES:

  • Closing a client connection will cancel all pending goroutines and block until they complete.

codecov bot commented Oct 21, 2025

Codecov Report

❌ Patch coverage is 91.04478% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.22%. Comparing base (50c6321) to head (3fa606c).

Files with missing lines Patch % Lines
clientconn.go 84.21% 4 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8666      +/-   ##
==========================================
+ Coverage   83.21%   83.22%   +0.01%     
==========================================
  Files         419      419              
  Lines       32427    32468      +41     
==========================================
+ Hits        26985    27023      +38     
- Misses       4054     4058       +4     
+ Partials     1388     1387       -1     
Files with missing lines Coverage Δ
balancer_wrapper.go 85.34% <100.00%> (+1.77%) ⬆️
internal/balancer/gracefulswitch/gracefulswitch.go 87.86% <100.00%> (+1.39%) ⬆️
internal/testutils/pipe_listener.go 85.71% <100.00%> (+1.93%) ⬆️
clientconn.go 90.06% <84.21%> (-0.08%) ⬇️

... and 17 files with indirect coverage changes


healthData *healthData

shutdownMu sync.Mutex
shutdown chan struct{}
Member

Could you change this to shutdownCh to indicate it is a channel?

Author

Done!


func (acbw *acBalancerWrapper) UpdateAddresses(addrs []resolver.Address) {
acbw.ac.updateAddrs(addrs)
acbw.goFunc(func(shutdown <-chan struct{}) {
Member

Is there a reason this was changed to run acbw.ac.updateAddrs(addrs) in a goroutine?

Author

It's basically a "bubbled-up" goroutine. Previously, the goroutine was spawned in updateAddrs itself (line 1021). But since we now need to track those goroutines, I figured it would be most appropriate to do it here. Another option would be to push this down into updateAddrs itself, by passing the acBalancerWrapper pointer, or a function pointer to acbw.goFunc, or something along those lines, and then use that to spawn the goroutine there:

Suggested change:
-	acbw.goFunc(func(shutdown <-chan struct{}) {
+	acbw.ac.updateAddrs(acbw, addrs)

Then we could write line 1021 of updateAddrs like so:

	acbw.goFunc(ac.resetTransportAndUnlock)

Author

I've changed it this way. Seems to be clearer.

-	ac.mu.Lock()
-	defer ac.mu.Unlock()
+	acMu := &ac.mu
+	acMu.Lock()
Member

Is there a reason the original code did not work? This seems unnecessarily complicated and does the same thing. Or am I missing something?

Author

It's a way to make the defer do the unlock conditionally, as the code might need to unlock before returning (see lines 1441 and 1442). We could achieve the same with, e.g., a boolean variable, if you prefer.

I think using a pointer for this is good because, if you forget the nil check, it will panic with an easy-to-understand stack trace. Whereas if you forget to check a boolean, you'd do a double unlock, and there's a chance you'd run into weirder problems.
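The pointer-based conditional unlock described here can be sketched in isolation (hypothetical `condUnlock`; this is not the actual clientconn.go code): the deferred function unlocks only if the pointer is still non-nil, and a code path that must release the lock early sets it to nil first.

```go
package main

import (
	"fmt"
	"sync"
)

// condUnlock demonstrates the pattern: the deferred unlock is "disarmed"
// by nilling the pointer, so early-unlock paths don't double-unlock.
func condUnlock(unlockEarly bool) {
	var mu sync.Mutex
	muPtr := &mu
	muPtr.Lock()
	defer func() {
		if muPtr != nil {
			muPtr.Unlock()
		}
	}()

	if unlockEarly {
		p := muPtr
		muPtr = nil // disarm the deferred unlock
		p.Unlock()
		// ... work that must run without holding the lock ...
	}
	// Otherwise, the deferred function releases the mutex on return.
}

func main() {
	condUnlock(true)
	condUnlock(false)
	fmt.Println("no double-unlock")
}
```

Forgetting the nil assignment here panics immediately inside `Unlock` with a clear stack trace, which is the debuggability argument made above.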

@twz123 twz123 force-pushed the clientconn-close-waitfor-goroutines branch from e45e9d7 to e8bb2b4 Compare November 5, 2025 12:28
Comment on lines 81 to 83
gsb.pendingSwaps.Add(1)
go func() {
defer gsb.pendingSwaps.Done()
Contributor

How about using the newly added Go method on the sync.WaitGroup to run goroutines on it?

Author

This would require a Go bump to 1.25. The go.mod file still states go 1.24.0.

// balancerCurrent.
currentMu sync.Mutex

pendingSwaps sync.WaitGroup
Contributor

Can we name this something else? This is not actually waiting for the swap, but for the swapped-out balancer to be closed. I can't think of something nice. If you can't come up with something nicer either, then a comment would be useful. Thanks.

Author

I went with a generic "activeGoroutines" and added a comment.

cur := gsb.balancerCurrent
gsb.balancerCurrent = gsb.balancerPending
gsb.balancerPending = nil
gsb.pendingSwaps.Add(1)
Contributor

While you are here, would you mind adding a comment to balancerWrapper.UpdateState (which is the only place from which swap is called), where it currently checks whether the update is from the current or pending balancer, and returns early if that is not the case?

I'm talking about this check:

if !bw.gsb.balancerCurrentOrPending(bw) {

This check ensures that when there is a race between UpdateState and Close, and Close wins that race and runs first, UpdateState does not actually swap balancers. This guarantees that swap does not spawn a goroutine after the gracefulswitch balancer is closed. So adding a comment near that check explaining this would be great.

Thanks.

Author

Tried to add something helpful there.
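The guard discussed in this thread can be illustrated with a small standalone sketch (hypothetical names; this is not the gracefulswitch code itself): both UpdateState and Close contend on the same mutex, and Close clears the current/pending slots, so an UpdateState that loses the race finds the wrapper neither current nor pending and returns before swapping — and thus before spawning any goroutine.

```go
package main

import (
	"fmt"
	"sync"
)

type wrapper struct{}

// gsb sketches the graceful-switch balancer's current/pending slots.
type gsb struct {
	mu               sync.Mutex
	current, pending *wrapper
}

func (g *gsb) balancerCurrentOrPending(w *wrapper) bool {
	return w == g.current || w == g.pending
}

// UpdateState reports whether a swap happened. If Close ran first, the
// wrapper is no longer current or pending, so we bail out early and
// never reach the point where a goroutine would be spawned.
func (g *gsb) UpdateState(w *wrapper) bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	if !g.balancerCurrentOrPending(w) {
		return false // Close won the race; do not swap.
	}
	// ... swap would happen here and spawn a tracked goroutine ...
	return true
}

// Close clears both slots under the same mutex, making the guard above
// fail for any wrapper afterwards.
func (g *gsb) Close() {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.current, g.pending = nil, nil
}

func main() {
	w := &wrapper{}
	g := &gsb{current: w}
	g.Close()
	fmt.Println("swapped after close:", g.UpdateState(w)) // false
}
```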

Three goroutines could outlive a call to ClientConn.close(). Add
mechanics to cancel them and wait for them to complete when closing a
client connection.

RELEASE NOTES:
- Closing a client connection will cancel all pending goroutines and
  block until they complete.

Signed-off-by: Tom Wieczorek <[email protected]>
@twz123 twz123 force-pushed the clientconn-close-waitfor-goroutines branch from e8bb2b4 to 3fa606c Compare November 20, 2025 08:10
Development

Successfully merging this pull request may close these issues:

  • Allow waiting for all goroutines to exit on client connection close