
fix_: cancel ongoing requests when closing the ClientWithFallback #6536

Open
wants to merge 6 commits into
base: develop
from

Conversation

friofry
Contributor

@friofry friofry commented Apr 16, 2025

  • Reproduce the issue where requests still go through after the Client is closed
  • Fix the issue

The code uses three mechanisms to manage client closure:

  • Atomic flag (closed): Rejects new requests immediately once the client is marked as closed. It also ensures that the shutdown logic in Close() runs only once.
  • Signal channel (done): When Close() is called, it closes the done channel which signals all ongoing operations to stop gracefully. This context cancellation approach ensures operations that are already in progress terminate properly.
  • Wait group (wg): The wait group tracks all active operations, allowing Close() to wait for them to complete before finalizing the shutdown process.
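
A minimal sketch of how the three mechanisms above could fit together. This is not the code from the PR: the `client` type, `errClosed`, `newClient` and `demo` are hypothetical names, and the extra mutex is an assumption added here so that `wg.Add` can never race with `wg.Wait`.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// errClosed is a hypothetical sentinel error; the PR may return something else.
var errClosed = errors.New("client is closed")

// client is an illustrative stand-in for ClientWithFallback.
type client struct {
	closed atomic.Bool    // set once by Close; rejects new calls
	done   chan struct{}  // closed by Close to signal in-flight calls
	wg     sync.WaitGroup // tracks in-flight calls
	mu     sync.Mutex     // assumption: orders wg.Add before wg.Wait
}

func newClient() *client { return &client{done: make(chan struct{})} }

// makeCall runs fn with a context that is cancelled when either the caller's
// context ends or the client is closed.
func (c *client) makeCall(ctx context.Context, fn func(context.Context) error) error {
	c.mu.Lock()
	if c.closed.Load() {
		c.mu.Unlock()
		return errClosed
	}
	c.wg.Add(1)
	c.mu.Unlock()
	defer c.wg.Done()

	ctx, cancel := context.WithCancel(ctx)
	defer cancel()
	go func() {
		select {
		case <-c.done: // client closed: cancel this call
			cancel()
		case <-ctx.Done(): // call finished or parent cancelled: stop watching
		}
	}()
	return fn(ctx)
}

// Close marks the client closed, signals in-flight calls and waits for them.
// Only the first call does the work; later calls return immediately.
func (c *client) Close() {
	if !c.closed.CompareAndSwap(false, true) {
		return
	}
	c.mu.Lock() // any call that passed the closed check has registered by now
	close(c.done)
	c.mu.Unlock()
	c.wg.Wait()
}

// demo closes the client while one call is in flight and returns the error of
// that call and the error of a call made after Close.
func demo() (inFlight, afterClose error) {
	c := newClient()
	started := make(chan struct{})
	result := make(chan error, 1)
	go func() {
		result <- c.makeCall(context.Background(), func(ctx context.Context) error {
			close(started)
			select {
			case <-ctx.Done():
				return ctx.Err()
			case <-time.After(5 * time.Second):
				return nil
			}
		})
	}()
	<-started
	c.Close() // cancels the in-flight call and waits for it
	c.Close() // no-op
	inFlight = <-result
	afterClose = c.makeCall(context.Background(), func(context.Context) error { return nil })
	return inFlight, afterClose
}

func main() {
	fmt.Println(demo())
}
```

In this sketch the in-flight call observes a cancelled context, and any call made after `Close()` is rejected before it does any work.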

Closes #6525
Closes #6576
Closes #6462


@Copilot Copilot AI left a comment

Pull Request Overview

This PR addresses the issue of ongoing requests not being cancelled when the Client is closed by adding proper cancellation logic. Key changes include:

  • Stopping all RPC clients by acquiring a mutex lock in rpc/client.go.
  • Introducing a done channel and an atomic closed flag in ClientWithFallback (rpc/chain/client.go) to signal client closure.
  • Adding tests in rpc/chain/client_test.go and circuitbreaker/circuit_breaker_test.go to verify the proper cancellation and panic behavior.

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.

Show a summary per file
File | Description
rpc/client.go | Updated Stop() to lock, iterate through, and close all clients.
rpc/chain/client_test.go | Added tests to verify behavior when client state is nil and when closing.
rpc/chain/client_health_test.go | Removed redundant expectation for context cancellation.
rpc/chain/client.go | Integrated done channel and atomic closed flag to handle closure.
circuitbreaker/circuit_breaker_test.go | Added test for ensuring a nil pointer panic on misuse of CommandResult.

@status-im-auto
Member

status-im-auto commented Apr 16, 2025

Jenkins Builds

Older builds (43)
Commit #️⃣ Finished (UTC) Duration Platform Result
✔️ 947e8fc #1 2025-04-16 13:10:55 ~3 min android 📦aar
✔️ 947e8fc #1 2025-04-16 13:10:58 ~3 min ios 📦zip
✔️ 947e8fc #1 2025-04-16 13:11:32 ~3 min macos 📦zip
✖️ 947e8fc #1 2025-04-16 13:11:54 ~3 min tests 📄log
✔️ 947e8fc #1 2025-04-16 13:13:48 ~5 min linux 📦zip
✔️ 947e8fc #1 2025-04-16 13:13:54 ~5 min macos 📦zip
✔️ 947e8fc #1 2025-04-16 13:14:43 ~6 min windows 📦zip
✔️ 947e8fc #1 2025-04-16 13:16:18 ~8 min tests-rpc 📄log
✔️ 0fb7d9c #2 2025-04-16 13:50:27 ~2 min ios 📦zip
✔️ 0fb7d9c #2 2025-04-16 13:50:36 ~2 min android 📦aar
✔️ 0fb7d9c #2 2025-04-16 13:51:18 ~3 min macos 📦zip
✖️ 0fb7d9c #2 2025-04-16 13:52:04 ~4 min tests 📄log
✔️ 0fb7d9c #2 2025-04-16 13:52:23 ~4 min windows 📦zip
✔️ 0fb7d9c #2 2025-04-16 13:53:17 ~5 min linux 📦zip
✔️ 0fb7d9c #2 2025-04-16 13:53:40 ~5 min macos 📦zip
✔️ 0fb7d9c #2 2025-04-16 13:56:09 ~8 min tests-rpc 📄log
✔️ 698a9e2 #3 2025-04-16 19:15:22 ~2 min ios 📦zip
✔️ 698a9e2 #3 2025-04-16 19:15:44 ~3 min android 📦aar
✔️ 698a9e2 #3 2025-04-16 19:16:26 ~3 min macos 📦zip
✖️ 698a9e2 #3 2025-04-16 19:16:58 ~4 min tests 📄log
✔️ 698a9e2 #3 2025-04-16 19:17:20 ~4 min windows 📦zip
✔️ 698a9e2 #3 2025-04-16 19:18:24 ~5 min macos 📦zip
✔️ 698a9e2 #3 2025-04-16 19:18:41 ~5 min linux 📦zip
✔️ 698a9e2 #3 2025-04-16 19:20:18 ~7 min tests-rpc 📄log
✔️ ff453ef #4 2025-04-16 19:42:33 ~2 min ios 📦zip
✔️ ff453ef #4 2025-04-16 19:42:53 ~3 min android 📦aar
✔️ ff453ef #4 2025-04-16 19:43:27 ~3 min macos 📦zip
✔️ ff453ef #4 2025-04-16 19:44:45 ~4 min windows 📦zip
✔️ ff453ef #4 2025-04-16 19:45:36 ~5 min macos 📦zip
✔️ ff453ef #4 2025-04-16 19:45:45 ~5 min linux 📦zip
✔️ ff453ef #4 2025-04-16 19:47:36 ~7 min tests-rpc 📄log
✖️ ff453ef #4 2025-04-16 20:14:13 ~34 min tests 📄log
✔️ 20919b4 #5 2025-04-17 08:05:26 ~2 min android 📦aar
✔️ 2a47d91 #6 2025-04-17 08:05:51 ~2 min ios 📦zip
✔️ 2a47d91 #6 2025-04-17 08:06:24 ~3 min macos 📦zip
✔️ 2a47d91 #5 2025-04-17 08:07:46 ~5 min windows 📦zip
✔️ 2a47d91 #6 2025-04-17 08:08:11 ~5 min macos 📦zip
✔️ 2a47d91 #6 2025-04-17 08:08:26 ~2 min android 📦aar
✔️ 2a47d91 #6 2025-04-17 08:08:26 ~5 min linux 📦zip
✔️ 2a47d91 #5 2025-04-17 08:10:34 ~7 min tests-rpc 📄log
✔️ 2a47d91 #6 2025-04-17 08:12:00 ~4 min windows 📦zip
✖️ 2a47d91 #6 2025-04-17 08:18:56 ~8 min tests-rpc 📄log
✔️ 2a47d91 #5 2025-04-17 08:37:36 ~34 min tests 📄log
Commit #️⃣ Finished (UTC) Duration Platform Result
✔️ 94ed96f #7 2025-04-17 08:33:26 ~2 min ios 📦zip
✔️ 94ed96f #7 2025-04-17 08:33:36 ~2 min android 📦aar
✔️ 94ed96f #7 2025-04-17 08:34:20 ~3 min macos 📦zip
✔️ 94ed96f #7 2025-04-17 08:35:24 ~4 min windows 📦zip
✔️ 94ed96f #7 2025-04-17 08:36:29 ~5 min macos 📦zip
✔️ 94ed96f #7 2025-04-17 08:36:32 ~5 min linux 📦zip
✔️ 94ed96f #7 2025-04-17 08:38:36 ~7 min tests-rpc 📄log
✔️ 94ed96f #6 2025-04-17 09:13:48 ~36 min tests 📄log
✔️ ebab628 #8 2025-04-17 09:52:33 ~2 min ios 📦zip
✔️ ebab628 #8 2025-04-17 09:52:45 ~2 min android 📦aar
✔️ ebab628 #8 2025-04-17 09:53:29 ~3 min macos 📦zip
✔️ ebab628 #8 2025-04-17 09:54:28 ~4 min windows 📦zip
✔️ ebab628 #8 2025-04-17 09:55:35 ~5 min macos 📦zip
✔️ ebab628 #8 2025-04-17 09:55:37 ~5 min linux 📦zip
✔️ ebab628 #8 2025-04-17 09:58:02 ~8 min tests-rpc 📄log
✔️ ebab628 #7 2025-04-17 10:25:05 ~35 min tests 📄log

@friofry friofry force-pushed the ab/issue-6525-nil-functors branch from 947e8fc to 0fb7d9c Compare April 16, 2025 13:47
Collaborator

@igor-sirotin igor-sirotin left a comment

I see many tests, noice!

}

// Create a context that will be cancelled when either the parent context is done or the client is closed
ctx, cancel := context.WithCancel(ctx)
Collaborator

Instead of creating a done channel, we should save this cancel method to ClientWithFallback. And call it in the Stop method.

Example here: https://www.notion.so/Code-smell-2-Pass-context-to-functions-cd49904001ca4152be006124e64f5c61?pvs=4#1cb8f96fb65c80b28720dc44a213522a

Contributor Author

@friofry friofry Apr 16, 2025

I like this approach and prefer to store cancel functions in other types. And I understand that storing a context in a struct breaks the top-down cancel approach.

But here I was not sure how to manage/merge cancel functions for multiple calls. So storing a chan seems like a simpler solution. And it's not an antipattern.

Collaborator

@igor-sirotin igor-sirotin Apr 16, 2025

Well... technically done chan will not solve it, because a channel has only one receiver 🤷
So only one goroutine will be stopped

Collaborator

Thinking about this... perhaps adding a WaitGroup to makeCall would be the right solution here. After cancelling all contexts, we should wait for all calls to finish and only then destroy the ClientWithFallback.

Contributor Author

@friofry friofry Apr 17, 2025

> Thinking about this... perhaps adding a WaitGroup to makeCall would be the right solution here. After cancelling all contexts, we should wait for all calls to finish and only then destroy the ClientWithFallback.

Thank you, fixed!

> Well... technically done chan will not solve it, because a channel has only one receiver 🤷 So only one goroutine will be stopped

Here is an example with cancel()+chan approach for multiple parallel requests: https://go.dev/play/p/wZqHB0pasYr
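
For reference on the point discussed here: a value sent on a channel is delivered to a single receiver, but closing a channel unblocks every goroutine receiving from it. The sketch below is not the playground code linked above; `stoppedWorkers` is a hypothetical helper that only illustrates this broadcast behavior.

```go
package main

import (
	"fmt"
	"sync"
)

// stoppedWorkers starts n goroutines that all block on the same done channel,
// closes the channel once, and reports how many goroutines observed the close.
func stoppedWorkers(n int) int {
	done := make(chan struct{})
	var (
		wg      sync.WaitGroup
		mu      sync.Mutex
		stopped int
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-done // a receive on a closed channel returns immediately
			mu.Lock()
			stopped++
			mu.Unlock()
		}()
	}
	close(done) // one close wakes every receiver
	wg.Wait()
	return stopped
}

func main() {
	fmt.Println(stoppedWorkers(5)) // prints 5
}
```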

@@ -80,6 +81,9 @@ type ClientWithFallback struct {

tag string // tag for the limiter
groupTag string // tag for the limiter group

done chan struct{} // channel to signal client closure
Contributor

hmm I get the feeling done and closed are doing here what an actual ctx should be doing. If we don't want to manage 2 separate contexts (one global for the whole client, one which comes with the request), there's stuff like this https://github.com/teivah/onecontext that will encapsulate it.

Contributor Author

@friofry friofry Apr 16, 2025

I'm not sure how to use ctx, cancel := onecontext.Merge(ctx1, ctx2) in makeCall without having a parent context that was used in the Start method of rpc/client.go.

Or we can add an extra internal lifetime context to handle cancellation of spawned goroutines. But then we need to add this edge case to codesmells#2.
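
One possible shape of the internal lifetime context idea, as a rough sketch only. It does not use onecontext and is not the PR's code: the `client` type, its fields, `newClient` and `demo` are hypothetical, and it assumes Go 1.21+ for `context.AfterFunc`. The stored cancel function is called from `Stop()`, and each call links its request context to the lifetime context.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// client is an illustrative stand-in, not the real ClientWithFallback.
type client struct {
	lifetime context.Context    // done once Stop has been called
	stop     context.CancelFunc // stored cancel function for the lifetime context
}

func newClient() *client {
	ctx, cancel := context.WithCancel(context.Background())
	return &client{lifetime: ctx, stop: cancel}
}

// Stop cancels the lifetime context, which in turn cancels every in-flight call.
func (c *client) Stop() { c.stop() }

// makeCall runs fn with a context that ends when either the request context
// or the client's lifetime context ends.
func (c *client) makeCall(reqCtx context.Context, fn func(context.Context) error) error {
	if err := c.lifetime.Err(); err != nil {
		return err // already stopped: reject without running fn
	}
	ctx, cancel := context.WithCancel(reqCtx)
	defer cancel()
	// context.AfterFunc (Go 1.21+) calls cancel once the lifetime context is done.
	unregister := context.AfterFunc(c.lifetime, cancel)
	defer unregister()
	return fn(ctx)
}

// demo stops the client while one call is in flight and returns the error of
// that call and the error of a call made after Stop.
func demo() (inFlight, afterStop error) {
	c := newClient()
	started := make(chan struct{})
	result := make(chan error, 1)
	go func() {
		result <- c.makeCall(context.Background(), func(ctx context.Context) error {
			close(started)
			select {
			case <-ctx.Done():
				return ctx.Err()
			case <-time.After(5 * time.Second):
				return nil
			}
		})
	}()
	<-started
	c.Stop()
	inFlight = <-result
	afterStop = c.makeCall(context.Background(), func(context.Context) error { return nil })
	return inFlight, afterStop
}

func main() {
	fmt.Println(demo())
}
```

This handles any number of parallel calls with a single stored cancel function, at the cost of keeping a context in the struct, which is the trade-off under discussion in this thread.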

@friofry friofry force-pushed the ab/issue-6525-nil-functors branch from ff453ef to 20919b4 Compare April 17, 2025 08:02
@friofry friofry force-pushed the ab/issue-6525-nil-functors branch from 20919b4 to 2a47d91 Compare April 17, 2025 08:02

codecov bot commented Apr 17, 2025

Codecov Report

Attention: Patch coverage is 90.32258% with 3 lines in your changes missing coverage. Please review.

Project coverage is 60.31%. Comparing base (fe28e9c) to head (ebab628).

Files with missing lines | Patch % | Lines
rpc/chain/client.go | 88.88% | 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #6536      +/-   ##
===========================================
- Coverage    60.31%   60.31%   -0.01%     
===========================================
  Files          830      830              
  Lines       103642   103670      +28     
===========================================
+ Hits         62514    62524      +10     
+ Misses       33577    33572       -5     
- Partials      7551     7574      +23     
Flag | Coverage Δ
functional | 25.46% <48.38%> (+0.06%) ⬆️
unit | 58.18% <90.32%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines | Coverage Δ
rpc/client.go | 65.40% <100.00%> (+0.66%) ⬆️
rpc/chain/client.go | 52.62% <88.88%> (+5.15%) ⬆️

... and 34 files with indirect coverage changes

@friofry friofry force-pushed the ab/issue-6525-nil-functors branch from 94ed96f to ebab628 Compare April 17, 2025 09:49
4 participants