HyunSangHan

Move future cancellation outside of synchronized block in BinderClientTransport.notifyTerminated() to prevent deadlock if AsyncSecurityPolicy uses directExecutor() for callbacks.

Fixes #12190
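
A minimal sketch of the hazard described above, using only JDK types rather than the real BinderClientTransport or AsyncSecurityPolicy. The class `SketchTransport` and all of its members are hypothetical names made up for illustration. It assumes callbacks run inline on the cancelling thread, as they would with directExecutor(), and records whether the callback observed the transport's lock being held.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for a transport; not the real BinderClientTransport. */
final class SketchTransport {
  private CompletableFuture<Boolean> authFuture; // guarded by "this"

  synchronized void setAuthFuture(CompletableFuture<Boolean> future) {
    authFuture = future;
  }

  /** Hazard: cancel() runs inline callbacks while "this" is still locked. */
  synchronized void terminateHoldingLock() {
    if (authFuture != null) {
      authFuture.cancel(false);
    }
  }

  /** Alternative: capture the future under the lock, cancel after releasing it. */
  void terminateOutsideLock() {
    CompletableFuture<Boolean> toCancel;
    synchronized (this) {
      toCancel = authFuture;
      authFuture = null;
    }
    if (toCancel != null) {
      toCancel.cancel(false);
    }
  }

  /** Returns whether the completion callback ran while the transport lock was held. */
  static boolean callbackSawLock(boolean cancelUnderLock) {
    SketchTransport transport = new SketchTransport();
    CompletableFuture<Boolean> future = new CompletableFuture<>();
    AtomicBoolean sawLock = new AtomicBoolean();
    // Non-async dependents run on the thread that completes (here: cancels) the future.
    future.whenComplete((result, error) -> sawLock.set(Thread.holdsLock(transport)));
    transport.setAuthFuture(future);
    if (cancelUnderLock) {
      transport.terminateHoldingLock();
    } else {
      transport.terminateOutsideLock();
    }
    return sawLock.get();
  }
}
```

In this sketch, `callbackSawLock(true)` reports that the callback ran with the lock held, which is the situation that can deadlock if the callback needs another lock owned by a thread waiting on the transport. `callbackSawLock(false)` reports that it did not.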


linux-foundation-easycla bot commented Aug 16, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

@jdcormie
Member

Help me understand the change here? All those cancel() calls still appear to come from inside the @GuardedBy("this") notifyTerminated() method ...

…o fix-binder-deadlock-12190

Signed-off-by: Hyunsang Han <[email protected]>
…ted()

Move future cancellation to offloadExecutor to avoid deadlock when
AsyncSecurityPolicy uses directExecutor() for callbacks.

Fixes grpc#12190

Signed-off-by: Hyunsang Han <[email protected]>
@HyunSangHan
Author

Help me understand the change here? All those cancel() calls still appear to come from inside the @GuardedBy("this") notifyTerminated() method ...

OMG! Sorry. I realized that I missed committing the actual fix!
I've just pushed the missing commit with the proper solution.

@jdcormie Could you please check the latest commit?

Extract future cancellation logic into cancelAsync method and only cancel
futures that are not already done for better performance.

Signed-off-by: Hyunsang Han <[email protected]>
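
The two commits above (offloading cancellation, then skipping futures that are already done) could look roughly like the following. This is a sketch under assumptions, not the code in this PR: `CancelOffloadSketch` and `cancelAsyncIfNotDone` are hypothetical names, and the "offload executor" here is a plain list that records tasks so the ordering is easy to observe.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.Future;

final class CancelOffloadSketch {
  /**
   * Hypothetical helper: enqueue cancel() on another executor so that any
   * inline callbacks do not run on the caller's thread (and under its locks).
   * Returns true if a cancel task was enqueued.
   */
  static boolean cancelAsyncIfNotDone(Future<?> future, Executor offloadExecutor) {
    if (future == null || future.isDone()) {
      return false; // nothing to cancel, so skip the executor hop
    }
    offloadExecutor.execute(() -> future.cancel(false));
    return true;
  }

  /** Returns {enqueuedForDone, enqueuedForPending, cancelledBeforeRun, cancelledAfterRun}. */
  static boolean[] demo() {
    List<Runnable> queue = new ArrayList<>();
    Executor recording = queue::add; // stand-in executor that only records tasks

    CompletableFuture<String> done = CompletableFuture.completedFuture("ok");
    CompletableFuture<String> pending = new CompletableFuture<>();

    boolean enqueuedForDone = cancelAsyncIfNotDone(done, recording);
    boolean enqueuedForPending = cancelAsyncIfNotDone(pending, recording);
    boolean cancelledBeforeRun = pending.isCancelled();

    queue.forEach(Runnable::run); // simulate the offload executor draining its queue
    return new boolean[] {
      enqueuedForDone, enqueuedForPending, cancelledBeforeRun, pending.isCancelled()
    };
  }
}
```

Note that the pending future is not cancelled until the recorded task actually runs, so the caller returns before cancellation has taken effect.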
@HyunSangHan HyunSangHan requested a review from jdcormie August 27, 2025 23:52
@jdcormie jdcormie added the kokoro:run Add this label to a PR to tell Kokoro the code is safe and tests can be run label Aug 28, 2025
@grpc-kokoro grpc-kokoro removed the kokoro:run Add this label to a PR to tell Kokoro the code is safe and tests can be run label Aug 28, 2025
@HyunSangHan HyunSangHan requested a review from jdcormie August 28, 2025 15:24
Rename cancelAsync to cancelAsyncIfNeeded, move future cancellation
next to readyTimeoutFuture, and remove unnecessary null assignments.

Signed-off-by: Hyunsang Han <[email protected]>
@HyunSangHan
Author

@jdcormie
I’ve addressed the review comments. :)

@jdcormie jdcormie added the kokoro:force-run Add this label to a PR to tell Kokoro to re-run all tests. Not generally necessary label Sep 3, 2025
@grpc-kokoro grpc-kokoro removed the kokoro:force-run Add this label to a PR to tell Kokoro to re-run all tests. Not generally necessary label Sep 3, 2025
@jdcormie
Member

jdcormie commented Sep 3, 2025

Woke up this morning with a small new concern: Would this PR cause us to declare the Channel terminated before all work we've enqueued on the offload Executor is complete (or cancelled)? Take a look at how releaseExecutors() is called right after notifyTerminated() (the site of your changes) returns. Would this need to move into the shutdown path instead?

@HyunSangHan
Author

Woke up this morning with a small new concern: Would this PR cause us to declare the Channel terminated before all work we've enqueued on the offload Executor is complete (or cancelled)? Take a look at how releaseExecutors() is called right after notifyTerminated() (the site of your changes) returns. Would this need to move into the shutdown path instead?

I agree with your concern. That's a very good point.
Since cancellation is enqueued on a separate thread, there's no guarantee that previously enqueued tasks have completed by the time we declare the Channel as terminated.
To address this fundamentally, notifyTerminated should ideally only be called once those tasks have finished.

That said, while thinking about ways to improve the code, two questions came up 🤔 :

  1. Because cancellation itself is enqueued onto the executor and then handled asynchronously, even if we enqueue it earlier there's still no guarantee that the cancel operation will finish before notifyTerminated is invoked. The probability might be higher, but the guarantee isn't there.
  2. Looking at releaseExecutors more closely, it does attempt to gracefully release resources once queued tasks complete. However, the method itself seems to return immediately without waiting for that completion. If that's correct, then simply moving releaseExecutors before notifyTerminated wouldn't necessarily solve the problem either.

Could you elaborate a bit more on what you meant by "move into the shutdown path instead"? I want to make sure I understand your idea fully.
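
The ordering concern in the two questions above can be reproduced with a JDK analogy. This sketch does not use gRPC's releaseExecutors(); it assumes a single-threaded `ExecutorService` as the offload executor and uses `shutdown()` as a stand-in for a release call that returns without waiting. `OffloadOrderingSketch` is a hypothetical name.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class OffloadOrderingSketch {
  /** Returns {cancelledWhenShutdownReturned, cancelledAfterAwaitingTermination}. */
  static boolean[] run() {
    ExecutorService offload = Executors.newSingleThreadExecutor();
    CountDownLatch gate = new CountDownLatch(1);
    CompletableFuture<Void> pending = new CompletableFuture<>();

    // Earlier work that is still running when termination is declared.
    offload.execute(() -> {
      try {
        gate.await();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    // The enqueued cancellation sits behind the earlier work.
    offload.execute(() -> pending.cancel(false));

    offload.shutdown(); // returns immediately; queued tasks still run later
    boolean cancelledAtShutdown = pending.isCancelled();

    gate.countDown(); // let the earlier work finish
    boolean terminated;
    try {
      terminated = offload.awaitTermination(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      terminated = false;
    }
    return new boolean[] {cancelledAtShutdown, terminated && pending.isCancelled()};
  }
}
```

Under these assumptions the future is still uncancelled when `shutdown()` returns, and only an explicit wait such as `awaitTermination` gives a point after which the enqueued cancellation is known to have run.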

Development

Successfully merging this pull request may close these issues.

binder: BinderTransport should avoid canceling AsyncSecurityPolicy futures while holding its lock