Conversation

@timfn-hg
Contributor

@timfn-hg timfn-hg commented Oct 28, 2025

Description:
Instead of closing block node connections abruptly, this PR lets connections close gracefully by doing two things:

  1. When a connection close is requested, we continue streaming the active block; once the block has been fully sent, the connection is allowed to close.
  2. When closing a connection, we attempt to send the EndStream.Code.RESET code to notify the block node that we are closing the connection.

Note: With these changes, more than one connection may be streaming to block nodes at the same time: the "active" connection as seen by the connection manager, and the connection that is waiting to close once the current block has been sent. The overlap should be minimal, lasting only as long as it takes to finish streaming that block.
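
A minimal sketch of the close behaviour described above, not the code from this PR: every name below (`GracefulCloseSketch`, `Connection`, `requestClose`, `sendItem`) is hypothetical, and the "stream" is just a list recording what would have been sent. It shows a close request being deferred until the last item of the current block, followed by an end-of-stream RESET notification.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; these names are not taken from the PR.
final class GracefulCloseSketch {

    enum State { ACTIVE, CLOSING, CLOSED }

    /** Stand-in for a stream to a block node; it only records what was sent. */
    static final class Connection {
        final List<String> sent = new ArrayList<>();
        State state = State.ACTIVE;

        /** Requests a close; streaming continues until the block boundary. */
        void requestClose() {
            if (state == State.ACTIVE) {
                state = State.CLOSING;
            }
        }

        /** Sends one item of the current block; returns false once closed. */
        boolean sendItem(String item, boolean lastItemOfBlock) {
            if (state == State.CLOSED) {
                return false;
            }
            sent.add(item);
            if (lastItemOfBlock && state == State.CLOSING) {
                // Block boundary reached: notify the peer, then close.
                sent.add("EndStream(RESET)");
                state = State.CLOSED;
            }
            return true;
        }
    }

    /** Streams a three-item block, with a close requested after the first item. */
    static Connection demo() {
        Connection c = new Connection();
        c.sendItem("block-1/header", false);
        c.requestClose();                    // close is deferred, not immediate
        c.sendItem("block-1/items", false);
        c.sendItem("block-1/proof", true);   // last item: RESET is sent, then closed
        c.sendItem("block-2/header", false); // rejected: connection already closed
        return c;
    }

    public static void main(String[] args) {
        Connection c = demo();
        System.out.println(c.state + " " + c.sent);
    }
}
```

In this sketch a new "active" connection could start on block 2 while the old one is still in `CLOSING`, which is the short overlap the note refers to.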

Related issue(s):

Fixes #21878

Notes for reviewer:

Checklist

  • Documented (Code comments, README, etc.)
  • Tested (unit, integration, etc.)

@timfn-hg timfn-hg added this to the v0.69 milestone Oct 28, 2025
@timfn-hg timfn-hg self-assigned this Oct 28, 2025
@timfn-hg timfn-hg requested a review from a team as a code owner October 28, 2025 19:47
@lfdt-bot

lfdt-bot commented Oct 28, 2025

Snyk checks have passed. No issues have been found so far.

| Status | Scanner | Critical | High | Medium | Low | Total (0) |
| --- | --- | --- | --- | --- | --- | --- |
| | Open Source Security | 0 | 0 | 0 | 0 | 0 issues |

Signed-off-by: Tim Farber-Newman <[email protected]>
Signed-off-by: Tim Farber-Newman <[email protected]>
@codacy-production

codacy-production bot commented Oct 28, 2025

Coverage summary from Codacy

| Coverage variation | Diff coverage |
| --- | --- |
| +0.01% (target: -1.00%) | 95.83% |

Coverage variation details

| | Coverable lines | Covered lines | Coverage |
| --- | --- | --- | --- |
| Common ancestor commit (44f994a) | 104320 | 77945 | 74.72% |
| Head commit (556c58e) | 104349 (+29) | 77977 (+32) | 74.73% (+0.01%) |

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details

| | Coverable lines | Covered lines | Diff coverage |
| --- | --- | --- | --- |
| Pull request (#21903) | 72 | 69 | 95.83% |

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%
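
As a quick check, the two formulas above can be applied to the line counts reported in this comment. The class and method names below are illustrative and not part of any Codacy tooling; only the numbers come from the report.

```java
import java.util.Locale;

// Illustrative helper; names are hypothetical, figures are from the Codacy comment.
final class CoverageArithmetic {

    /** Coverage as a percentage of coverable lines. */
    static double percent(long coveredLines, long coverableLines) {
        return 100.0 * coveredLines / coverableLines;
    }

    /** Formats a percentage with two decimals, e.g. "74.72%". */
    static String fmt(double pct) {
        return String.format(Locale.ROOT, "%.2f%%", pct);
    }

    /** Formats a signed percentage difference with two decimals, e.g. "+0.01%". */
    static String fmtSigned(double pct) {
        return String.format(Locale.ROOT, "%+.2f%%", pct);
    }

    public static void main(String[] args) {
        double ancestor = percent(77945, 104320); // common ancestor commit
        double head = percent(77977, 104349);     // head commit
        double diff = percent(69, 72);            // lines added or modified by the PR

        System.out.println("ancestor:  " + fmt(ancestor));
        System.out.println("head:      " + fmt(head));
        System.out.println("variation: " + fmtSigned(head - ancestor));
        System.out.println("diff:      " + fmt(diff));
    }
}
```

The results agree with the reported 74.72%, 74.73%, +0.01% and 95.83%.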

@codecov

codecov bot commented Oct 28, 2025

Codecov Report

❌ Patch coverage is 93.05556% with 5 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...app/blocks/impl/streaming/BlockNodeConnection.java | 92.53% | 3 Missing and 2 partials ⚠️ |

@@             Coverage Diff              @@
##               main   #21903      +/-   ##
============================================
+ Coverage     70.82%   70.84%   +0.01%     
  Complexity    24445    24445              
============================================
  Files          2675     2675              
  Lines        104415   104444      +29     
  Branches      10960    10964       +4     
============================================
+ Hits          73954    73989      +35     
+ Misses        26424    26421       -3     
+ Partials       4037     4034       -3     

| Files with missing lines | Coverage Δ | Complexity Δ |
| --- | --- | --- |
| ...cks/impl/streaming/BlockNodeConnectionManager.java | 91.36% <100.00%> (+0.90%) | 70.00 <0.00> (ø) |
| ...app/blocks/impl/streaming/BlockNodeConnection.java | 91.50% <92.53%> (+1.17%) | 86.00 <3.00> (+2.00) |

... and 2 files with indirect coverage changes

joshmarinacci previously approved these changes Oct 28, 2025
Contributor

@joshmarinacci joshmarinacci left a comment

I approve, but I'd also like someone who's more familiar with Block Node Communication to review it.

Signed-off-by: Tim Farber-Newman <[email protected]>
@timfn-hg
Contributor Author

timfn-hg commented Nov 4, 2025

Contributor

@petreze petreze left a comment

A more general question. As far as I remember, we never really want to be streaming to more than one BN at a time (unless something has changed). Would it be possible to remove this overlap between the new active connection and the one that is finishing streaming the current block?
We could wait for the current block to finish streaming, close the connection that is meant to be closed, and only then choose a new active connection, so that we never stream to two different nodes simultaneously.

@timfn-hg
Contributor Author

timfn-hg commented Nov 5, 2025

> A more general question. As far as I remember, we never really want to be streaming to more than one BN at a time (unless something has changed). Would it be possible to remove this overlap between the new active connection and the one that is finishing streaming the current block?
> We could wait for the current block to finish streaming, close the connection that is meant to be closed, and only then choose a new active connection, so that we never stream to two different nodes simultaneously.

I don't think there is anything wrong with having two connections open at the same time: one active and one pending close. As soon as the block has been sent, the old connection closes itself. The problem with waiting for the old connection to finish streaming its block before opening the new one is that, if the old connection is slow, the CN suffers (saturation increases). In performance testing we've seen periods where a single request to the block node takes upwards of five seconds to send. If several requests remain and each takes that long, we will have produced several more blocks in the meantime and fallen even further behind. Waiting also made the connection scheduling more complicated and brittle, since you would need to track the pending new connection and potentially reschedule connection tasks multiple times before finally making the switch.
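
To put rough numbers on the "fall further behind" argument, here is a back-of-the-envelope sketch. The five-second request time comes from the comment above; the number of pending requests and the block interval are assumptions chosen purely for illustration, and the class and method names are hypothetical.

```java
import java.time.Duration;

// Rough estimate only; pendingRequests and blockInterval are assumed values.
final class DrainBacklogEstimate {

    /**
     * Number of whole new blocks produced while the old connection drains
     * its remaining requests, if nothing else is streamed in the meantime.
     */
    static long blocksProducedWhileDraining(
            int pendingRequests, Duration perRequest, Duration blockInterval) {
        Duration drainTime = perRequest.multipliedBy(pendingRequests);
        return drainTime.toMillis() / blockInterval.toMillis();
    }

    public static void main(String[] args) {
        int pendingRequests = 4;                          // assumption
        Duration perRequest = Duration.ofSeconds(5);      // "upwards of five seconds"
        Duration blockInterval = Duration.ofSeconds(2);   // assumption

        long backlog = blocksProducedWhileDraining(pendingRequests, perRequest, blockInterval);
        System.out.println("blocks produced while draining: " + backlog);
    }
}
```

Under these assumed values the old connection needs 20 seconds to drain, during which 10 further blocks are produced. Opening the new connection straight away avoids building up that backlog.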

Signed-off-by: Tim Farber-Newman <[email protected]>
petreze previously approved these changes Nov 6, 2025
@timfn-hg
Contributor Author

@timfn-hg timfn-hg merged commit b06a845 into main Nov 10, 2025
80 of 81 checks passed
@timfn-hg timfn-hg deleted the timfn/21878-switch-connections-at-block-boundary branch November 10, 2025 17:03
Development

Successfully merging this pull request may close these issues.

Switch Block Node Connections at block boundary

6 participants