Skip to content

Handle vfkit connection in goroutine to prevent blocking #4753

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

vyasgun
Copy link
Contributor

@vyasgun vyasgun commented May 14, 2025

Description

A bug is being consistently seen where crc is unable to connect to the daemon on restart/recreation. This fix will help prevent seeing this issue.

Fixes: #4736

Relates to: #4738

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • Feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change
  • Chore (non-breaking change which doesn't affect codebase;
    test, version modification, documentation, etc.)

Proposed changes

Testing

Contribution Checklist

  • I Keep It Small and Simple: The smaller the PR is, the easier it is to review and have it merged
  • I have performed a self-review of my code
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Which platform have you tested the code changes on?
    • Linux
    • Windows
    • MacOS

Summary by Sourcery

Bug Fixes:

  • Handle vfkit connections asynchronously to prevent daemon startup from blocking on macOS.

Copy link

sourcery-ai bot commented May 14, 2025

Reviewer's Guide

Refactors the macOS daemon listener to accept vfkit connections in a background goroutine, eliminating blocking behavior during connection setup and updating error handling to log failures asynchronously.

File-Level Changes

Change Details Files
Accept vfkit connection asynchronously to avoid blocking
  • Moved vn.AcceptVfkit call inside a go routine
  • Replaced wrapped error return with logging.Errorf inside the goroutine
  • Returned initial listener connection and error immediately without waiting
cmd/crc/cmd/daemon_darwin.go

Possibly linked issues


Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link

openshift-ci bot commented May 14, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign anjannath for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot requested review from anjannath and gbraad May 14, 2025 11:29
Copy link

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @vyasgun - I've reviewed your changes and they look great!

Here's what I looked at during the review
  • 🟡 General issues: 1 issue found
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟢 Documentation: all looks good

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@vyasgun vyasgun force-pushed the pr/unixgram-bug-fix branch 2 times, most recently from 6bbd112 to 93920c1 Compare May 14, 2025 12:57
@praveenkumar
Copy link
Member

praveenkumar commented May 15, 2025

With this change, now it is stuck during fresh start also. Following is logs from daemon, looks like connection is closed.

 - - [15/May/2025:10:51:11 +0530] "GET /network/services/forwarder/all HTTP/1.1" 200 3
 - - [15/May/2025:10:51:11 +0530] "GET /network/services/forwarder/all HTTP/1.1" 200 3
 - - [15/May/2025:10:51:11 +0530] "POST /network/services/forwarder/expose HTTP/1.1" 200 0
 - - [15/May/2025:10:51:11 +0530] "POST /network/services/forwarder/expose HTTP/1.1" 200 0
 - - [15/May/2025:10:51:11 +0530] "POST /network/services/forwarder/expose HTTP/1.1" 200 0
 - - [15/May/2025:10:51:11 +0530] "POST /network/services/forwarder/expose HTTP/1.1" 200 0
 - - [15/May/2025:10:51:11 +0530] "POST /network/services/forwarder/expose HTTP/1.1" 200 0
DEBU new connection from /Users/prkumar/.crc/vfkit-354d-1254.sock to /Users/prkumar/.crc/crc-unixgram.sock 
ERRO cannot receive packets from /Users/prkumar/.crc/vfkit-354d-1254.sock, disconnecting: cannot read size from socket: read unixgram /Users/prkumar/.crc/crc-unixgram.sock: use of closed network connection 
ERRO failed to accept vfkit connection: cannot read size from socket: read unixgram /Users/prkumar/.crc/crc-unixgram.sock: use of closed network connection 
INFO listening on /Users/prkumar/.crc/crc-unixgram.sock 
0B sent to the VM, 0B received from the VM
2025/05/15 10:51:15 tcpproxy: for incoming conn 127.0.0.1:52729, error dialing "192.168.127.2:22": connect tcp 192.168.127.2:22: no route to host
0B sent to the VM, 0B received from the VM

Edit: I had some local changes also during testing and after removing those it is working as expected. will wait for @cfergeau for approval and then merge it.

if cancel != nil {
logging.Warnf("new connection to %s. Closing old connection", constants.UnixgramSocketPath)
cancel()
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid the cancel/ctx global variables, I’d change unixgramListener to have an additional return value of type CancelFunc. The cancellation can then be handled in the code calling vsockListener.
Even if we keep the code as it is currently, I don’t think ctx needs to be global, only cancel.

And not related to this change, but I wonder if unixgramListen would be a better name for this method instead of unixgramListener (but just a comment in passing, nothing important)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I split the function into two: setupUnixgramListener and handleUnixgramConnection since cancel() needs to be called once the initial connection is established (transport.AcceptVfkit returns). The cancellation is now being handled by the caller function.

For the names, what do you think of the two new functions?

@vyasgun vyasgun force-pushed the pr/unixgram-bug-fix branch from 6f9d62e to e0cd9b6 Compare May 16, 2025 05:39
@praveenkumar
Copy link
Member

CI is failing with following error

GOARCH=arm64 GOOS=darwin go build -tags "containers_image_openpgp" -ldflags="-X github.com/crc-org/crc/v2/pkg/crc/version.crcVersion=2.50.0 -X github.com/crc-org/crc/v2/pkg/crc/version.ocpVersion=4.18.2 -X github.com/crc-org/crc/v2/pkg/crc/version.okdVersion=4.15.0-0.okd-2024-02-23-163410 -X github.com/crc-org/crc/v2/pkg/crc/version.microshiftVersion=4.18.2 -X github.com/crc-org/crc/v2/pkg/crc/version.commitSha=32561c " -o out/macos-arm64/crc  ./cmd/crc
GOARCH=amd64 GOOS=darwin go build -tags "containers_image_openpgp" -ldflags="-X github.com/crc-org/crc/v2/pkg/crc/version.crcVersion=2.50.0 -X github.com/crc-org/crc/v2/pkg/crc/version.ocpVersion=4.18.2 -X github.com/crc-org/crc/v2/pkg/crc/version.okdVersion=4.15.0-0.okd-2024-02-23-163410 -X github.com/crc-org/crc/v2/pkg/crc/version.microshiftVersion=4.18.2 -X github.com/crc-org/crc/v2/pkg/crc/version.commitSha=32561c " -o out/macos-amd64/crc  ./cmd/crc
GOOS=linux GOARCH=amd64 go build -tags "containers_image_openpgp" -ldflags="-X github.com/crc-org/crc/v2/pkg/crc/version.crcVersion=2.50.0 -X github.com/crc-org/crc/v2/pkg/crc/version.ocpVersion=4.18.2 -X github.com/crc-org/crc/v2/pkg/crc/version.okdVersion=4.15.0-0.okd-2024-02-23-163410 -X github.com/crc-org/crc/v2/pkg/crc/version.microshiftVersion=4.18.2 -X github.com/crc-org/crc/v2/pkg/crc/version.commitSha=32561c " -o out/linux-amd64/crc  ./cmd/crc
# github.com/crc-org/crc/v2/cmd/crc/cmd
Error: cmd/crc/cmd/daemon.go:216:17: undefined: setupUnixgramListener
Error: cmd/crc/cmd/daemon.go:221:77: undefined: constants.UnixgramSocketPath
Error: cmd/crc/cmd/daemon.go:226:7: undefined: handleUnixgramConnection

@vyasgun vyasgun force-pushed the pr/unixgram-bug-fix branch from e0cd9b6 to 409e071 Compare May 16, 2025 06:42
@vyasgun
Copy link
Contributor Author

vyasgun commented May 16, 2025

@praveenkumar these are due to missing duplicate definitions for other OSes. I will add them

@praveenkumar
Copy link
Member

@praveenkumar these are due to missing duplicate definitions for other OSes. I will add them

@vyasgun before push make sure you run make check so that you can locally able to find out issue before hitting by CI.

@vyasgun vyasgun force-pushed the pr/unixgram-bug-fix branch 5 times, most recently from 66c0f4b to 4cb88d0 Compare May 16, 2025 07:05
@vyasgun
Copy link
Contributor Author

vyasgun commented May 16, 2025

The failing test Test OKD bundle is on Ubuntu 24.04.2 LTS and unrelated to the vfkit changes.

Copy link

openshift-ci bot commented May 16, 2025

@vyasgun: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-microshift-crc 4cb88d0 link true /test e2e-microshift-crc
ci/prow/integration-crc 4cb88d0 link true /test integration-crc
ci/prow/e2e-crc 4cb88d0 link true /test e2e-crc

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Moved AcceptVfkit handling into a goroutine to prevent blocking the listener and ensure responsiveness.
Added logic to cancel any existing context before accepting a new connection.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[bug] crc start fail for darwin on nightly run
3 participants