
Fixing net_mana crash due to Gdma Error #1154


Draft: erfrimod wants to merge 1 commit into main from erfrimod/tx-handle-gdma-error


@erfrimod (Contributor) commented Apr 7, 2025

After net_mana::tx_poll() receives a CQE of type CQE_TX_GDMA_ERR, it crashes when it attempts to pop the front of the posted tx queue. This happens because the hardware sends GDMA_ERR to indicate that the queue has been disabled.

  • Adding logging to help track and triage these issues.
  • Modifying tx_poll to recreate the tx queue instead of crashing.
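
As a rough illustration of the second bullet, the sketch below treats the error completion as a signal to log and request a new queue, rather than popping the posted queue unconditionally. Every name here (`TxCqe`, `TxState`, `TxPollOutcome`, `handle_cqe`) is hypothetical and is not the net_mana code. Dropping the outstanding packets on error is also an assumption made for the example.

```rust
use std::collections::VecDeque;

/// Hypothetical completion kinds; only the GDMA error concept comes from the PR text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TxCqe {
    Okay,
    GdmaErr,
}

/// Hypothetical result of handling one completion.
#[derive(Debug, PartialEq, Eq)]
enum TxPollOutcome {
    /// A posted packet (identified by its id) finished transmitting.
    Completed(u32),
    /// The queue was disabled and the caller should recreate it.
    QueueNeedsRecreate,
    /// A completion arrived with nothing posted; nothing to do.
    Nothing,
}

struct TxState {
    /// Ids of packets that were posted and not yet completed.
    posted: VecDeque<u32>,
}

impl TxState {
    fn handle_cqe(&mut self, cqe: TxCqe) -> TxPollOutcome {
        match cqe {
            TxCqe::GdmaErr => {
                // Log enough context to triage, then drop outstanding work
                // (assumption: a disabled queue will never complete it).
                eprintln!(
                    "tx queue disabled by hardware; dropping {} posted packets",
                    self.posted.len()
                );
                self.posted.clear();
                TxPollOutcome::QueueNeedsRecreate
            }
            // pop_front returns an Option, so an empty queue cannot panic here.
            TxCqe::Okay => match self.posted.pop_front() {
                Some(id) => TxPollOutcome::Completed(id),
                None => TxPollOutcome::Nothing,
            },
        }
    }
}
```

Returning an outcome value keeps the decision to recreate the queue with the caller, which is where the shared resources would be reachable.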

tx_poll runs in the context of the ManaQueue, but creating a new tx queue requires access to three resources that are tracked by the ManaEndpoint. The Vport, ResourceArena, and QueueResources are therefore all provided to the ManaQueue as Weak references to Arc Mutexes. The Vport was already provided to the ManaQueue in order to retarget VPs.
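
The ownership arrangement described above can be sketched roughly as follows. This is a minimal illustration using `std::sync` types. The struct fields, `Endpoint::new_queue`, and `Queue::recreate_tx_queue` are hypothetical stand-ins, and the real code may use different mutex types or signatures.

```rust
use std::sync::{Arc, Mutex, Weak};

// Placeholder resource types; the fields are invented for the example.
struct Vport { id: u32 }
struct ResourceArena { allocations: u32 }
struct QueueResources { tx_queue_generation: u32 }

/// The endpoint owns the resources through strong references.
struct Endpoint {
    vport: Arc<Mutex<Vport>>,
    arena: Arc<Mutex<ResourceArena>>,
    resources: Arc<Mutex<QueueResources>>,
}

/// The queue only holds weak references, so it never keeps the endpoint's state alive.
struct Queue {
    vport: Weak<Mutex<Vport>>,
    arena: Weak<Mutex<ResourceArena>>,
    resources: Weak<Mutex<QueueResources>>,
}

impl Endpoint {
    fn new() -> Self {
        Endpoint {
            vport: Arc::new(Mutex::new(Vport { id: 0 })),
            arena: Arc::new(Mutex::new(ResourceArena { allocations: 0 })),
            resources: Arc::new(Mutex::new(QueueResources { tx_queue_generation: 0 })),
        }
    }

    fn new_queue(&self) -> Queue {
        Queue {
            vport: Arc::downgrade(&self.vport),
            arena: Arc::downgrade(&self.arena),
            resources: Arc::downgrade(&self.resources),
        }
    }
}

impl Queue {
    /// Returns the new queue generation, or None if the endpoint is gone
    /// or one of the mutexes is poisoned.
    fn recreate_tx_queue(&self) -> Option<u32> {
        // Upgrade fails once the endpoint has dropped its strong references.
        let vport_arc = self.vport.upgrade()?;
        let arena_arc = self.arena.upgrade()?;
        let resources_arc = self.resources.upgrade()?;

        let vport = vport_arc.lock().ok()?;
        let mut arena = arena_arc.lock().ok()?;
        let mut resources = resources_arc.lock().ok()?;

        // Stand-in for the real work of allocating and wiring up a new tx queue.
        let _vport_id = vport.id;
        arena.allocations += 1;
        resources.tx_queue_generation += 1;
        let generation = resources.tx_queue_generation;
        Some(generation)
    }
}
```

With weak references, a queue that outlives its endpoint simply gets `None` from `upgrade()` and can skip the recreation instead of touching freed state.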

@erfrimod erfrimod force-pushed the erfrimod/tx-handle-gdma-error branch from 6a7eae2 to 8416b1a Compare April 7, 2025 21:49