adds new anvil environmental variables #6922

Merged
amametjanov merged 1 commit into master from vanroekel/machine-files/anvil on Jan 22, 2025


vanroekel
Contributor

This adds two environment variables to the Anvil machine file to mitigate frequent node errors of the form:

1300: [b659:110218:0:110218] ib_mlx5_log.c:139  Transport retry count exceeded on mlx5_0:1/IB (synd 0x15 vend 0x81 hw_synd 0/0)
1300: [b659:110218:0:110218] ib_mlx5_log.c:139  DCI QP 0x24822 wqe[255]: SEND s-e [rqpn 0x288e7 rlid 344] [va 0x2b3e26bcb780 len 8192 lkey 0xf3906]

The added lines were provided by LCRC support to address these failures.
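
To check whether the error still shows up in a run, a log scan along these lines may help. This is a minimal sketch, not part of the PR: the function names and the field labels (`task`, `host`, `pid`, `device`) are assumptions chosen for illustration, and the pattern is only matched against the error line quoted above.

```python
import re

# Pattern for the "Transport retry count exceeded" line quoted in this PR.
# The group names are illustrative labels, not official field names.
RETRY_RE = re.compile(
    r"(?P<task>\d+):\s*\[(?P<host>[^:\]]+):(?P<pid>\d+):\d+:\d+\]\s+"
    r"ib_mlx5_log\.c:\d+\s+Transport retry count exceeded on (?P<device>\S+)"
)


def find_retry_errors(log_text):
    """Return one dict per 'Transport retry count exceeded' occurrence."""
    return [m.groupdict() for m in RETRY_RE.finditer(log_text)]


def hosts_with_errors(log_text):
    """Count occurrences per host so that recurring nodes stand out."""
    counts = {}
    for hit in find_retry_errors(log_text):
        counts[hit["host"]] = counts.get(hit["host"], 0) + 1
    return counts
```

Calling `hosts_with_errors(open(path).read())` on a job log (with `path` being whatever log file the run produced) gives a per-node tally, which can be compared before and after the new variables are applied.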

@vanroekel
Contributor Author

@jonbob I made a tiny commit with the default Anvil variables, since that error keeps occurring without them. Let me know what testing you'd like me to do.

rljacob assigned amametjanov and unassigned jonbob on Jan 21, 2025
Contributor

jonbob left a comment

Approved based on visual inspection and previous tests with these environment variables on Anvil.

amametjanov added a commit that referenced this pull request Jan 22, 2025
Add new anvil environmental variables

This adds two environment variables to the Anvil machine file to mitigate
frequent "Transport retry count exceeded" node errors.

[BFB]
amametjanov added the Anvil and BFB (PR leaves answers BFB) labels on Jan 22, 2025
amametjanov merged commit f6f220a into master on Jan 22, 2025
8 checks passed
amametjanov deleted the vanroekel/machine-files/anvil branch on January 22, 2025 21:13
Labels: Anvil, BFB (PR leaves answers BFB), Machine Files
3 participants