A GPU driver update policy is needed #246

Open
mrnicegyu11 opened this issue Jun 26, 2023 · 1 comment
Labels
p:mid-prio t:enhancement New feature or request

Comments

@mrnicegyu11
Member

We currently never update our NVIDIA drivers, but I guess we should at some point. We pin the driver version very strictly because automated GPU driver updates have broken running nodes in production in the past: the update required a restart that was neither performed nor explicitly scheduled.

We should come up with a workflow for handling upgrades and for checking (in staging?) that newer drivers still work.
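
One possible building block for such a staging check is a script that compares the driver version each GPU reports against the pinned one. The following is only a minimal sketch under stated assumptions: `PINNED_DRIVER_VERSION` and all function names are hypothetical, the version string is a placeholder, and it assumes `nvidia-smi` is on the node's `PATH`. It is not how the clusters are checked today.

```python
import subprocess

# Hypothetical placeholder; the real pin lives wherever the cluster config keeps it.
PINNED_DRIVER_VERSION = "525.105.17"


def parse_driver_versions(nvidia_smi_output: str) -> list[str]:
    """Return one driver version per GPU from nvidia-smi's csv,noheader output."""
    return [line.strip() for line in nvidia_smi_output.splitlines() if line.strip()]


def check_driver_pin(installed: list[str], pinned: str) -> list[str]:
    """Return human-readable problems; an empty list means the node matches the pin."""
    if not installed:
        return ["no GPU reported a driver version"]
    return [
        f"GPU {index}: driver {version} does not match pinned {pinned}"
        for index, version in enumerate(installed)
        if version != pinned
    ]


def query_installed_versions() -> list[str]:
    """Ask nvidia-smi for the driver version of every GPU on this node."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,  # a failing nvidia-smi raises CalledProcessError
    )
    return parse_driver_versions(result.stdout)
```

On a staging node this could be run as `check_driver_pin(query_installed_versions(), PINNED_DRIVER_VERSION)` after an upgrade. A matching version alone does not prove workloads still run, so a real check would probably also launch a small GPU job.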

mrnicegyu11 added the t:enhancement (New feature or request) and p:mid-prio labels Jun 26, 2023
mrnicegyu11 changed the title from "Work on a GPU update policy on the different clusters" to "A GPU driver update policy is needed" Jun 28, 2023
@mrnicegyu11
Member Author

Can we even update the drivers? If so, when, how, and to which version? We need to talk to MAG about this.
