Make Nova build job timeout parameter configurable #5631

Merged: 1 commit, Sep 9, 2024

@huydhn huydhn commented Sep 9, 2024

TSIA. This is a request from FBGEMM, whose Linux CUDA build jobs run too close to the 120-minute threshold and frequently time out, e.g. https://github.com/pytorch/FBGEMM/actions/runs/10772363019/job/29869849673
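
For readers unfamiliar with the pattern, here is a minimal sketch of how a reusable GitHub Actions workflow can expose its job timeout as an input. It is not copied from this PR's diff: the workflow name, the input name `timeout`, the job name, and the build step are all assumptions chosen for illustration. Only `workflow_call` inputs and `timeout-minutes` are standard GitHub Actions features.

```yaml
# Hypothetical reusable workflow. File name, input name, job name and build
# command are illustrative, not taken from pytorch/test-infra.
name: Reusable build

on:
  workflow_call:
    inputs:
      timeout:
        description: "Timeout for the build job, in minutes"
        required: false
        type: number
        default: 120  # callers that pass nothing keep the old 120-minute limit

jobs:
  build:
    runs-on: ubuntu-latest
    # Previously a hard-coded number; now read from the caller-supplied input.
    timeout-minutes: ${{ inputs.timeout }}
    steps:
      - uses: actions/checkout@v4
      - name: Build
        run: ./build.sh  # placeholder build command
```

Keeping the default at 120 means existing callers see no behavior change, which is presumably why a change of this shape is a no-op for them.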

@huydhn huydhn requested a review from atalman September 9, 2024 19:42

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Sep 9, 2024

@atalman atalman left a comment

lgtm

@malfet malfet left a comment

LGTM, as for Nova this is a no-op

@malfet malfet merged commit 369e393 into main Sep 9, 2024
44 of 77 checks passed
spcyppt added a commit to spcyppt/FBGEMM that referenced this pull request Sep 10, 2024
Summary:
The FBGEMM build for CUDA 12.1 on Nova has been hitting timeout errors, and we have been unable to release 12.1 binaries because the build takes longer than 120 minutes. For example, https://github.com/pytorch/FBGEMM/actions/runs/10772363019/job/29869849673.

Huy from Dev Infra helped make the timeout a configurable parameter in pytorch/test-infra#5631.

This diff extends the timeout to 180 minutes until we find a good way to reduce build time.

Differential Revision: D62415584
facebook-github-bot pushed a commit to pytorch/FBGEMM that referenced this pull request Sep 10, 2024
Summary:
Pull Request resolved: #3102

The FBGEMM build for CUDA 12.1 on Nova has been hitting timeout errors, and we have been unable to release 12.1 binaries because the build takes longer than 120 minutes. For example, https://github.com/pytorch/FBGEMM/actions/runs/10772363019/job/29869849673.

Huy from Dev Infra helped make the timeout a configurable parameter in pytorch/test-infra#5631.

This diff extends the timeout to 180 minutes until we find a good way to reduce build time.

Reviewed By: sryap

Differential Revision: D62415584

fbshipit-source-id: 3bdbd0beb609797ab87ebaaf2227789545983af3
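
The caller side of such a change can be sketched as follows. This is an illustrative fragment, not the actual FBGEMM workflow: the repository path, workflow file name, trigger, and input name `timeout` are hypothetical. Only the 180-minute value comes from the commit message above.

```yaml
# Hypothetical caller workflow. The `uses:` path and the `timeout` input name
# are assumptions for illustration only.
name: Build wheels (example caller)

on:
  push:
    branches: [main]

jobs:
  build:
    uses: example-org/example-infra/.github/workflows/reusable_build.yml@main
    with:
      # Raise the limit from the 120-minute default to 180 minutes.
      timeout: 180
```

Because the value is passed per caller, one slow build can raise its own limit without lengthening the timeout for every other project that uses the shared workflow.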