
Color code viable/strict blocking jobs differently on HUD #6239

Merged 5 commits on Feb 5, 2025
Changes from 4 commits
32 changes: 31 additions & 1 deletion torchci/components/GroupJobConclusion.tsx
@@ -34,6 +34,27 @@ export enum GroupedJobStatus {
Pending = "pending",
}

function isJobViableStrictBlocking(jobName: string | undefined): boolean {
if (!jobName) {
return false;
}

// Source of truth for these jobs is in https://github.com/pytorch/pytorch/blob/main/.github/workflows/update-viablestrict.yml#L26
const viablestrict_blocking_jobs_patterns = [
Contributor:

This logic has a small issue: it wrongly marks memory leak check jobs as viable/strict blocking. Also, this would only work for PyTorch; other repos have different logic there, i.e. https://github.com/pytorch/executorch/blob/main/.github/workflows/update-viablestrict.yml#L23. It would be nice to have this for all repos, but that might be tricky to implement while this list is hardcoded here, so maybe just limit this feature to PyTorch for a start?
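
A minimal sketch of what a per-repo list could look like, assuming a hypothetical `VIABLE_STRICT_BLOCKING_PATTERNS` map keyed by the repo's full name (the map and the extra `repoFullName` parameter are assumptions, not existing HUD code):

```typescript
// Hypothetical per-repo mapping; only pytorch/pytorch is covered here, and
// each list would need to mirror that repo's update-viablestrict.yml.
const VIABLE_STRICT_BLOCKING_PATTERNS: Record<string, RegExp[]> = {
  "pytorch/pytorch": [/trunk/i, /pull/i, /linux-binary/i, /lint/i],
  // "pytorch/executorch": [...],  // would come from its own workflow file
};

function isJobViableStrictBlocking(
  jobName: string | undefined,
  repoFullName: string
): boolean {
  if (!jobName) {
    return false;
  }
  const patterns = VIABLE_STRICT_BLOCKING_PATTERNS[repoFullName];
  if (!patterns) {
    // Repos without a known pattern list are simply not highlighted.
    return false;
  }
  return patterns.some((regex) => regex.test(jobName));
}
```

Repos without an entry would get no highlight at all, which also covers the "limit this to PyTorch for a start" option.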

Contributor Author:

There are some pull jobs in that memory leak check, oddly enough. Any idea how those are triggered?

The closest matches are these distributed jobs, but if those are indeed the jobs running, then the memory leak check condition is buried somewhere deep. @clee2000 might know...

Contributor (@huydhn), Feb 1, 2025:

Oh, memory leak check jobs are just regular pull and trunk jobs that run once per day with memory leak check turned on. They are triggered by this cron schedule https://github.com/pytorch/pytorch/blob/main/.github/workflows/pull.yml#L14, which is picked up and set by https://github.com/pytorch/pytorch/blob/main/.github/scripts/filter_test_configs.py#L603-L606

  1. mem_leak_check and its cousin rerun_disabled_tests jobs are supplementary and should not block the viable/strict update. I vaguely remember that we had logic to exclude them back in the Rockset days (one way to do that on the HUD side is sketched after this list).
  2. I don't see people paying attention to mem_leak_check jobs at all; maybe we should kill them @clee2000
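
A minimal sketch of such an exclusion on the HUD side, assuming a hypothetical `NON_BLOCKING_JOB_PATTERNS` list and `isSupplementaryJob` helper (neither exists in the codebase):

```typescript
// Hypothetical exclusion patterns: supplementary runs that should never
// count as viable/strict blocking, checked before the blocking patterns.
const NON_BLOCKING_JOB_PATTERNS: RegExp[] = [
  /mem_leak_check/i,
  /rerun_disabled_tests/i,
];

function isSupplementaryJob(jobName: string): boolean {
  return NON_BLOCKING_JOB_PATTERNS.some((regex) => regex.test(jobName));
}

// Inside isJobViableStrictBlocking, bail out early for supplementary jobs:
//   if (isSupplementaryJob(jobName)) return false;
```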

Contributor Author:

I tested the query on ClickHouse, giving it as input a commit that failed the mem_leak_check on pull.

Those failed commits do indeed show up in the results.
[screenshot: ClickHouse query results showing the failing commit]

While I agree that this is not behavior we want, I think that today mem leak checks that affect pull requests do indeed become viable/strict blocking. It's generally not an issue for us, though, because only one commit a day will have that check run.

But I'll change this feature to only work on pytorch/pytorch for now.

Contributor:

> While I agree that this is not behavior we want, I think that today mem leak checks that affect pull requests do indeed become viable/strict blocking. It's generally not an issue for us, though, because only one commit a day will have that check run.
>
> But I'll change this feature to only work on pytorch/pytorch for now.

Sounds good! I'm pretty sure we ignored mem leak check in the past, so it's a regression (likely due to the CH migration). I could take a look.

/trunk/i,
/pull/i,
/linux-binary/i,
/lint/i,
];

for (const regex of viablestrict_blocking_jobs_patterns) {
if (jobName.match(regex)) {
return true;
}
}
return false;
}

export default function HudGroupedCell({
sha,
groupName,
@@ -60,12 +81,17 @@ export default function HudGroupedCell({
const pendingJobs = [];
const noStatusJobs = [];
const failedPreviousRunJobs = [];

let viableStrictBlocking = false;
for (const job of jobs) {
if (isFailedJob(job)) {
if (isRerunDisabledTestsJob(job) || isUnstableJob(job, unstableIssues)) {
warningOnlyJobs.push(job);
} else {
erroredJobs.push(job);
if (isJobViableStrictBlocking(job.name)) {
viableStrictBlocking = true;
}
}
} else if (job.conclusion === JobStatus.Pending) {
pendingJobs.push(job);
@@ -113,7 +139,11 @@ export default function HudGroupedCell({
/>
}
>
<span className={styles.conclusion}>
<span
className={`${styles.conclusion} ${
viableStrictBlocking ? styles.viablestrict_blocking : ""
}`}
>
<span
className={
isClassified
5 changes: 5 additions & 0 deletions torchci/components/JobConclusion.module.css
@@ -62,3 +62,8 @@
.warning {
color: #f8b88b;
}

.viablestrict_blocking {
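  /* Red side borders mark job groups that block the viable/strict upgrade */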
border-left: 1px solid red;
border-right: 1px solid red;
}