Skip to content

Add script to validate upstream references in PR branch commits #22

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: mainline
Choose a base branch
from

Conversation

bmastbergen
Copy link
Collaborator

@bmastbergen bmastbergen commented Jun 11, 2025

This script scans each commit in a PR branch for upstream Linux kernel commit references and validates those references. If a commit references an upstream commit, the script checks that the referenced commit exists in mainline and reports if it has been marked with a Fixes: tag in the upstream kernel.

Usage:
python3 check_kernel_commits.py <repo_path> <pr_branch> <base_branch> [--markdown]

By default, the script outputs results for terminal display. Use the --markdown flag to format output for GitHub PR comments.

This kernel-src-tree PR ctrliq/kernel-src-tree#323 includes a github action that uses this script to comment on the PR. There is a lot of messy history in the PR due to testing, but the interesting thing to look at is the github action ctrliq/kernel-src-tree@b82196f and the comment that the github actions bot made flagging the bogus upstream commit reference and available upstream fixes

Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Adds a new Python script to validate upstream Linux kernel commit references in a PR branch by checking each referenced SHA exists in the mainline branch and detecting any Fixes: tags.

  • Introduces check_kernel_commits.py with functions to list PR commits, extract upstream commit references, verify existence, and find fixes.
  • Provides both plain and GitHub-comment-friendly (“pretty”) output modes.
  • Handles missing references and summarizes findings or success at the end.
Comments suppressed due to low confidence (3)

check_kernel_commits.py:9

  • Core CLI logic and helper functions currently lack any accompanying tests. Consider adding unit tests for run_git, hash_exists_in_mainline, find_fixes_in_mainline, and the main argument-parsing flow to prevent regressions.
def run_git(repo, args):

check_kernel_commits.py:127

  • [nitpick] Using hard-coded space counts for subsequent_indent can be brittle. Consider deriving the indent from the length of the status tag ('[NOTFOUND] ') or defining a constant to improve readability and reduce magic values.
wrap_paragraph(header, width=80, initial_indent='', subsequent_indent='           ')

check_kernel_commits.py:43

  • The git cat-file -e {upstream_ref}^{hash_} syntax is invalid for checking if a commit exists on a branch. Consider using git merge-base --is-ancestor {hash_} {upstream_ref} or git rev-list {upstream_ref} | grep -q {hash_} to verify ancestry.
['git', '-C', repo, 'cat-file', '-e', f'{upstream_ref}^{hash_}'],

@bmastbergen bmastbergen force-pushed the check_kernel_commits branch from 19700ef to 2198c68 Compare June 11, 2025 19:49
Copy link
Collaborator

@PlaidCat PlaidCat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is Great maybe just some cleanups

Comment on lines +78 to +99
for refname, refval in [('upstream reference', upstream_ref),
('PR branch', args.pr_branch),
('base branch', args.base_branch)]:
if not ref_exists(args.repo, refval):
missing_refs.append((refname, refval))
if missing_refs:
for refname, refval in missing_refs:
print(f"ERROR: The {refname} '{refval}' does not exist in the given repo.")
print("Please fetch or create the required references before running this script.")
sys.exit(1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not in favor of this.
parser.add_argument includes a required=True functional argument to fail out early. This seems like an easy way to mess something up in the future.

See Reference: https://github.com/ctrliq/kernel-src-tree-tools/blob/mainline/rolling-release-update.py#L85

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 'repo', 'pr_branch', and 'base_branch' arguments are defined as positional arguments (they aren't optional with - or --). So if you don't have three arguments you'll get an error right away. Once we have those arguments this code is checking to make sure that those references actually exist in the git repo we are operating on (along with the implied reference origin/kernel-mainline).

parser.add_argument("repo", help="Path to the git repo")
parser.add_argument("pr_branch", help="Name of the PR branch")
parser.add_argument("base_branch", help="Name of the base branch")
parser.add_argument("--pretty", action='store_true', help="Output in Markdown, suitable for GitHub PR comments")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is pretty the correct term? I guess when I see --pretty I expect a TUI pretty ... but since its MD for the GITHUB actions maybe this is ok.

I think maybe --markdown or --automation or something

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like --markdown

upstream_hashes = re.findall(r'^commit\s+([0-9a-fA-F]{12,40})', msg, re.MULTILINE)
for uhash in upstream_hashes:
short_uhash = uhash[:12]
exists = hash_exists_in_mainline(args.repo, upstream_ref, uhash)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just a simple comment that for this we're checking that the PR commit exists upstream.
It took me a quick minute for me to realize what was happening.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will add

"""Return commits in upstream_ref that have Fixes: <first 12 chars of hash_> in their message."""
short_hash = hash_[:12]
output = run_git(repo, [
'log', upstream_ref, '--grep', f'Fixes: {short_hash}', '--format=%h %s (%an)'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can do multiple greps lets do both upper and lower case because not everyone is consistent

'--grep', f'fixes: {short_hash}'

Are the upstream fixes consistently 12 or are do we sometimes see 8's.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking for upper and lower sounds good. Not sure about 8s. I'll look.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whipped up this quick oneliner. Didn't validate it a whole lot. But it does look like there are ~1600 fixes references with a hash <12.

brett@iconium ~/linux % git log | grep -E -i "   Fixes: [a-fA-F0-9]{1,40} \(" | awk '{print $2}' | xargs -I@ sh -c 'echo -n @ | wc -m' | sort -n | uniq -c
     10 6
    282 7
    236 8
    220 9
    464 10
    419 11
 105987 12
   5335 13
   1206 14
    273 15
   2202 16
     61 17
     46 18
     35 19
     30 20
     14 21
      7 22
      7 23
      7 24
      6 25
      1 26
      2 27
      1 29
      1 31
      4 32
      1 38
    102 40


def hash_exists_in_mainline(repo, upstream_ref, hash_):
"""Check if a commit hash exists in the upstream reference."""
result = subprocess.run(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not using run_git

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will change

@bmastbergen bmastbergen force-pushed the check_kernel_commits branch 2 times, most recently from c74c6b5 to c87fea4 Compare June 12, 2025 20:08
This script scans each commit in a PR branch for upstream Linux kernel
commit references and validates those references. If a commit references
an upstream commit, the script checks that the referenced commit exists
in mainline and reports if it has been marked with a Fixes: tag in the
upstream kernel.

Usage:
python3 check_kernel_commits.py <repo_path> <pr_branch> <base_branch> [--markdown]

By default, the script outputs results for terminal display. Use the
--markdown flag to format output for GitHub PR comments.
@bmastbergen bmastbergen force-pushed the check_kernel_commits branch from c87fea4 to 1144d0b Compare June 12, 2025 20:18
@bmastbergen
Copy link
Collaborator Author

So this might be a problem with vibe coding, or maybe more specifically with me vibe coding. The git cat-file method for verifying upstream commits exist was not actually working in what I checked in. Not sure how I got to that point because the script was definitely able to find bogus commits at one point. In the current version that function is now using git merge-base --is-ancestor which seems to work and is perhaps easier to reason about by the casual observer anyway. 🤷‍♂️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants