Add linting and formatting to linkml-runtime #347

Open: wants to merge 12 commits into base: main

@sneakers-the-rat (Contributor) commented Oct 28, 2024

upstream_repo: dalito/linkml
upstream_branch: issue2578-fix-uri-in-snapshot

Builds on: #345 (edit: not anymore)

This PR isn't modular because the changes that update the package to Python 3.9 rely on the linting rules we're implementing here.

This sets us up with:

  • The existing rules from linkml
  • UP - pyupgrade, with an override for X | Y types until Python 3.9 support gets dropped
  • T100 - no pdb imports or uses (i am guilty of this a lot)
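As a rough illustration of what the UP and T100 families change, here is a hypothetical module (function names invented) as it might look after ruff's fixes with Python 3.9 as the floor. The rule codes in the comments are ruff's codes as I understand them (UP006 for `typing.List`/`Dict`, UP007 for `X | Y` unions), so check them against the ruff rule index; exactly how the "X | Y" override is configured is not shown in this thread.

```python
# A hypothetical module as it might look after pyupgrade-style fixes,
# with Python 3.9 as the minimum supported version.
from typing import Union


# Before (UP006 flags the typing aliases):
#     from typing import Dict, List
#     def count_slots(names: List[str]) -> Dict[str, int]: ...
# After: PEP 585 builtin generics, which work at runtime on Python 3.9+.
def count_slots(names: list[str]) -> dict[str, int]:
    """Count how often each slot name occurs."""
    counts: dict[str, int] = {}
    for name in names:
        counts[name] = counts.get(name, 0) + 1
    return counts


# Deliberately left as Union[...]: rewriting it to `int | str` (UP007)
# would raise TypeError when this `def` is evaluated on Python 3.9,
# since PEP 604 unions in evaluated annotations need Python 3.10+.
def normalize_id(value: Union[int, str]) -> str:
    """Render an identifier as a string."""
    return str(value)


# T100 (flake8-debugger) would flag leftover debugger calls such as:
#     import pdb; pdb.set_trace()
```

This is why the override exists: the builtin-generic rewrites are safe on 3.9, while the union rewrite is not.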

I punted on a handful of rules for the tests module because there are a lot of violations and it's just tests. Those are mostly assigned values that aren't used, which are fine in tests, if a little implicit.
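For context, this is the kind of pattern those ignored rules target. The tests below are hypothetical, not taken from the linkml-runtime suite: assigning a result only to show that a call doesn't raise trips F841 (local variable assigned but never used), even though the test is meaningful; the second version states the expectation explicitly and is lint-clean.

```python
import json


def test_parses_without_error():
    # F841: `parsed` is never used; the implicit check is "json.loads did not raise".
    parsed = json.loads('{"id": "ex:1"}')


def test_parses_explicitly():
    # Lint-clean alternative that makes the expectation explicit.
    parsed = json.loads('{"id": "ex:1"}')
    assert parsed == {"id": "ex:1"}
```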

Bigass diff, i'm aware, but that's why we add the linter rules, so future diffs are smaller :)

Review spots

To help with review: all the changes here are linter fixes with no functional change, except:

  • linkml_runtime.utils.metamodelcore - There were a handful of invalid error handling blocks that looked like this:

    try:
        # something
    except TypeError:
        pass
    except ValueError:
        pass
    if not is_strict():
        return str(value)
    raise e

    That isn't valid, because e is undefined. It's also ambiguous whether those errors should always be swallowed (with any other error raised), or whether they should be raised in strict mode. i assumed the latter, so i changed it to:

    try:
        # something
    except (TypeError, ValueError):
        if is_strict():
            raise
    
    return str(value)
  • linkml_runtime.loaders.rdf_loader - there's a reference to pyld_jsonld_from_rdflib_graph and i'm not sure what that is; i can't find it defined anywhere, so i marked it noqa.

  • linkml_runtime.utils.permissiblevalueimpl:PvFormulaOptions has a default value that uses EnumDefinition, which isn't available in that module. Fixing that would create a pretty nasty dependency loop, so i just commented it out. The undefined name prevented the module from being imported at all, so i suspect this is deprecated.

  • tests.__init__.py used an eval call to get a log level. eval should basically never be used for something like that, so i switched it to getattr.
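The two behavior-relevant fixes in the list above can be sketched in isolation. This is a minimal, self-contained sketch: `set_strict`, `is_strict`, `normalize_int` and `parse_log_level` are hypothetical stand-ins, not the actual code in `linkml_runtime.utils.metamodelcore` or `tests/__init__.py`. The first part shows the strict-mode pattern (a bare `raise` re-raises the exception being handled, so no name `e` is needed, and the trailing `return str(value)` is reached only in lax mode); the second shows looking up a log level with `getattr` instead of `eval`.

```python
import logging

_STRICT = False  # hypothetical module-level flag standing in for strict mode


def set_strict(value: bool) -> None:
    global _STRICT
    _STRICT = value


def is_strict() -> bool:
    return _STRICT


def normalize_int(value) -> str:
    """Coerce value to a canonical integer string if possible.

    In strict mode a TypeError/ValueError propagates; otherwise the value
    is passed through as its plain string form.
    """
    try:
        return str(int(value))
    except (TypeError, ValueError):
        if is_strict():
            raise  # re-raise the exception currently being handled
    return str(value)


def parse_log_level(name: str, default: int = logging.WARNING) -> int:
    """Map a level name such as 'debug' to logging.DEBUG without eval()."""
    level = getattr(logging, name.upper(), None)
    # Reject attributes that exist but are not numeric levels (e.g. BASIC_FORMAT).
    return level if isinstance(level, int) else default
```

Besides avoiding arbitrary code execution, the `getattr` version degrades to a default instead of raising on an unknown name.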

@sneakers-the-rat (Contributor, Author) commented:

looks like we need to add a parameter to the upstream tests that lets us specify which branch to test against.
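A minimal sketch of how such a parameter could be read, assuming the CI job receives the PR description as text and looks for `key: value` lines like the `upstream_repo:` / `upstream_branch:` lines at the top of this PR. The function name and the `linkml/linkml` + `main` defaults are assumptions for illustration, not the workflow the repo actually uses.

```python
import re

# Hypothetical defaults used when the PR body does not override them.
DEFAULTS = {"upstream_repo": "linkml/linkml", "upstream_branch": "main"}

# Matches whole lines such as "upstream_branch: issue2578-fix-uri-in-snapshot",
# tolerating the "\r\n" line endings PR bodies often have.
_KEY_LINE = re.compile(
    r"^[ \t]*(upstream_repo|upstream_branch)[ \t]*:[ \t]*(\S+)[ \t\r]*$",
    re.MULTILINE,
)


def upstream_target(pr_body) -> dict:
    """Return which upstream repo/branch to run the upstream tests against."""
    target = dict(DEFAULTS)
    for key, value in _KEY_LINE.findall(pr_body or ""):
        target[key] = value
    return target
```

Keeping defaults in one place means PRs that don't mention upstream at all still test against the main upstream branch.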

@ialarmedalien (Contributor) commented:

Is this PR still a going concern or has it been abandoned?

@sneakers-the-rat (Contributor, Author) commented:

I just forgot about it, but it would be relatively straightforward to rebase it - it was always intended to be considered after #345 and that's in now so it could be done.

@ialarmedalien (Contributor) commented May 4, 2025

I was going to put in a PR with some ruff formatting/linting, but since this one is already done, it'd be much better to use this instead. Could you update it so it can be re-reviewed if necessary and merged? (I am happy to review.)

I think it'd make more sense to use ruff for code formatting -- the output is 99% identical to that of black and it'd be one less dependency.

@Silvanoc (Contributor) commented May 7, 2025

@sneakers-the-rat may I propose some changes to make this PR digestible for a reviewer? I would split it into smaller PRs, one per point below, each building upon the previous ones:

1. Add linter configurations

PR including:

This should be a relatively easy review for someone with some ruff knowledge. Once such a PR is merged, everybody can run it locally on an opt-in basis.

2. Lint and format

PR containing:

  • The result of running make lint-fix in a single commit, which documents that command in its message. This way, thanks to PR 1, any reviewer can reproduce it. That is an easy review, if the reviewer trusts the tool.
  • Alternatively, smaller commits can be created by manually running individual ruff commands, which should be documented in the commit message. Here again, reproducing it is peanuts for a reviewer.
  • Any additional beautification manually applied over the previous results.

A reviewer can simply check whether the ruff configuration is fine and reproduce the run to ensure that all the changes really were generated by the tool. Such a review is also really easy, even though it contains a huge number of changes.

3. Enforce linting and formatting

PR containing:

This is also easy to review and, if both previous PRs run well, should run really smoothly.

From that point on, we start practicing "code lookism" (don't forget never to practice real lookism!).


I'm taking the setup available in the https://github.com/linkml/linkml repo as a reference. If you disagree with it, please consider fixing it too.

I'm happy to help you on all of it, if desired!

@ialarmedalien (Contributor) commented:

See also the recently-implemented ruff formatting and linting in linkml-map.

@Silvanoc (Contributor) commented May 7, 2025

> See also the recently-implemented ruff formatting and linting in linkml-map.

@ialarmedalien IMO the ruff rules that you use are the most interesting part. Apart from that, I can see some relevant differences from how ruff is used in the linkml repo:

  1. linkml-map explicitly declares ruff as a project dev dependency. I would also tend to do so, but linkml relies on tox to install ruff in the ephemeral virtual environments it creates, with the version specified there.
  2. linkml-map does not provide any pre-commit configuration, whereas linkml does. Keeping the ruff versions in sync between pyproject.toml/tox.ini and .pre-commit-config.yaml is challenging, but I appreciate having a pre-commit configuration.
  3. linkml-map does not provide any Makefile target for the linter.
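The version-drift problem in point 2 can be checked mechanically. A minimal sketch, assuming ruff is pinned as `ruff==X.Y.Z` in `pyproject.toml` (or `tox.ini`) and that `.pre-commit-config.yaml` uses the `astral-sh/ruff-pre-commit` repo with a `rev: vX.Y.Z` line. It uses regexes on the file text rather than real TOML/YAML parsing, and the function names are hypothetical, so treat it as illustrative only.

```python
import re
from typing import Optional

# "ruff==0.4.4" as it might appear in a dev-dependency list or tox deps.
_PIN = re.compile(r"ruff\s*==\s*([0-9][0-9A-Za-z.]*)")
# The rev line that follows the ruff-pre-commit repo URL, e.g. "rev: v0.4.4".
_PRECOMMIT = re.compile(r"""astral-sh/ruff-pre-commit\s+rev:\s*['"]?v?([0-9][0-9A-Za-z.]*)""")


def pinned_ruff(text: str) -> Optional[str]:
    """Return the version from a `ruff==X.Y.Z` pin in pyproject.toml/tox.ini text."""
    match = _PIN.search(text)
    return match.group(1) if match else None


def precommit_ruff(text: str) -> Optional[str]:
    """Return the ruff-pre-commit `rev`, without its leading 'v'."""
    match = _PRECOMMIT.search(text)
    return match.group(1) if match else None


def ruff_versions_in_sync(pyproject_text: str, precommit_text: str) -> bool:
    """True only if both files name the same ruff version."""
    pinned, hooked = pinned_ruff(pyproject_text), precommit_ruff(precommit_text)
    return pinned is not None and pinned == hooked
```

A check like this could run in CI so that a pre-commit hook and the pinned dev dependency can't silently disagree about formatting.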

@ialarmedalien (Contributor) commented:

@Silvanoc I mostly took the path of least change (least resistance) with the linkml-map setup, but would prefer to have a standardised formatting/linting setup in as many of the repos as possible (ideally via template, but it would probably have to be configured manually for now).

My preference would be:

  • have ruff as a dev dependency (which also makes it available for local code editors)
  • add shortcuts (Makefile targets) for checking code formatting, linting, and codespell
  • pre-commit hooks and GitHub Actions for the above if they do not exist
  • remove formatting/linting/etc. from tox

That makes it easy for devs and contributors to "do the right thing".

Initial targets for this setup would be linkml-map, runtime, and the main linkml repo.

codecov bot commented May 8, 2025

Codecov Report

Attention: Patch coverage is 60.84724% with 305 lines in your changes missing coverage. Please review.

Project coverage is 63.74%. Comparing base (00abef0) to head (4d89f20).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| linkml_runtime/linkml_model/datasets.py | 0.00% | 84 Missing ⚠️ |
| linkml_runtime/linkml_model/validation.py | 0.00% | 25 Missing ⚠️ |
| linkml_runtime/utils/schemaview_cli.py | 0.00% | 25 Missing ⚠️ |
| linkml_runtime/linkml_model/mappings.py | 0.00% | 16 Missing ⚠️ |
| linkml_runtime/processing/referencevalidator.py | 74.46% | 10 Missing and 2 partials ⚠️ |
| linkml_runtime/utils/namespaces.py | 60.00% | 10 Missing and 2 partials ⚠️ |
| linkml_runtime/utils/schemaview.py | 84.61% | 10 Missing and 2 partials ⚠️ |
| linkml_runtime/utils/permissiblevalueimpl.py | 0.00% | 11 Missing ⚠️ |
| linkml_runtime/utils/yamlutils.py | 75.60% | 9 Missing and 1 partial ⚠️ |
| linkml_runtime/loaders/loader_root.py | 65.21% | 7 Missing and 1 partial ⚠️ |

... and 20 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #347      +/-   ##
==========================================
- Coverage   63.79%   63.74%   -0.06%     
==========================================
  Files          63       63              
  Lines        8946     8939       -7     
  Branches     2587     2589       +2     
==========================================
- Hits         5707     5698       -9     
  Misses       2633     2633              
- Partials      606      608       +2     
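As a cross-check of the report above (arithmetic on the reported numbers, not part of Codecov's output): Codecov defines coverage as hits / (hits + misses + partials), so partially covered lines count against the total, and the Lines row equals the sum of the other three rows for each column.

```python
def coverage_pct(hits: int, misses: int, partials: int) -> float:
    """Coverage as Codecov reports it: hits / (hits + misses + partials), in percent."""
    total = hits + misses + partials
    return 100.0 * hits / total


main = coverage_pct(hits=5707, misses=2633, partials=606)  # 8946 lines
head = coverage_pct(hits=5698, misses=2633, partials=608)  # 8939 lines
print(f"main {main:.2f}%  head {head:.2f}%")  # prints: main 63.79%  head 63.74%
```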


@sneakers-the-rat (Contributor, Author) commented May 8, 2025

rebased.

> may I propose some changes to make this PR digestible for a reviewer? I would split it into smaller PRs, one per point below, each building upon the previous ones:

i appreciate the request, but it essentially amounts to completely redoing the PR, so i'm not going to do that. feel free to close this if it's unwanted. i've spent enough time on this doing it the first time and then rebasing it 7 months later.

there is no way to make a "reformat the whole package" PR smaller. i don't see the purpose of splitting out adding the linter rules and applying them. it's pretty easy to ctrl+f for "pyproject.toml" in the diff view.

> The result of running make lint-fix in a single commit, which documents that command in its message. This way, thanks to PR 1, any reviewer can reproduce it. That is an easy review, if the reviewer trusts the tool.
> Alternatively, smaller commits can be created by manually running individual ruff commands, which should be documented in the commit message. Here again, reproducing it is peanuts for a reviewer.
> Any additional beautification manually applied over the previous results.

this is exactly how the PR is structured already, modulo rebasing.


Side note on pre-commit actions: ime they are exclusionary - pre-commit the package is sort of ridiculously wasteful of space and time, and not everybody has 100GB of free space to install the million docker images that pre-commit actions always seem to want to rely on. the purpose of CI is to lower barriers to contribution while ensuring code correctness. linting can run in CI. I don't think linkml will ever get to the code velocity where having an extra "lint" commit is actually impactful to anyone's work or understanding of the repo.


current test failures are mysterious to me. It looks like the current snapshots are incorrect, but not sure why this fixes them -
schema that's failing is this: https://github.com/linkml/linkml/blob/main/tests/test_utils/input/owl1.yaml
output snapshot that doesn't match: https://github.com/linkml/linkml/blob/main/tests/test_utils/__snapshots__/owl1.owl

notice how the URIs seem to be incorrectly generated with leading : in the generated output - as far as I know, for a schema with id: http://example.org/owl1 and some element slotopt, its uri should be http://example.org/owl1/slotopt rather than http://example.org/owl1/:slotopt
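For reference, the expected URI described above is just the schema's namespace followed by the local element name. The sketch below is purely illustrative: `PREFIXES`, `expand` and `naive_join` are hypothetical helpers, not linkml's generator code, the default namespace `http://example.org/owl1/` is an assumption derived from the schema id, and `naive_join` only shows one way a stray ':' could appear (an empty-prefix CURIE such as `:slotopt` appended without being expanded). It is not a diagnosis of the actual cause.

```python
# Hypothetical prefix map: the empty prefix stands for the schema's default
# namespace, assumed here to be the schema id plus a trailing "/".
PREFIXES = {"": "http://example.org/owl1/"}


def expand(curie: str, prefixes: dict) -> str:
    """Expand 'prefix:local' (or a bare 'local') into a full URI."""
    prefix, sep, local = curie.partition(":")
    if not sep:  # no colon: bare local name in the default namespace
        prefix, local = "", curie
    return prefixes[prefix] + local


def naive_join(curie: str) -> str:
    """Illustrative bug: gluing the unexpanded CURIE onto the namespace."""
    return PREFIXES[""] + curie
```

`expand(":slotopt", PREFIXES)` and `expand("slotopt", PREFIXES)` both give `http://example.org/owl1/slotopt`, while `naive_join(":slotopt")` reproduces the `.../owl1/:slotopt` form seen in the snapshot. This toy expander does not handle full URIs such as `http://...`.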

@ialarmedalien (Contributor) commented:

This PR fixes the upstream test issues but it's awaiting the release of linkml-runtime. Gotta love tightly-coupled packages!


[tool.ruff.lint.per-file-ignores]
"tests/**/*.py" = ["F401"] # unused imports
"tests/**/**.py" = ["F841", "E501", "F842", "E741"] # I ain't fixing all that
Contributor (review comment):

😄

Contributor (review comment):

Nobody needs nitpicking on tests 😄

@ialarmedalien (Contributor) commented May 8, 2025

Do we need black and ruff as code formatters? Given the amount of work involved in generating this PR, I would be inclined to merge it as-is and make a second PR that removes black and applies any formatting changes that removal introduces.

Apart from that, LGTM. Do we need another maintainer to take a look and approve?

Thank you for all your work here!

@Silvanoc (Contributor) commented May 8, 2025

> i appreciate the request, but it essentially amounts to completely redoing the PR, so i'm not going to do that. feel free to close this if it's unwanted. i've spent enough time on this doing it the first time and then rebasing it 7 months later.

@sneakers-the-rat Sorry if I haven't shown enough sensitivity in my proposal. You invested significant time in this PR and I came along proposing that you redo the whole thing... As I explain further down, my proposal partly stems from a misunderstanding on my side.

I was just making a proposal to make a review easier to accomplish.

Let's switch roles. Imagine that I'm the author of this PR and you are considering reviewing it. You would probably weigh the following options:

  1. Manually reviewing all the changes: you would probably discard this, because this PR changes 156 files, adding 12,615 lines and removing 6,689.
  2. Blindly trusting me and approving it: you would probably discard this too, because you deliver quality (you have shown it often enough) and wouldn't give your stamp of approval so blindly.
  3. Declining to take on the review: which is what is currently happening.
  4. Contacting me to ask for a reviewable PR: IMO the only real alternative to option 3.

> there is no way to make a "reformat the whole package" PR smaller. i don't see the purpose of splitting out adding the linter rules and applying them. it's pretty easy to ctrl+f for "pyproject.toml" in the diff view.

Of course, option 4 doesn't have to mean proposing a full restructuring of this PR, and that's been my error.

I should have asked first for the ruff rules that you have used for the changes so that they can be reproduced by a reviewer. Then you would have given me this answer and I would have realized how blind I've been 🤦🏻 and I would have agreed with you 😄

> The result of running make lint-fix in a single commit, which documents that command in its message. This way, thanks to PR 1, any reviewer can reproduce it. That is an easy review, if the reviewer trusts the tool.
> Alternatively, smaller commits can be created by manually running individual ruff commands, which should be documented in the commit message. Here again, reproducing it is peanuts for a reviewer.
> Any additional beautification manually applied over the previous results.

> this is exactly how the PR is structured already, modulo rebasing.

That's what I was expecting from you, but to be honest I could not easily tell from your commits which ones contained only changes made by ruff and which contained changes made by you.

> Side note on pre-commit actions: ime they are exclusionary - pre-commit the package is sort of ridiculously wasteful of space and time, and not everybody has 100GB of free space to install the million docker images that pre-commit actions always seem to want to rely on. the purpose of CI is to lower barriers to contribution while ensuring code correctness. linting can run in CI. I don't think linkml will ever get to the code velocity where having an extra "lint" commit is actually impactful to anyone's work or understanding of the repo.

pre-commit is optional for developers who want to use it (like me), so I don't see a problem providing it. You don't like it and prefer to let CI tell you that something is wrong? Fine for you. But I personally hate it when I push, go grab a coffee or switch tasks, check CI after some time, and realize that the linter told me after 30s that I've spelled a word wrong.

@Silvanoc (Contributor) commented May 8, 2025

I would like to briefly illustrate what I had in mind.

This branch of mine has all the configuration needed to apply the same linting and formatting that you are applying (it's the pyproject.toml of this branch). It's simple and easy to review. As a reviewer, I would mostly look at the configured rules.

A developer fetching the above mentioned branch can run the linter and formatter and should get exactly this commit. A reviewer could simply reproduce it and accept the bunch of changes resulting from it.

The diff with the branch of this PR contains, in theory, only the changes that you've manually applied. That part should also be relatively easy to review.

@Silvanoc (Contributor) left a comment:

Pending test fixes.

@Silvanoc (Contributor) commented May 8, 2025

> current test failures are mysterious to me. It looks like the current snapshots are incorrect, but not sure why this fixes them - schema that's failing is this: https://github.com/linkml/linkml/blob/main/tests/test_utils/input/owl1.yaml output snapshot that doesn't match: https://github.com/linkml/linkml/blob/main/tests/test_utils/__snapshots__/owl1.owl

> notice how the URIs seem to be incorrectly generated with leading : in the generated output - as far as I know, for a schema with id: http://example.org/owl1 and some element slotopt, its uri should be http://example.org/owl1/slotopt rather than http://example.org/owl1/:slotopt

@sneakers-the-rat the issue with the tests is possibly related to linkml/linkml#2648. @dalito can you confirm it? If so, can it be easily fixed?

@dalito (Member) commented May 8, 2025

The test failures are exactly as expected until linkml/linkml#2648 is merged.

@sneakers-the-rat could edit the first message of this PR to test against linkml/linkml#2648 to get all tests green (as I did here). If required, I can rebase that PR again.

@sneakers-the-rat (Contributor, Author) commented:

  • "Apply linting rules from upstream" (59a8135)
  • "Reformat with black" (d7d8975)
  • "Apply ruff safe fixes" (74cec4f)
  • "Mid ruff linting", aka manually fixing linter errors (6e3c85d)
  • (A few smaller, labeled manual changes)

The one outlier step is repeating the above process for all the code that has changed since the rebase: ed75b2d

The first commit has the commands needed to reproduce each of the programmatic stages of the PR - they are in the tox format action. The other changes that are not uncontroversially linter fixes are described in the OP. One can view the diff between any two commits on github using {repo}/compare/hash...hash
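Concretely, per-stage review links can be generated from the commit list above. A small sketch, assuming the repository URL is `https://github.com/linkml/linkml-runtime` and that GitHub resolves the abbreviated hashes in commit and compare URLs (it normally does for unambiguous short hashes). `REPO`, `STAGES` and `stage_links` are hypothetical names; each link then shows roughly what one stage changed.

```python
REPO = "https://github.com/linkml/linkml-runtime"  # assumed repository URL

# Programmatic stages in the order listed above (short hashes from this PR).
STAGES = [
    ("Apply linting rules from upstream", "59a8135"),
    ("Reformat with black", "d7d8975"),
    ("Apply ruff safe fixes", "74cec4f"),
    ("Mid ruff linting (manual fixes)", "6e3c85d"),
]


def stage_links(repo: str, stages: list) -> list:
    """One link per stage: the first commit on its own, then each later
    stage compared against the stage before it."""
    links = [f"{repo}/commit/{stages[0][1]}"]
    for (_, prev), (_, curr) in zip(stages, stages[1:]):
        links.append(f"{repo}/compare/{prev}...{curr}")
    return links


for (title, _), link in zip(STAGES, stage_links(REPO, STAGES)):
    print(f"{title}: {link}")
```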

so im not disagreeing about how taking those steps would make the PR more reviewable, im just saying I already did them.

> @sneakers-the-rat could edit the first message of this PR to test against linkml/linkml#2648 to get all tests green (as I did #388 (comment)). If required, I can rebase that PR again.

Just did that, I love that thing.

> Do we need black and ruff as code formatters?

I am mostly trying to get parity between upstream and this package for now. If we want to remove black from both afterwards, I take no position on that, but it could be done in both together. Last I checked there were some minor places where they disagreed with one another? But i could be wrong, and it also might not matter. Idk, if we take it out and ruff formats the same way, then that seems like it would be fine??

@Silvanoc (Contributor) commented May 9, 2025

> so im not disagreeing about how taking those steps would make the PR more reviewable, im just saying I already did them.

I didn't recognize it. So it's probably me: I got overwhelmed by the surface (a huge number of changes) without scratching beneath it enough.

Anyway, I really appreciate the huge effort and, as always, very valuable contribution 🚀

The only intention of my proposal was lowering the review effort to ensure that it finally gets reviewed and merged.


5 participants