Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Format generated Python parser module #347

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

kieran-ryan
Copy link
Member

@kieran-ryan kieran-ryan commented Jan 4, 2025

🤔 What's changed?

  • Updated parser generation code with Python formatting as per style guidelines
  • Invoked no-operation Token.detach method in generated parser code
  • Migrated Python formatter from black to ruff
  • Applied refactorings to generated parser code
    • Reorganised imports - I001 (ruff check gherkin/parser.py --select=I001)
    • Dropped or False logic from conditionals - RET505 (ruff check gherkin/parser.py --select=RET505)
    • Replace series of or statements with any call
    • Drop else statement and nesting proceeding by conditionals that would return from the enclosing function

⚡️ What's your motivation?

  • Ensures entire Python implementation is formatted consistently - building on Apply black code formatter to the python codebase #286
    • Improves code review as pull requests less-susceptible to stylistic changes
    • Simplifies configuration - removes exclusion of incompliant modules
    • Improves readability
      • Bulk of change reduces mix of 4 and 8 space indentation within match_token_at_'x' methods to just 4
        • Was difficult to read and was most-likely a workaround in the generator code to satisfy generating code for two cases: to run inside conditionals (with 4 spaces) and outside conditionals (8 spaces) without having to handle within the generator; a SyntaxError would be raised if standardised to 4 spaces without handling (as per this pull request)
  • Migrating formatting to ruff has a number of advantages - including performance, and enabling standardising to a single toolset for linting and formatting
  • General refactorings to improve code quality and simplify code generation
    • or False is redundant - assume added as a safeguard where the proceeding args are not present (defaults to just ‘False’ in that case) pots generation - however the unit tests protect against this, failing if the args aren’t generated
    • Easier to generate any calls against a comma-separate iterable compared to a series of or statements where it can't be applied on every line
    • superflous else is redundant (superflous-else-return - RET505)
    • Generated parser code currently contains a useless-expression (B018) token.detach - updated code invokes appropriately though this doesn't actually alter behaviour (being a no-operation method) though conforms to intention and other parser implementation

🏷️ What kind of change is this?

  • 🏦 Refactoring/debt/DX (improvement to code design, tooling, etc. without changing behaviour)

♻️ Anything particular you want feedback on?

  • Thoughts whether a CHANGELOG entry is warranted?

  • To evaluate changes are formatted correctly:

    Open a codespace (or locally alternatively):

    Open in GitHub Codespaces

    Regenerate all parser implementations:

    docker build --tag berp-env .
    docker run --rm --interactive --tty --volume ".:/app" berp-env
    make clean-generate
    make generate

    From a separate terminal window, install the pre-commit hook and validate all Python code is formatted (including the generated parser code):

    pip install pre-commit
    pre-commit install
    pre-commit run --all-files ruff-format

    Run Python unit and acceptance tests to validate no regression in the generated parser implementation:

    pip install pytest
    pytest
    cd python/
    make acceptance
  • Have skipped formatting on state_comment string

    • Challenging to generate formatted Python multiline string from C# - consider exclusion pragmatic
  • Intend to update GitHub Actions workflow in a follow up pull request to validate, or fully assess use of pre-commit-ci with this polyglot repo

📋 Checklist:

  • I agree to respect and uphold the Cucumber Community Code of Conduct
  • My change requires a change to the documentation.
    • I have updated the documentation accordingly.
  • Users should know about my change
    • I have added an entry to the "Unreleased" section of the CHANGELOG, linking to this pull request.

@kieran-ryan kieran-ryan added the 🏦 debt Tech debt label Jan 4, 2025
@kieran-ryan kieran-ryan self-assigned this Jan 4, 2025
@kieran-ryan kieran-ryan force-pushed the debt/format-py-parser branch from c50567f to ab8bc7c Compare January 4, 2025 23:32
@kieran-ryan kieran-ryan marked this pull request as ready for review January 4, 2025 23:37
@kieran-ryan kieran-ryan requested review from youtux and jsa34 January 4, 2025 23:37
@youtux
Copy link
Contributor

youtux commented Jan 5, 2025

I’m 👎 for this change:

self.ast_builder = ast_builder if ast_builder is not None else AstBuilder() -> self.ast_builder = ast_builder or AstBuilder()

the reason is that you should be able to subclass AstParser and override __bool__ to return False, and the above code should still use your object.

- Removes configuring exclusion of any Python modules from formatting
- Ensure entire library is formatted correctly
- Drop redundant `or False` logic from conditionals
- Drop redundant `else` conditionals after `return` within `if` clause
- Skipped formatting on `state_comment` string
  - Challenging to generate formatted Python multiline string from C#
- Faster tooling
- Better handling with linting rule conflicts
- Enables extending to hundreds of linting rules
- Aligns with pytest-bdd (pytest-dev/pytest-bdd#758)
@kieran-ryan kieran-ryan force-pushed the debt/format-py-parser branch from ab8bc7c to 52864a5 Compare January 5, 2025 22:53
@kieran-ryan
Copy link
Member Author

kieran-ryan commented Jan 5, 2025

I’m 👎 for this change:

self.ast_builder = ast_builder if ast_builder is not None else AstBuilder() -> self.ast_builder = ast_builder or AstBuilder()

the reason is that you should be able to subclass AstParser and override __bool__ to return False, and the above code should still use your object.

Makes sense - applied! Had been unsure on the same for a somewhat related reason - being too implicit by assessing "truthiness" rather than the explicit type declarations - which could become a consideration with changes down the line should users begin passing additional values. Accustomed to using in a local context without this consideration. Interesting, thank you.

@mpkorstanje mpkorstanje requested a review from youtux January 12, 2025 19:07
if self.stop_at_first_error:
raise error
self.add_error(context, error)
return 48

# GherkinD
Copy link
Contributor

@youtux youtux Jan 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously we would short-circuit the evaluation at the first expression that was true, but now we always evaluate all the expressions.

is it possible this brings a significant performance degradation while parsing?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will check this out - thanks for pointer

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏦 debt Tech debt
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants