
⚡ Optimize hot loops by extracting prefix tuples to constants #145

Merged
Ven0m0 merged 1 commit into main from perf-extract-header-prefixes-10476931510339309470 on Mar 8, 2026


@Ven0m0
Owner

@Ven0m0 Ven0m0 commented Mar 8, 2026

💡 What: Extracted the prefix tuples passed to startswith(), ("! ", "#", "[") in Scripts/update-lists.py and ("! ", "#", "[", ";") in Scripts/deduplicate.py, into a module-level HEADER_PREFIXES constant in each script.
🎯 Why: In hot loops like count_rules and is_header, Python recreates the tuple literal on every iteration. Extracting it to a constant avoids this overhead.
📊 Measured Improvement: Benchmarks showed a modest but consistent improvement in the micro-performance of these functions. This optimization is particularly beneficial when processing very large filter lists.


PR created automatically by Jules for task 10476931510339309470 started by @Ven0m0

In `Scripts/update-lists.py` and `Scripts/deduplicate.py`, the `count_rules`
and `is_header` functions iterate over every line of filter lists.
Extracting the tuple of prefixes used in `startswith()` to a module-level
constant avoids the overhead of recreating the tuple literal on every
iteration of the loop.

This change also improves maintainability by centralizing the definition
of what constitutes a header/comment line in these scripts.

Co-authored-by: Ven0m0 <82972344+Ven0m0@users.noreply.github.com>
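
The claim above can be checked with a small micro-benchmark. The following is a minimal sketch, not the benchmark used for this PR: the function names `is_header_inline` and `is_header_constant` are hypothetical, and only the prefix values come from the description. Whether the literal is really rebuilt on each call depends on the interpreter; CPython generally stores a tuple made only of constant literals as a single precompiled constant, so the disassembly and the timings are worth inspecting before drawing conclusions.

```python
import dis
import timeit

# Prefix values taken from the PR description (update-lists.py variant).
HEADER_PREFIXES = ("! ", "#", "[")


def is_header_inline(line: str) -> bool:
    # Hypothetical "before" version: tuple literal written at the call site.
    return line.startswith(("! ", "#", "["))


def is_header_constant(line: str) -> bool:
    # Hypothetical "after" version: tuple read from the module-level constant.
    return line.startswith(HEADER_PREFIXES)


if __name__ == "__main__":
    # Show how the interpreter obtains the tuple in each variant.
    dis.dis(is_header_inline)
    dis.dis(is_header_constant)

    # Made-up sample lines; real filter lists will have a different mix.
    sample = ["! Title: demo", "# comment", "[Adblock Plus 2.0]", "||example.com^"] * 1000
    for fn in (is_header_inline, is_header_constant):
        seconds = timeit.timeit(lambda: [fn(s) for s in sample], number=200)
        print(f"{fn.__name__}: {seconds:.4f}s")
```

Timings vary by Python version and machine, so the numbers should be read as relative, not absolute.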

Copilot AI review requested due to automatic review settings March 8, 2026 09:26
@aviator-app

aviator-app Bot commented Mar 8, 2026

Current Aviator status


This PR was merged manually (without Aviator). Merging manually can negatively impact the performance of the queue. Consider using Aviator next time.




@coderabbitai

coderabbitai Bot commented Mar 8, 2026

Warning

Rate limit exceeded

@Ven0m0 has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 17 minutes and 4 seconds before requesting another review.


⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 69bbe75d-6956-4d69-b19b-46ead09705df

📥 Commits

Reviewing files that changed from the base of the PR and between 7ae0716 and 45b9ee4.

📒 Files selected for processing (2)
  • Scripts/deduplicate.py
  • Scripts/update-lists.py

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a micro-optimization aimed at improving the performance of critical functions within the deduplicate.py and update-lists.py scripts. By refactoring frequently used tuple literals into module-level constants, the change eliminates redundant object creation during iterative operations, leading to more efficient execution, especially when processing large datasets.

Highlights

  • Performance Optimization: Extracted the prefix tuples ("! ", "#", "[") and ("! ", "#", "[", ";") into module-level HEADER_PREFIXES constants in Scripts/update-lists.py and Scripts/deduplicate.py, respectively, to prevent repeated tuple literal creation in hot loops.
Changelog
  • Scripts/deduplicate.py
    • Defined a new module-level constant HEADER_PREFIXES to store common header string prefixes.
    • Updated the is_header function to use the HEADER_PREFIXES constant instead of an inline tuple literal.
  • Scripts/update-lists.py
    • Defined a new module-level constant HEADER_PREFIXES (with type hint Final[tuple[str, ...]]) to store common header string prefixes.
    • Modified the count_rules function to utilize the HEADER_PREFIXES constant for checking line prefixes.
Activity
  • PR created automatically by Jules for task [10476931510339309470] started by @Ven0m0.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request optimizes hot loops in deduplicate.py and update-lists.py by extracting tuple literals into module-level constants, which avoids recreating them on each iteration. This is a good micro-optimization. My feedback focuses on improving the consistency and type safety of the newly introduced constants.

Comment thread Scripts/deduplicate.py

from common import is_valid_domain, write_lines

HEADER_PREFIXES = ("! ", "#", "[", ";")

medium

This is a good optimization. I have a couple of suggestions for this new constant:

  1. Typing: For consistency with the change in update-lists.py and to improve type safety, consider adding a type hint. This aligns with modern Python practices and helps static analysis tools. You would need to import Final from typing.

    from typing import Final
    
    # ...
    
    HEADER_PREFIXES: Final[tuple[str, ...]] = ("! ", "#", "[", ";")
  2. Inconsistency: I noticed HEADER_PREFIXES here is ("! ", "#", "[", ";"), while in Scripts/update-lists.py it is ("! ", "#", "["). If this difference is intentional, a comment explaining why would be helpful. If they serve the same purpose, consider unifying them into a single constant in common.py to prevent potential discrepancies.
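
One way to act on both suggestions is sketched below, under the assumption that the extra ";" prefix in deduplicate.py is intentional. The name `COMMENT_PREFIXES` is hypothetical, and placing these definitions in common.py follows the reviewer's suggestion rather than anything in the merged change.

```python
from typing import Final

# Prefixes used by count_rules in Scripts/update-lists.py (values from this PR).
HEADER_PREFIXES: Final[tuple[str, ...]] = ("! ", "#", "[")

# Hypothetical second constant for Scripts/deduplicate.py, which also treats
# ";" lines as headers. Deriving it from HEADER_PREFIXES keeps the two in sync.
COMMENT_PREFIXES: Final[tuple[str, ...]] = (*HEADER_PREFIXES, ";")
```

Each script would then import the constant it needs, and the difference between the two sets is visible in one place.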

Contributor

Copilot AI left a comment


Pull request overview

Extracts repeated str.startswith(...) prefix tuples into module-level constants to reduce per-iteration allocations in hot loops within the filter-list tooling scripts.

Changes:

  • Added HEADER_PREFIXES constant to Scripts/update-lists.py and reused it in count_rules.
  • Added HEADER_PREFIXES constant to Scripts/deduplicate.py and reused it in is_header.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| Scripts/update-lists.py | Introduces HEADER_PREFIXES and reuses it in count_rules to avoid recreating the tuple in the generator loop. |
| Scripts/deduplicate.py | Introduces HEADER_PREFIXES and reuses it in is_header to avoid recreating the tuple for each line. |
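
For readers without the diff at hand, the count_rules change described above might look roughly like this. It is a sketch only: the signature, the blank-line check, and the function body are assumptions, since the actual source of Scripts/update-lists.py is not shown in this conversation. Only the constant's name, type hint, and value are taken from the PR.

```python
from collections.abc import Iterable
from typing import Final

HEADER_PREFIXES: Final[tuple[str, ...]] = ("! ", "#", "[")


def count_rules(lines: Iterable[str]) -> int:
    """Count non-blank lines that do not start with a header prefix.

    Assumed behavior; the real function may differ in signature and details.
    """
    return sum(
        1
        for line in lines
        # str.startswith accepts a tuple and matches if any prefix matches.
        if line.strip() and not line.startswith(HEADER_PREFIXES)
    )
```

For example, `count_rules(["! Title", "", "||a.com^", "# c", "b.com"])` counts the two rule lines and skips the headers and the blank line.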

@kilo-code-bot
Contributor

kilo-code-bot Bot commented Mar 8, 2026

Code Review Summary

Status: No Issues Found | Recommendation: Merge

Overview

| Severity | Count |
| --- | --- |
| CRITICAL | 0 |
| WARNING | 0 |
| SUGGESTION | 0 |

Analysis

The PR implements a micro-optimization by extracting commonly-used header prefix tuples into module-level constants:

  1. Scripts/deduplicate.py - Added HEADER_PREFIXES = ("! ", "#", "[", ";") constant and updated is_header() function to use it
  2. Scripts/update-lists.py - Added HEADER_PREFIXES: Final[tuple[str, ...]] = ("! ", "#", "[") constant and updated count_rules() function to use it

This change is:

  • Functionally correct - The refactoring preserves exactly the same behavior
  • Valid optimization - Moving tuples to module-level constants prevents repeated object creation during function calls (hot loops)
  • Consistent - The prefix values match what was previously used inline

Note on difference: The two files use different prefix sets (; is included in deduplicate.py but not update-lists.py). This appears intentional as:

  • deduplicate.py's is_header() needs to detect all comment styles including semicolons
  • update-lists.py's count_rules() counts active rules where semicolon-prefixed lines may be valid
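
The practical effect of the two prefix sets can be illustrated with a short sketch. The constant names and the `classify` helper are hypothetical and exist only for this comparison; the tuple values are the ones quoted in this PR.

```python
# Prefix sets as quoted in this PR; the constant names are made up here
# so that both can live in one file.
UPDATE_LISTS_PREFIXES = ("! ", "#", "[")
DEDUPLICATE_PREFIXES = ("! ", "#", "[", ";")


def classify(line: str) -> tuple[bool, bool]:
    """Return (header per update-lists.py set, header per deduplicate.py set)."""
    return (
        line.startswith(UPDATE_LISTS_PREFIXES),
        line.startswith(DEDUPLICATE_PREFIXES),
    )


if __name__ == "__main__":
    for sample in ("! Title: demo", "; semicolon comment", "||example.com^"):
        print(repr(sample), classify(sample))
```

Only the semicolon-prefixed line is classified differently by the two sets, which is the case a shared constant would need to account for.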
Files Reviewed (2 files)
  • Scripts/deduplicate.py - No issues
  • Scripts/update-lists.py - No issues

@Ven0m0 Ven0m0 merged commit b1de575 into main Mar 8, 2026
16 of 19 checks passed
@Ven0m0 Ven0m0 deleted the perf-extract-header-prefixes-10476931510339309470 branch March 8, 2026 09:35