@jxstanford
Contributor

Summary

Add support for a .cgrignore file that allows users to specify additional directories to exclude from parsing, similar to .gitignore but simpler.

  • Patterns from .cgrignore are merged with --exclude CLI flags and auto-detected directories
  • File format: one directory name per line, # comments, blank lines ignored
  • Integrates with the existing exclude patterns interactive prompt
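The file format described above can be sketched in a few lines. This is a minimal illustration, not the code from the PR: the names `load_cgrignore_patterns()` and `CGRIGNORE_FILENAME` come from the change list, while the function body and the `frozenset[str]` return type are assumptions based on the format rules and the review summaries.

```python
from pathlib import Path

# Constant name taken from the PR description.
CGRIGNORE_FILENAME = ".cgrignore"


def load_cgrignore_patterns(repo_path: Path) -> frozenset[str]:
    """Return directory names listed in <repo_path>/.cgrignore (empty if absent)."""
    ignore_file = repo_path / CGRIGNORE_FILENAME
    if not ignore_file.is_file():
        return frozenset()

    patterns: set[str] = set()
    for raw_line in ignore_file.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines that start with '#'.
        if not line or line.startswith("#"):
            continue
        patterns.add(line)
    return frozenset(patterns)
```

Returning an immutable set makes it cheap to union the result with the other exclusion sources later on.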

Changes

  • codebase_rag/config.py: Add load_cgrignore_patterns() function and CGRIGNORE_FILENAME constant
  • codebase_rag/main.py: Integrate .cgrignore loading into prompt_exclude_directories()
  • codebase_rag/constants.py: Add EXCLUDE_STATUS_CGRIGNORE for UI display
  • codebase_rag/tests/test_cgrignore.py: Add 6 unit tests
  • README.md: Document the feature

Example .cgrignore

# Comments start with #
vendor
.custom_cache
my_build_output

Test plan

  • Unit tests pass (pytest codebase_rag/tests/test_cgrignore.py)
  • Manual test with a .cgrignore file in a repository

Add support for a `.cgrignore` file that allows users to specify
additional directories to exclude from parsing. Patterns from this
file are merged with `--exclude` CLI flags and auto-detected directories.

- Add `load_cgrignore_patterns()` function in config.py
- Integrate with `prompt_exclude_directories()` in main.py
- Add EXCLUDE_STATUS_CGRIGNORE constant for UI display
- Add tests for .cgrignore loading
- Update README with documentation
@gemini-code-assist
Contributor

Summary of Changes

Hello @jxstanford, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request adds support for a .cgrignore file to the directory exclusion mechanism. The file lets users list custom directories to ignore during parsing, similar to how .gitignore works for version control. Patterns defined in .cgrignore are merged with command-line exclusions and auto-detected directories, giving users a persistent, per-repository way to manage ignored paths.

Highlights

  • New .cgrignore file support: Users can now define custom directory exclusion patterns in a .cgrignore file, similar to .gitignore.
  • Flexible pattern merging: Patterns from .cgrignore are seamlessly integrated with command-line exclusions and auto-detected directories.
  • Simple file format: The .cgrignore file supports one directory name per line, comments with #, and ignores blank lines.
  • Enhanced exclusion prompt: The interactive exclusion prompt now displays directories excluded via .cgrignore as a source.
  • Comprehensive unit tests: New unit tests have been added to ensure the robustness of the .cgrignore parsing and integration.
  • Updated documentation: The README.md has been updated to explain how to use the new .cgrignore feature.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces support for a .cgrignore file, allowing users to specify custom directories to exclude from parsing. The implementation is clean and follows the project's patterns. The changes are well-documented in the README and thoroughly tested with new unit tests. I have one minor suggestion regarding the placement of an import statement in codebase_rag/config.py to improve consistency. Overall, this is a great addition to the project.

@greptile-apps

greptile-apps bot commented Jan 4, 2026

Greptile Summary

This PR adds support for a .cgrignore file that allows users to specify custom directories to exclude from parsing, similar to .gitignore but simpler. The implementation integrates cleanly with the existing exclude patterns system, merging patterns from .cgrignore with CLI flags and auto-detected directories.

Key Changes

  • Added load_cgrignore_patterns() function in config.py to parse .cgrignore files
  • Integrated .cgrignore loading into prompt_exclude_directories() with proper pattern merging
  • Added EXCLUDE_STATUS_CGRIGNORE constant for UI display
  • Comprehensive test coverage with 6 unit tests covering edge cases
  • Clear documentation in README with examples

Issues Found

  • Code Style: Hardcoded log messages in config.py violate project coding standards (rule d4240b05-b763-467a-a6bf-94f73e8b6859) which require all log messages to be defined in logs.py. The messages should be moved to logs.py as CGRIGNORE_LOADED and CGRIGNORE_READ_FAILED.
  • Import Pattern: Function-scope import of loguru.logger should be at module level per project standards.
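Both issues can be illustrated together. The sketch below is hedged: `CGRIGNORE_LOADED` and `CGRIGNORE_READ_FAILED` are the constant names proposed in the review, but their wording is invented here, `read_cgrignore_lines()` is a hypothetical helper, and the standard-library `logging` module stands in for loguru so the snippet runs without extra dependencies.

```python
import logging
from pathlib import Path

# Module-level logger; a stand-in for `from loguru import logger`.
logger = logging.getLogger("codebase_rag.config")

# In the project these templates would live in logs.py.
# The message wording is hypothetical.
CGRIGNORE_LOADED = "Loaded {count} lines from {path}"
CGRIGNORE_READ_FAILED = "Failed to read {path}: {error}"


def read_cgrignore_lines(path: Path) -> list[str]:
    """Read the file, logging through shared templates; return [] on read errors."""
    try:
        lines = path.read_text(encoding="utf-8").splitlines()
    except OSError as error:
        # Graceful fallback: report the problem and carry on with no patterns.
        logger.warning(CGRIGNORE_READ_FAILED.format(path=path, error=error))
        return []
    logger.info(CGRIGNORE_LOADED.format(count=len(lines), path=path))
    return lines
```

Keeping the templates in one module means the wording can be changed or audited without touching the call sites.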

Strengths

  • Clean integration with existing patterns system using frozenset union operations
  • Good error handling with graceful fallback on read errors
  • Proper handling of comments, blank lines, and whitespace
  • Excellent test coverage including error scenarios

Confidence Score: 4/5

  • Safe to merge with minor style improvements - functionality is solid
  • Score reflects solid implementation with comprehensive tests and clean integration, but hardcoded log messages violate project coding standards. These are style issues rather than functional problems, so the PR is safe to merge.
  • codebase_rag/config.py needs log messages moved to logs.py per coding standards

Important Files Changed

| Filename | Overview |
| --- | --- |
| codebase_rag/config.py | Added load_cgrignore_patterns() function and CGRIGNORE_FILENAME constant. Hardcoded log messages violate coding standards requiring logs in logs.py. |
| codebase_rag/main.py | Integrated .cgrignore loading into prompt_exclude_directories() with proper pattern merging and status display logic. |
| codebase_rag/tests/test_cgrignore.py | Comprehensive test coverage for .cgrignore functionality - all edge cases tested properly. |

Sequence Diagram

sequenceDiagram
    participant User
    participant CLI
    participant main.py
    participant config.py
    participant FileSystem
    participant UI

    User->>CLI: Run with --exclude flags
    CLI->>main.py: prompt_exclude_directories(repo_path, cli_excludes)
    
    main.py->>main.py: detect_root_excludable_directories(repo_path)
    Note over main.py: Auto-detect common dirs<br/>(node_modules, .git, etc)
    
    main.py->>config.py: load_cgrignore_patterns(repo_path)
    config.py->>FileSystem: Check .cgrignore exists
    
    alt .cgrignore exists
        FileSystem-->>config.py: File found
        config.py->>FileSystem: Read file contents
        FileSystem-->>config.py: File contents
        config.py->>config.py: Parse lines (strip, filter comments/blanks)
        config.py->>config.py: Add to set, convert to frozenset
        config.py-->>main.py: frozenset of patterns
    else .cgrignore missing
        FileSystem-->>config.py: File not found
        config.py-->>main.py: empty frozenset
    end
    
    main.py->>main.py: Merge: cli_patterns | cgrignore_patterns
    main.py->>main.py: Merge with detected: all_candidates = detected | pre_excluded
    
    alt skip_prompt = True
        main.py-->>CLI: Return all_candidates
    else Interactive mode
        main.py->>UI: Display table with sources<br/>(CLI, .cgrignore, auto-detected)
        UI->>User: Prompt for selection
        User-->>UI: Response (all/none/numbers)
        UI-->>main.py: Selected patterns
        main.py-->>CLI: frozenset of excluded dirs
    end
    
    CLI->>CLI: Use excluded dirs in parsing
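
The merge steps in the diagram can be sketched as plain frozenset unions. This is an illustration under assumptions: `merge_excludes()` is a hypothetical helper (the PR does this inside `prompt_exclude_directories()`), the source labels are paraphrased from the diagram, and the precedence used when a directory appears in more than one source (CLI first, then .cgrignore) is a guess.

```python
def merge_excludes(
    detected: frozenset[str],
    cli_patterns: frozenset[str],
    cgrignore_patterns: frozenset[str],
) -> tuple[frozenset[str], dict[str, str]]:
    """Union all exclusion sources and record where each candidate came from."""
    # Same two unions as in the diagram.
    pre_excluded = cli_patterns | cgrignore_patterns
    all_candidates = detected | pre_excluded

    sources: dict[str, str] = {}
    for name in sorted(all_candidates):
        # Assumed precedence: CLI, then .cgrignore, then auto-detected.
        if name in cli_patterns:
            sources[name] = "CLI"
        elif name in cgrignore_patterns:
            sources[name] = ".cgrignore"
        else:
            sources[name] = "auto-detected"
    return all_candidates, sources
```

With the prompt skipped, `all_candidates` would be returned as is; in interactive mode the `sources` mapping is the kind of data the table could display next to each directory.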

@greptile-apps greptile-apps bot left a comment

Additional Comments (3)

  1. codebase_rag/config.py, line 251 (link)

    style: hardcoded log message violates coding standards - should be in logs.py as CGRIGNORE_LOADED

    Agentic Framework

    • PydanticAI Only: This project uses PydanticAI... (source)

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

    Context Used: Rule from dashboard - ## Technical Requirements

  2. codebase_rag/config.py, line 254 (link)

    style: hardcoded log message violates coding standards - should be in logs.py as CGRIGNORE_READ_FAILED

  3. codebase_rag/config.py, line 236 (link)

    style: import at function scope is acceptable but against project standards - loguru should be imported at module level

    Consider moving to top of file with other imports

5 files reviewed, 3 comments

Addresses PR review feedback:
- Move loguru import to module level in config.py
- Use log constants from logs.py instead of f-strings
@jxstanford
Contributor Author

I thought you might find this useful. Here's what ended up in my .cgrignore file for one of our codebases:

# Virtual environments
mlx-venv
notebook-venv

# Build artifacts
runtime
llama.cpp
llamacpp

# External registries
kamiwaza-extension-registry
datahub

# Tool/IDE config directories
.agent
.ai
.codex
.kw
.opencode
.specify
.temp-dev-docs
.superclaude

# Cache and temp
.cache
.uv-cache
.tmp
logs
log

# Test directories
test
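
For readers wondering how a list like this might take effect during a directory walk, here is a rough sketch. It is not taken from the PR: `iter_source_files()` is a hypothetical helper, and matching on the bare directory name at any depth is an assumption, since the PR only states that each line holds one directory name.

```python
import os
from pathlib import Path
from typing import Iterator


def iter_source_files(root: Path, excluded_dirs: frozenset[str]) -> Iterator[Path]:
    """Yield files under root, skipping any directory whose name is excluded."""
    for current, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into excluded directories.
        dirnames[:] = [d for d in dirnames if d not in excluded_dirs]
        for filename in filenames:
            yield Path(current) / filename
```

Pruning `dirnames` in place avoids reading the contents of large directories such as virtual environments at all, rather than filtering their files out afterwards.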

Projects

Status: Done
