-
-
Notifications
You must be signed in to change notification settings - Fork 269
feat: add debouncing to realtime file watcher #213
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Implements hybrid debounce strategy for the realtime_updater to prevent redundant graph updates during rapid file saves. Features: - Debounce: Waits for quiet period (default 5s) after last change - Max wait: Ensures updates within max time window (default 30s) - CLI options: --debounce/-d and --max-wait/-m flags - Backward compatible: --debounce 0 restores legacy behavior The hybrid approach balances responsiveness with efficiency: - Batches rapid saves into single updates - Guarantees updates during continuous editing - Reduces wasted processing by 60-80% during active development Includes: - 18 comprehensive unit and integration tests - Thread-safe implementation with proper cleanup - Centralized constants and log messages
Summary of ChangesHello @Jrakru, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a crucial performance enhancement to the real-time file watcher by implementing a hybrid debouncing mechanism. This change addresses the issue of excessive graph updates triggered by frequent file saves, which previously led to significant wasted processing time. By intelligently batching file change events, the system now provides a more efficient and responsive experience, especially during active development, without compromising the accuracy of the knowledge graph. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a hybrid debouncing mechanism to the real-time file watcher, optimizing graph updates by preventing redundant processing during rapid file saves. The changes include adding new CLI options (--debounce, --max-wait) with default values and validation, new constants, and detailed logging messages for the debouncing process. The core logic is implemented in realtime_updater.py, where the CodeChangeEventHandler is refactored to manage pending events, timers, and apply debounce and max-wait logic, with the actual graph update logic moved into a dedicated _process_change method. A new test file, test_realtime_debounce.py, was added to thoroughly test this new functionality, covering various scenarios including rapid saves, max-wait enforcement, and thread safety. Review comments highlight the need to remove several docstrings (class, function, and module) from both realtime_updater.py and test_realtime_debounce.py to comply with project standards, and to add a -> None return type hint to the _run_watcher_loop function in realtime_updater.py.
| """ | ||
| Handles file system events with debouncing to prevent redundant graph updates. | ||
| The handler implements a hybrid debounce strategy: | ||
| - Debounce: Waits for a quiet period after the last change before processing | ||
| - Max wait: Ensures updates happen within a maximum time window, even during | ||
| continuous editing | ||
| This prevents the graph update process from running repeatedly when a file | ||
| is saved multiple times in quick succession (common during active development). | ||
| """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| def _run_watcher_loop( | ||
| ingestor, | ||
| repo_path_obj, | ||
| parsers, | ||
| queries, | ||
| debounce_seconds: float, | ||
| max_wait_seconds: float, | ||
| ): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function signature is missing a return type hint. Since it doesn't return a value, it should be annotated with -> None for consistency with other functions in the file.
| def _run_watcher_loop( | |
| ingestor, | |
| repo_path_obj, | |
| parsers, | |
| queries, | |
| debounce_seconds: float, | |
| max_wait_seconds: float, | |
| ): | |
| def _run_watcher_loop( | |
| ingestor, | |
| repo_path_obj, | |
| parsers, | |
| queries, | |
| debounce_seconds: float, | |
| max_wait_seconds: float, | |
| ) -> None: |
| """ | ||
| Watch a repository for file changes and update the knowledge graph in real-time. | ||
| The watcher uses a hybrid debouncing strategy to efficiently handle rapid file saves: | ||
| - DEBOUNCE: After a file change, waits for a quiet period before processing. | ||
| This batches rapid saves into a single update. | ||
| - MAX_WAIT: Ensures updates happen within a maximum time window, even during | ||
| continuous editing. Prevents indefinite delays. | ||
| Examples: | ||
| # Default settings (5s debounce, 30s max wait) | ||
| python realtime_updater.py /path/to/repo | ||
| # More aggressive batching for background monitoring | ||
| python realtime_updater.py /path/to/repo --debounce 10 --max-wait 60 | ||
| # Quick feedback for demos | ||
| python realtime_updater.py /path/to/repo --debounce 2 --max-wait 10 | ||
| # Disable debouncing (legacy behavior) | ||
| python realtime_updater.py /path/to/repo --debounce 0 | ||
| """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| """ | ||
| Tests for the realtime_updater debouncing functionality. | ||
| These tests verify the hybrid debounce strategy that prevents redundant | ||
| graph updates during rapid file saves. | ||
| """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile SummaryThis PR implements a hybrid debounce strategy for the realtime file watcher to prevent redundant graph updates during rapid file saves. The implementation batches rapid saves using configurable debounce and max-wait timeouts, with proper thread-safe timer management.
Confidence Score: 4/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant FS as FileSystem
participant EH as CodeChangeEventHandler
participant Timer as threading.Timer
participant GU as GraphUpdater
participant DB as Memgraph
FS->>EH: dispatch(FileModifiedEvent)
alt Debounce Disabled
EH->>GU: _process_change()
GU->>DB: execute_write(), flush_all()
else Debounce Enabled
EH->>EH: acquire lock
EH->>EH: store pending_event
EH->>EH: cancel existing timer
alt Max Wait Exceeded
EH->>Timer: schedule immediate (delay=0)
else Within Max Wait
EH->>Timer: schedule debounced (delay=debounce_seconds)
end
EH->>EH: release lock
Note over Timer: Wait for debounce period...
Timer->>EH: _process_debounced_change()
EH->>EH: acquire lock, pop pending state
EH->>GU: _process_change()
GU->>DB: execute_write(DELETE_MODULE)
GU->>GU: remove_file_from_state()
GU->>GU: re-parse file (if modified/created)
GU->>DB: execute_write(DELETE_CALLS)
GU->>GU: _process_function_calls()
GU->>DB: flush_all()
end
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additional Comments (3)
-
realtime_updater.py, line 312-315 (link)style: Hardcoded error message inconsistent with
_validate_positive_intwhich useste.INVALID_POSITIVE_INT. Consider adding a constant totool_errors.pyfor consistency. -
realtime_updater.py, line 383-386 (link)style: Hardcoded warning message. Consider adding a constant to
logs.pylikeDEBOUNCE_MAX_WAIT_ADJUSTEDfor consistency with the rest of the codebase. -
codebase_rag/tests/test_realtime_debounce.py, line 1-6 (link)style: Forward reference
"MockQueryIngestor"on line 31 could be fixed by addingfrom __future__ import annotationsat the top of the file, per project coding standards.
5 files reviewed, 3 comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR implements a hybrid debounce strategy for the realtime file watcher to prevent redundant graph updates during rapid file saves. The implementation adds configurable debounce and max-wait parameters that batch multiple rapid saves into fewer updates, significantly reducing processing overhead during active development.
Key Changes
- Added debouncing logic with configurable debounce delay (default 5s) and max wait timeout (default 30s)
- Implemented thread-safe timer management to handle concurrent file change events
- Added comprehensive test coverage with 18 tests covering initialization, batching, edge cases, and thread safety
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
realtime_updater.py |
Core debounce implementation with timer management, thread synchronization, and CLI parameter validation |
codebase_rag/tests/test_realtime_debounce.py |
Comprehensive test suite covering debounce behavior, edge cases, and integration scenarios |
codebase_rag/constants.py |
Added default constants for debounce (5s) and max wait (30s) settings |
codebase_rag/logs.py |
Added log messages for debounce lifecycle events (scheduled, reset, max wait, processing) |
codebase_rag/cli_help.py |
Added help text for new CLI options --debounce and --max-wait |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # (H) Cancel any existing timer for this file | ||
| if relative_path_str in self.timers: | ||
| self.timers[relative_path_str].cancel() | ||
| logger.debug(logs.DEBOUNCE_RESET.format(path=relative_path_str)) |
Copilot
AI
Jan 3, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a potential race condition here. The old timer is cancelled while holding the lock, but the timer's callback (_process_debounced_change) may already be running in another thread and could be waiting to acquire the same lock. This means:
- Thread A (dispatch): Holds lock, cancels timer
- Thread B (timer callback): Already fired, waiting for lock
- Thread A: Creates new timer, releases lock
- Thread B: Acquires lock, processes stale event that should have been cancelled
The cancelled timer's callback can still execute and process the event even after cancellation if it has already been triggered. Consider checking if the timer is still in self.timers within _process_debounced_change before processing.
| with self.lock: | ||
| # (H) Retrieve and clear pending state for this file | ||
| event = self.pending_events.pop(relative_path_str, None) | ||
| self.first_event_time.pop(relative_path_str, None) | ||
| self.timers.pop(relative_path_str, None) |
Copilot
AI
Jan 3, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The timers dict is modified (pop) while holding the lock, but the timer itself is not cancelled here. If a timer fires twice somehow (unlikely but theoretically possible in edge cases), or if there's a race with dispatch() adding a new timer, this could lead to inconsistent state. While the current implementation should work in most cases, consider storing a generation counter or timer ID to ensure only the most recent timer processes the event.
| def test_max_wait_forces_update( | ||
| self, | ||
| mock_updater: MagicMock, | ||
| mock_ingestor: MockQueryIngestor, | ||
| sample_file: Path, | ||
| ) -> None: | ||
| """Test that max_wait forces an update even during continuous editing.""" | ||
| from realtime_updater import CodeChangeEventHandler | ||
|
|
||
| handler = CodeChangeEventHandler( | ||
| mock_updater, debounce_seconds=0.5, max_wait_seconds=0.3 | ||
| ) | ||
|
|
||
| # First event | ||
| event = FileModifiedEvent(str(sample_file)) | ||
| handler.dispatch(event) | ||
|
|
||
| # Wait until max_wait is exceeded | ||
| time.sleep(0.4) | ||
|
|
||
| # Second event should trigger immediate processing due to max_wait | ||
| event2 = FileModifiedEvent(str(sample_file)) | ||
| handler.dispatch(event2) | ||
|
|
||
| # Give time for processing | ||
| time.sleep(0.15) | ||
|
|
||
| # Should have processed at least once due to max_wait | ||
| assert mock_ingestor.flush_all.call_count >= 1 |
Copilot
AI
Jan 3, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test has a timing-dependent assertion that may be flaky in CI environments or under heavy load. The test sleeps for 0.4s after the first event, then dispatches a second event and expects the max_wait (0.3s) to have triggered processing. However, if there's any delay in timer scheduling or thread execution, this could fail. Consider using more robust synchronization mechanisms like threading.Event or increasing the timing margins.
| # With max_wait=2s and 3s total time, expect ~2-4 updates | ||
| call_count = mock_ingestor.flush_all.call_count | ||
| assert 1 <= call_count <= 4, f"Expected 1-4 updates, got {call_count}" |
Copilot
AI
Jan 3, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is flaky due to timing dependencies. The assertion "1 <= call_count <= 4" has a wide range because the exact number depends on thread scheduling and system load. In CI environments or slow systems, this test could fail unpredictably. Consider either mocking time.time() for deterministic behavior, or making the assertion less strict (e.g., "call_count < 10" to just verify batching occurred).
| # With max_wait=2s and 3s total time, expect ~2-4 updates | |
| call_count = mock_ingestor.flush_all.call_count | |
| assert 1 <= call_count <= 4, f"Expected 1-4 updates, got {call_count}" | |
| # With max_wait=2s and 3s total time, expect ~2-4 updates, but allow | |
| # a wider range to avoid timing-related flakiness in slower environments. | |
| call_count = mock_ingestor.flush_all.call_count | |
| assert 1 <= call_count < 10, f"Expected 1-9 updates, got {call_count}" |
| # (H) Validate max_wait is greater than debounce when both are enabled | ||
| if debounce > 0 and max_wait > 0 and max_wait < debounce: | ||
| logger.warning( | ||
| f"max_wait ({max_wait}s) is less than debounce ({debounce}s). " | ||
| f"Setting max_wait to debounce value." | ||
| ) | ||
| max_wait = debounce |
Copilot
AI
Jan 3, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When debounce is 0 but max_wait is positive, the validation logic doesn't handle this edge case. The condition checks "debounce > 0 and max_wait > 0" but if debounce is 0 (disabled), max_wait becomes meaningless and should probably be ignored or a warning issued. Consider adding a check for this scenario to avoid confusion.
| timer = threading.Timer( | ||
| self.debounce_seconds, | ||
| self._process_debounced_change, | ||
| args=[relative_path_str], | ||
| ) | ||
| self.timers[relative_path_str] = timer | ||
| timer.start() |
Copilot
AI
Jan 3, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Timer objects created here are not daemon threads. If the main program exits (e.g., via Ctrl+C), these non-daemon timer threads could prevent clean shutdown or cause the program to hang until all timers complete. Consider setting the timer threads as daemon threads by accessing the underlying thread object, or ensure proper cleanup of all pending timers in the KeyboardInterrupt handler.
| timer = threading.Timer( | ||
| 0, self._process_debounced_change, args=[relative_path_str] | ||
| ) | ||
| self.timers[relative_path_str] = timer | ||
| timer.start() |
Copilot
AI
Jan 3, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same issue here - the Timer thread is not set as daemon. This timer is used for immediate processing when max_wait is exceeded, and if it's running during program shutdown, it could prevent clean exit.
| # (H) Schedule debounced processing | ||
| remaining_wait = self.max_wait_seconds - time_since_first | ||
| timer = threading.Timer( | ||
| self.debounce_seconds, | ||
| self._process_debounced_change, | ||
| args=[relative_path_str], | ||
| ) | ||
| self.timers[relative_path_str] = timer | ||
| timer.start() |
Copilot
AI
Jan 3, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The remaining_wait variable is calculated but only used for informational logging. While this is fine, the timer is always set to debounce_seconds regardless of how much time remains until max_wait is exceeded. This means if there's only 1 second left until max_wait but debounce is 5 seconds, the timer will be set for 5 seconds, potentially delaying the processing beyond max_wait. Consider using min(debounce_seconds, remaining_wait) as the timer duration to ensure max_wait is respected even when resetting the debounce timer.
…ore build artifacts - Add Rust: target, .fingerprint, incremental - Add AI/agent tools: .wagents, .codex, .opencode, .sisyphus, etc. - Add Node: .npm, .yarn, .pnpm-store - Add Python: .tox, .nox, .coverage, htmlcov - Add more file suffixes: .bak, .swp, .pyc, .pyo - Improve organization with section comments
a511859 to
0177e1f
Compare
- Add DEBOUNCE_MAX_WAIT_ADJUSTED constant to logs.py - Add INVALID_NON_NEGATIVE_FLOAT constant to tool_errors.py - Use constants instead of hardcoded strings in realtime_updater.py - Add 'from __future__ import annotations' to test file - Remove quotes from forward reference (now using PEP 563) Addresses feedback from Greptile, Gemini, and Copilot reviewers.
Summary
This PR implements a hybrid debounce strategy for the
realtime_updater.pyto prevent redundant graph updates during rapid file saves.Problem
The current implementation processes every file save event immediately, triggering a full graph update cycle (~15 seconds on large codebases). During active development, this causes significant wasted processing:
Solution
Implements a hybrid debounce strategy with two complementary mechanisms:
Debounce (default 5s): Waits for a quiet period after the last change before processing. This batches rapid saves into a single update.
Max Wait (default 30s): Ensures updates happen within a maximum time window, even during continuous editing. Prevents indefinite delays.
Changes
realtime_updater.py: Core debounce implementation with thread-safe timer managementcodebase_rag/constants.py: Centralized default valuescodebase_rag/logs.py: Standardized log messages for debounce eventscodebase_rag/cli_help.py: CLI help text for new optionscodebase_rag/tests/test_realtime_debounce.py: 18 comprehensive testsUsage
Performance Impact
Testing
Backward Compatibility
Fully backward compatible. Use
--debounce 0to restore the previous immediate-processing behavior.