feat: add debouncing to realtime file watcher #213

Jrakru · 2026-01-03T20:35:33Z

Summary

This PR implements a hybrid debounce strategy for the realtime_updater.py to prevent redundant graph updates during rapid file saves.

Problem

The current implementation processes every file save event immediately, triggering a full graph update cycle (~15 seconds on large codebases). During active development, this causes significant wasted processing:

10 rapid saves → 10 × 15s = 150 seconds of processing
Most intermediate saves are obsolete by the time they finish processing

Solution

Implements a hybrid debounce strategy with two complementary mechanisms:

Debounce (default 5s): Waits for a quiet period after the last change before processing. This batches rapid saves into a single update.
Max Wait (default 30s): Ensures updates happen within a maximum time window, even during continuous editing. Prevents indefinite delays.

Changes

realtime_updater.py: Core debounce implementation with thread-safe timer management
codebase_rag/constants.py: Centralized default values
codebase_rag/logs.py: Standardized log messages for debounce events
codebase_rag/cli_help.py: CLI help text for new options
codebase_rag/tests/test_realtime_debounce.py: 18 comprehensive tests

Usage

# Default settings (5s debounce, 30s max wait)
python realtime_updater.py /path/to/repo

# More aggressive batching for background monitoring
python realtime_updater.py /path/to/repo --debounce 10 --max-wait 60

# Quick feedback for demos
python realtime_updater.py /path/to/repo --debounce 2 --max-wait 10

# Disable debouncing (legacy behavior)
python realtime_updater.py /path/to/repo --debounce 0

Performance Impact

Scenario	Before	After	Improvement
10 rapid saves	10 updates (150s)	1-2 updates (15-30s)	80% reduction
30-min coding session	~40 updates	~10 updates	75% reduction

Testing

✅ 18 unit and integration tests covering:
- Initialization with/without debouncing
- Rapid event batching
- Max wait timeout behavior
- Thread safety with concurrent events
- Timer cleanup
- CLI validation
✅ All existing tests pass
✅ Linting passes (ruff)

Backward Compatibility

Fully backward compatible. Use --debounce 0 to restore the previous immediate-processing behavior.

Implements hybrid debounce strategy for the realtime_updater to prevent redundant graph updates during rapid file saves. Features: - Debounce: Waits for quiet period (default 5s) after last change - Max wait: Ensures updates within max time window (default 30s) - CLI options: --debounce/-d and --max-wait/-m flags - Backward compatible: --debounce 0 restores legacy behavior The hybrid approach balances responsiveness with efficiency: - Batches rapid saves into single updates - Guarantees updates during continuous editing - Reduces wasted processing by 60-80% during active development Includes: - 18 comprehensive unit and integration tests - Thread-safe implementation with proper cleanup - Centralized constants and log messages

gemini-code-assist · 2026-01-03T20:35:49Z

Summary of Changes

Hello @Jrakru, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a crucial performance enhancement to the real-time file watcher by implementing a hybrid debouncing mechanism. This change addresses the issue of excessive graph updates triggered by frequent file saves, which previously led to significant wasted processing time. By intelligently batching file change events, the system now provides a more efficient and responsive experience, especially during active development, without compromising the accuracy of the knowledge graph.

Highlights

Hybrid Debounce Strategy: Implemented a hybrid debounce strategy in realtime_updater.py to prevent redundant graph updates during rapid file saves. This combines a quiet period debounce with a maximum wait time.
Performance Optimization: The new strategy significantly reduces processing time by batching rapid file changes, leading to an estimated 80% reduction in updates for 10 rapid saves and a 75% reduction over a 30-minute coding session.
Configurable Debounce Parameters: Introduced new CLI options --debounce (default 5s) and --max-wait (default 30s) to control the debounce and maximum wait periods, allowing users to customize the behavior.
Comprehensive Testing: Added 18 new unit and integration tests in test_realtime_debounce.py to thoroughly cover the debounce logic, including rapid event batching, max wait timeout, thread safety, and CLI validation.
Backward Compatibility: The feature is fully backward compatible; debouncing can be disabled by setting --debounce 0 to restore the previous immediate-processing behavior.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request introduces a hybrid debouncing mechanism to the real-time file watcher, optimizing graph updates by preventing redundant processing during rapid file saves. The changes include adding new CLI options (--debounce, --max-wait) with default values and validation, new constants, and detailed logging messages for the debouncing process. The core logic is implemented in realtime_updater.py, where the CodeChangeEventHandler is refactored to manage pending events, timers, and apply debounce and max-wait logic, with the actual graph update logic moved into a dedicated _process_change method. A new test file, test_realtime_debounce.py, was added to thoroughly test this new functionality, covering various scenarios including rapid saves, max-wait enforcement, and thread safety. Review comments highlight the need to remove several docstrings (class, function, and module) from both realtime_updater.py and test_realtime_debounce.py to comply with project standards, and to add a -> None return type hint to the _run_watcher_loop function in realtime_updater.py.

gemini-code-assist · 2026-01-03T20:37:41Z

realtime_updater.py

+    """
+    Handles file system events with debouncing to prevent redundant graph updates.
+
+    The handler implements a hybrid debounce strategy:
+    - Debounce: Waits for a quiet period after the last change before processing
+    - Max wait: Ensures updates happen within a maximum time window, even during
+                continuous editing
+
+    This prevents the graph update process from running repeatedly when a file
+    is saved multiple times in quick succession (common during active development).
+    """


This class docstring violates the project's rule against using docstrings. Please remove it to adhere to the project's standards.

References

Docstrings are not allowed in this project, as enforced by a pre-commit hook.

gemini-code-assist · 2026-01-03T20:37:41Z

realtime_updater.py

+def _run_watcher_loop(
+    ingestor,
+    repo_path_obj,
+    parsers,
+    queries,
+    debounce_seconds: float,
+    max_wait_seconds: float,
+):


This function signature is missing a return type hint. Since it doesn't return a value, it should be annotated with -> None for consistency with other functions in the file.

Suggested change

def _run_watcher_loop(

ingestor,

repo_path_obj,

parsers,

queries,

debounce_seconds: float,

max_wait_seconds: float,

):

def _run_watcher_loop(

ingestor,

repo_path_obj,

parsers,

queries,

debounce_seconds: float,

max_wait_seconds: float,

) -> None:

gemini-code-assist · 2026-01-03T20:37:41Z

realtime_updater.py

+    """
+    Watch a repository for file changes and update the knowledge graph in real-time.
+
+    The watcher uses a hybrid debouncing strategy to efficiently handle rapid file saves:
+
+    - DEBOUNCE: After a file change, waits for a quiet period before processing.
+      This batches rapid saves into a single update.
+
+    - MAX_WAIT: Ensures updates happen within a maximum time window, even during
+      continuous editing. Prevents indefinite delays.
+
+    Examples:
+
+        # Default settings (5s debounce, 30s max wait)
+        python realtime_updater.py /path/to/repo
+
+        # More aggressive batching for background monitoring
+        python realtime_updater.py /path/to/repo --debounce 10 --max-wait 60
+
+        # Quick feedback for demos
+        python realtime_updater.py /path/to/repo --debounce 2 --max-wait 10
+
+        # Disable debouncing (legacy behavior)
+        python realtime_updater.py /path/to/repo --debounce 0
+    """


This function docstring violates the project's rule against using docstrings. Please remove it to maintain consistency with the project's coding standards.

References

Docstrings are not allowed in this project, as enforced by a pre-commit hook.

gemini-code-assist · 2026-01-03T20:37:41Z

codebase_rag/tests/test_realtime_debounce.py

+"""
+Tests for the realtime_updater debouncing functionality.
+
+These tests verify the hybrid debounce strategy that prevents redundant
+graph updates during rapid file saves.
+"""


This module docstring, and all other docstrings in this file, violate the project's rule against using docstrings. Please remove all docstrings from this new test file to align with the project's standards.

References

Docstrings are not allowed in this project, as enforced by a pre-commit hook.

greptile-apps · 2026-01-03T20:38:28Z

Greptile Summary

This PR implements a hybrid debounce strategy for the realtime file watcher to prevent redundant graph updates during rapid file saves. The implementation batches rapid saves using configurable debounce and max-wait timeouts, with proper thread-safe timer management.

Added CodeChangeEventHandler debouncing with configurable --debounce (default 5s) and --max-wait (default 30s) CLI options
Centralized default constants in codebase_rag/constants.py and log messages in codebase_rag/logs.py
Thread-safe implementation using threading.Lock to protect shared state (timers, pending_events, first_event_time)
Comprehensive test suite with 18 tests covering initialization, batching, max-wait behavior, and thread safety
Backward compatible: use --debounce 0 to restore legacy immediate-processing behavior

Confidence Score: 4/5

This PR is safe to merge with minor style improvements recommended.
Well-structured implementation with proper thread safety, comprehensive tests, and backward compatibility. Minor style inconsistencies with hardcoded strings instead of centralized constants.
realtime_updater.py has two hardcoded strings that could be moved to constants files for consistency.

Important Files Changed

Filename	Overview
realtime_updater.py	Core debounce implementation with thread-safe timer management. Has hardcoded error message on line 314 inconsistent with existing pattern.
codebase_rag/constants.py	Added centralized default debounce constants with proper (H) comment marker.
codebase_rag/logs.py	Added standardized log messages for debounce events following existing patterns.
codebase_rag/cli_help.py	Added CLI help text for debounce and max_wait options.
codebase_rag/tests/test_realtime_debounce.py	Comprehensive tests covering debounce scenarios. Has forward reference on line 31 that could use future annotations.

Sequence Diagram

sequenceDiagram
    participant FS as FileSystem
    participant EH as CodeChangeEventHandler
    participant Timer as threading.Timer
    participant GU as GraphUpdater
    participant DB as Memgraph

    FS->>EH: dispatch(FileModifiedEvent)
    
    alt Debounce Disabled
        EH->>GU: _process_change()
        GU->>DB: execute_write(), flush_all()
    else Debounce Enabled
        EH->>EH: acquire lock
        EH->>EH: store pending_event
        EH->>EH: cancel existing timer
        
        alt Max Wait Exceeded
            EH->>Timer: schedule immediate (delay=0)
        else Within Max Wait
            EH->>Timer: schedule debounced (delay=debounce_seconds)
        end
        
        EH->>EH: release lock
        
        Note over Timer: Wait for debounce period...
        
        Timer->>EH: _process_debounced_change()
        EH->>EH: acquire lock, pop pending state
        EH->>GU: _process_change()
        GU->>DB: execute_write(DELETE_MODULE)
        GU->>GU: remove_file_from_state()
        GU->>GU: re-parse file (if modified/created)
        GU->>DB: execute_write(DELETE_CALLS)
        GU->>GU: _process_function_calls()
        GU->>DB: flush_all()
    end

greptile-apps

Additional Comments (3)

realtime_updater.py, line 312-315 (link)

style: Hardcoded error message inconsistent with _validate_positive_int which uses te.INVALID_POSITIVE_INT. Consider adding a constant to tool_errors.py for consistency.
realtime_updater.py, line 383-386 (link)

style: Hardcoded warning message. Consider adding a constant to logs.py like DEBOUNCE_MAX_WAIT_ADJUSTED for consistency with the rest of the codebase.
codebase_rag/tests/test_realtime_debounce.py, line 1-6 (link)

style: Forward reference "MockQueryIngestor" on line 31 could be fixed by adding from __future__ import annotations at the top of the file, per project coding standards.

_{5 files reviewed, 3 comments}

_{Edit Code Review Agent Settings | Greptile}

Copilot

Pull request overview

This PR implements a hybrid debounce strategy for the realtime file watcher to prevent redundant graph updates during rapid file saves. The implementation adds configurable debounce and max-wait parameters that batch multiple rapid saves into fewer updates, significantly reducing processing overhead during active development.

Key Changes

Added debouncing logic with configurable debounce delay (default 5s) and max wait timeout (default 30s)
Implemented thread-safe timer management to handle concurrent file change events
Added comprehensive test coverage with 18 tests covering initialization, batching, edge cases, and thread safety

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 8 comments.

Show a summary per file

File	Description
`realtime_updater.py`	Core debounce implementation with timer management, thread synchronization, and CLI parameter validation
`codebase_rag/tests/test_realtime_debounce.py`	Comprehensive test suite covering debounce behavior, edge cases, and integration scenarios
`codebase_rag/constants.py`	Added default constants for debounce (5s) and max wait (30s) settings
`codebase_rag/logs.py`	Added log messages for debounce lifecycle events (scheduled, reset, max wait, processing)
`codebase_rag/cli_help.py`	Added help text for new CLI options --debounce and --max-wait

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-01-03T20:39:29Z

realtime_updater.py

+            # (H) Cancel any existing timer for this file
+            if relative_path_str in self.timers:
+                self.timers[relative_path_str].cancel()
+                logger.debug(logs.DEBOUNCE_RESET.format(path=relative_path_str))


There's a potential race condition here. The old timer is cancelled while holding the lock, but the timer's callback (_process_debounced_change) may already be running in another thread and could be waiting to acquire the same lock. This means:

Thread A (dispatch): Holds lock, cancels timer

Thread B (timer callback): Already fired, waiting for lock

Thread A: Creates new timer, releases lock

Thread B: Acquires lock, processes stale event that should have been cancelled

The cancelled timer's callback can still execute and process the event even after cancellation if it has already been triggered. Consider checking if the timer is still in self.timers within _process_debounced_change before processing.

Copilot · 2026-01-03T20:39:29Z

realtime_updater.py

+        with self.lock:
+            # (H) Retrieve and clear pending state for this file
+            event = self.pending_events.pop(relative_path_str, None)
+            self.first_event_time.pop(relative_path_str, None)
+            self.timers.pop(relative_path_str, None)


The timers dict is modified (pop) while holding the lock, but the timer itself is not cancelled here. If a timer fires twice somehow (unlikely but theoretically possible in edge cases), or if there's a race with dispatch() adding a new timer, this could lead to inconsistent state. While the current implementation should work in most cases, consider storing a generation counter or timer ID to ensure only the most recent timer processes the event.

Copilot · 2026-01-03T20:39:29Z

codebase_rag/tests/test_realtime_debounce.py

+    def test_max_wait_forces_update(
+        self,
+        mock_updater: MagicMock,
+        mock_ingestor: MockQueryIngestor,
+        sample_file: Path,
+    ) -> None:
+        """Test that max_wait forces an update even during continuous editing."""
+        from realtime_updater import CodeChangeEventHandler
+
+        handler = CodeChangeEventHandler(
+            mock_updater, debounce_seconds=0.5, max_wait_seconds=0.3
+        )
+
+        # First event
+        event = FileModifiedEvent(str(sample_file))
+        handler.dispatch(event)
+
+        # Wait until max_wait is exceeded
+        time.sleep(0.4)
+
+        # Second event should trigger immediate processing due to max_wait
+        event2 = FileModifiedEvent(str(sample_file))
+        handler.dispatch(event2)
+
+        # Give time for processing
+        time.sleep(0.15)
+
+        # Should have processed at least once due to max_wait
+        assert mock_ingestor.flush_all.call_count >= 1


This test has a timing-dependent assertion that may be flaky in CI environments or under heavy load. The test sleeps for 0.4s after the first event, then dispatches a second event and expects the max_wait (0.3s) to have triggered processing. However, if there's any delay in timer scheduling or thread execution, this could fail. Consider using more robust synchronization mechanisms like threading.Event or increasing the timing margins.

Copilot · 2026-01-03T20:39:29Z

codebase_rag/tests/test_realtime_debounce.py

+        # With max_wait=2s and 3s total time, expect ~2-4 updates
+        call_count = mock_ingestor.flush_all.call_count
+        assert 1 <= call_count <= 4, f"Expected 1-4 updates, got {call_count}"


This test is flaky due to timing dependencies. The assertion "1 <= call_count <= 4" has a wide range because the exact number depends on thread scheduling and system load. In CI environments or slow systems, this test could fail unpredictably. Consider either mocking time.time() for deterministic behavior, or making the assertion less strict (e.g., "call_count < 10" to just verify batching occurred).

Suggested change

# With max_wait=2s and 3s total time, expect ~2-4 updates

call_count = mock_ingestor.flush_all.call_count

assert 1 <= call_count <= 4, f"Expected 1-4 updates, got {call_count}"

# With max_wait=2s and 3s total time, expect ~2-4 updates, but allow

# a wider range to avoid timing-related flakiness in slower environments.

call_count = mock_ingestor.flush_all.call_count

assert 1 <= call_count < 10, f"Expected 1-9 updates, got {call_count}"

Copilot · 2026-01-03T20:39:30Z

realtime_updater.py

+    # (H) Validate max_wait is greater than debounce when both are enabled
+    if debounce > 0 and max_wait > 0 and max_wait < debounce:
+        logger.warning(
+            f"max_wait ({max_wait}s) is less than debounce ({debounce}s). "
+            f"Setting max_wait to debounce value."
+        )
+        max_wait = debounce


When debounce is 0 but max_wait is positive, the validation logic doesn't handle this edge case. The condition checks "debounce > 0 and max_wait > 0" but if debounce is 0 (disabled), max_wait becomes meaningless and should probably be ignored or a warning issued. Consider adding a check for this scenario to avoid confusion.

Copilot · 2026-01-03T20:39:30Z

realtime_updater.py

+                timer = threading.Timer(
+                    self.debounce_seconds,
+                    self._process_debounced_change,
+                    args=[relative_path_str],
+                )
+                self.timers[relative_path_str] = timer
+                timer.start()


The Timer objects created here are not daemon threads. If the main program exits (e.g., via Ctrl+C), these non-daemon timer threads could prevent clean shutdown or cause the program to hang until all timers complete. Consider setting the timer threads as daemon threads by accessing the underlying thread object, or ensure proper cleanup of all pending timers in the KeyboardInterrupt handler.

Copilot · 2026-01-03T20:39:30Z

realtime_updater.py

+        timer = threading.Timer(
+            0, self._process_debounced_change, args=[relative_path_str]
+        )
+        self.timers[relative_path_str] = timer
+        timer.start()


Same issue here - the Timer thread is not set as daemon. This timer is used for immediate processing when max_wait is exceeded, and if it's running during program shutdown, it could prevent clean exit.

Copilot · 2026-01-03T20:39:30Z

realtime_updater.py

+                # (H) Schedule debounced processing
+                remaining_wait = self.max_wait_seconds - time_since_first
+                timer = threading.Timer(
+                    self.debounce_seconds,
+                    self._process_debounced_change,
+                    args=[relative_path_str],
+                )
+                self.timers[relative_path_str] = timer
+                timer.start()


The remaining_wait variable is calculated but only used for informational logging. While this is fine, the timer is always set to debounce_seconds regardless of how much time remains until max_wait is exceeded. This means if there's only 1 second left until max_wait but debounce is 5 seconds, the timer will be set for 5 seconds, potentially delaying the processing beyond max_wait. Consider using min(debounce_seconds, remaining_wait) as the timer duration to ensure max_wait is respected even when resetting the debounce timer.

…ore build artifacts - Add Rust: target, .fingerprint, incremental - Add AI/agent tools: .wagents, .codex, .opencode, .sisyphus, etc. - Add Node: .npm, .yarn, .pnpm-store - Add Python: .tox, .nox, .coverage, htmlcov - Add more file suffixes: .bak, .swp, .pyc, .pyo - Improve organization with section comments

- Add DEBOUNCE_MAX_WAIT_ADJUSTED constant to logs.py - Add INVALID_NON_NEGATIVE_FLOAT constant to tool_errors.py - Use constants instead of hardcoded strings in realtime_updater.py - Add 'from __future__ import annotations' to test file - Remove quotes from forward reference (now using PEP 563) Addresses feedback from Greptile, Gemini, and Copilot reviewers.

Copilot AI review requested due to automatic review settings January 3, 2026 20:35

github-project-automation bot added this to @vitali87's graph code Jan 3, 2026

Copilot started reviewing on behalf of Jrakru January 3, 2026 20:35 View session

gemini-code-assist bot reviewed Jan 3, 2026

View reviewed changes

greptile-apps bot reviewed Jan 3, 2026

View reviewed changes

Copilot AI reviewed Jan 3, 2026

View reviewed changes

Jrakru force-pushed the feature/realtime-debounce branch from a511859 to 0177e1f Compare January 3, 2026 21:46

Uh oh!

feat: add debouncing to realtime file watcher #213

Are you sure you want to change the base?

feat: add debouncing to realtime file watcher #213

Uh oh!

Conversation

Jrakru commented Jan 3, 2026

Summary

Problem

Solution

Changes

Usage

Performance Impact

Testing

Backward Compatibility

Uh oh!

gemini-code-assist bot commented Jan 3, 2026

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Jan 3, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Jan 3, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Jan 3, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Jan 3, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot commented Jan 3, 2026

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Sequence Diagram

Uh oh!

greptile-apps bot left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Additional Comments (3)

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Key Changes

Reviewed changes

Uh oh!

Copilot AI Jan 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 3, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

greptile-apps bot left a comment •

edited

Loading