feat: auto-exclude large non-code files from checkpoints #7228

roomote · 2025-08-19T19:01:49Z

Automatically exclude large non-code files and common game engine assets from checkpoints via dynamic scanning, and log the number of excluded files with a sample. This prevents disk bloat from large assets (e.g., Unity/Unreal, media, archives) while keeping code files checkpointed. Touched files: src/services/checkpoints/excludes.ts, src/services/checkpoints/ShadowCheckpointService.ts

Important

Automatically exclude large non-code files from checkpoints, with logging and configurable size threshold.

Behavior:
- Automatically excludes large non-code files from checkpoints using getExcludePatternsWithStats in excludes.ts.
- Logs number of excluded files and a sample in ShadowCheckpointService.ts.
- Supports environment variable ROO_CHECKPOINTS_LARGE_FILE_THRESHOLD_MB to configure file size threshold.
Functions:
- getExcludePatternsWithStats in excludes.ts returns patterns and stats for excluded files.
- getLargeFileAutoExcludePatterns in excludes.ts scans for large files, excluding them based on size and type.
Tests:
- Added tests in ShadowCheckpointService.spec.ts to verify exclude file creation and behavior.
- Added tests in excludes.spec.ts to verify pattern generation and error handling.
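The size-and-type decision that getLargeFileAutoExcludePatterns makes can be illustrated with a small pure function. This is a sketch, not the PR's code: the `FileEntry` shape, the `CODE_EXTENSIONS` list and the name `selectLargeNonCodeFiles` are hypothetical, paths are assumed to use forward slashes, and the real implementation also applies the game-engine and media patterns mentioned above.

```typescript
// Hypothetical input shape: a workspace-relative path and its size in bytes.
interface FileEntry {
	path: string
	sizeBytes: number
}

// Hypothetical, deliberately short list of extensions treated as "code".
const CODE_EXTENSIONS = new Set([".ts", ".tsx", ".js", ".jsx", ".py", ".go", ".rs", ".java", ".c", ".cpp", ".h", ".cs", ".json", ".md"])

// Return the lowercase extension including the dot, or "" if there is none.
// Dotfiles such as ".gitignore" are treated as having no extension.
function extensionOf(path: string): string {
	const base = path.split("/").pop() ?? ""
	const dot = base.lastIndexOf(".")
	return dot > 0 ? base.slice(dot).toLowerCase() : ""
}

// Select files that exceed the threshold and are not recognised as code,
// returning them as sorted, de-duplicated exclude patterns.
function selectLargeNonCodeFiles(files: FileEntry[], thresholdBytes: number): string[] {
	const patterns = new Set<string>()
	for (const file of files) {
		if (file.sizeBytes <= thresholdBytes) continue
		if (CODE_EXTENSIONS.has(extensionOf(file.path))) continue
		patterns.add(file.path)
	}
	return Array.from(patterns).sort()
}
```

A large source file stays checkpointed in this sketch, which matches the stated goal of excluding only non-code files.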

This description was created by ellipsis-dev for a52dad9. It will automatically update as commits are pushed.


src/services/checkpoints/excludes.ts

roomote

Reviewing my own code is like debugging in a mirror - everything looks backwards but the bugs are still mine.

src/services/checkpoints/excludes.ts

src/services/checkpoints/ShadowCheckpointService.ts

roomote · 2025-08-19T19:29:28Z

Thanks @adamhill! Great point about leveraging the existing git-lfs filter. I'll update the implementation to pre-filter files using the git-lfs extensions before scanning, which should significantly improve performance by avoiding unnecessary file system operations on already-ignored files. Working on this now.

@adamhill

- Modified getLargeFileAutoExcludePatterns to accept git-lfs patterns as a parameter
- Use ripgrep exclusion flags to pre-filter git-lfs managed files before size checking
- This avoids unnecessary file system operations on already-ignored files
- Added comprehensive unit tests for the optimization

As suggested by @adamhill, this leverages the existing git-lfs filter to improve performance.

roomote · 2025-08-19T19:38:20Z

Hi @adamhill! I've implemented your suggestion. Here's what I did:

Changes Made:

Modified getLargeFileAutoExcludePatterns to accept git-lfs patterns as a parameter
Pre-filter files using ripgrep exclusion flags (-g !pattern) to skip git-lfs managed files before checking their sizes
Updated getExcludePatternsWithStats to pass the git-lfs patterns to the large file scanner
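The pre-filter amounts to turning each Git LFS pattern into a negated ripgrep glob. The sketch below only builds the argument list. `--files`, `--hidden` and `-g`/`--glob` with a leading `!` are real ripgrep options, but the function name, the choice of base flags, and the assumption that the patterns arrive as plain globs already extracted from .gitattributes are illustrative rather than taken from the PR.

```typescript
// Hypothetical helper: build ripgrep arguments that list files while
// skipping anything matched by a Git LFS pattern.
function buildRipgrepFileListArgs(lfsPatterns: string[]): string[] {
	// "--files" makes ripgrep print the files it would search instead of searching them.
	// "--hidden" (an assumption here) also lists dotfiles and dot-directories.
	const args = ["--files", "--hidden"]
	for (const raw of lfsPatterns) {
		const pattern = raw.trim()
		// Skip blanks, comments, and patterns that are already negations.
		if (pattern === "" || pattern.startsWith("#") || pattern.startsWith("!")) continue
		// A glob prefixed with "!" tells ripgrep to exclude matching paths.
		args.push("-g", `!${pattern}`)
	}
	return args
}
```

Passed as an argv array (no shell), the `!` needs no escaping, and because excluded paths never appear in ripgrep's output, no fs.stat() call is made for them.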

Performance Improvement:

Files managed by git-lfs are now excluded at the ripgrep level, avoiding unnecessary fs.stat() calls
This significantly reduces I/O operations when scanning repositories with many git-lfs tracked files
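To make the I/O saving concrete, here is a synchronous sketch of the size-check loop that runs only on ripgrep's surviving output and counts stat failures instead of aborting, similar in spirit to the fsStatErrors counter in the PR's diff. The `statSize` callback stands in for a real `fs.statSync(p).size` or `fs.promises.stat` call; it and the names `checkSizes` and `SizeCheckResult` are hypothetical.

```typescript
interface SizeCheckResult {
	oversized: string[]
	fsStatErrors: number
}

// Hypothetical: check sizes only for paths that survived the ripgrep pre-filter.
// statSize may throw, e.g. if a file was deleted between listing and stat.
function checkSizes(paths: string[], thresholdBytes: number, statSize: (path: string) => number): SizeCheckResult {
	const oversized: string[] = []
	let fsStatErrors = 0
	for (const p of paths) {
		try {
			if (statSize(p) > thresholdBytes) oversized.push(p)
		} catch {
			// Count the failure and keep scanning rather than failing the whole pass.
			fsStatErrors++
		}
	}
	return { oversized, fsStatErrors }
}
```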

Testing:

Added comprehensive unit tests to verify the optimization works correctly
Tests confirm that git-lfs patterns are properly converted to ripgrep exclusion flags
All existing tests continue to pass

The optimization leverages the existing git-lfs filter exactly as you suggested, preventing redundant file system operations on already-ignored files. CI checks are currently running.

hannesrudolph

Thank you for your contribution! I've reviewed the changes and have some suggestions for improvement.

src/services/checkpoints/excludes.ts

src/services/checkpoints/ShadowCheckpointService.ts


hannesrudolph · 2025-08-19T22:43:06Z

Addressed review comments. Summary of changes:

Configurable threshold for large-file auto-exclusion via ROO_CHECKPOINTS_LARGE_FILE_THRESHOLD_MB (default 10 MB). Applied to dynamic scan and stats. Files: src/services/checkpoints/excludes.ts
LFS pre-filter preserved and tested. Ripgrep excludes Git LFS patterns using "-g !pattern". Tests updated. File: src/services/checkpoints/tests/excludes.spec.ts
Error handling & diagnostics. Distinguish ripgrep failures vs fs.stat failures; return errorCounts; ShadowCheckpointService logs diagnostics when non-zero. Files: src/services/checkpoints/excludes.ts, src/services/checkpoints/ShadowCheckpointService.ts
Logging behavior: single summary only when exclusions occur; shows threshold and sample (up to 10).
Tests added/updated: threshold override via env, ripgrep failure path, fs.stat error counting. All tests passing.
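The threshold override can be read with a few lines of defensive parsing. In this sketch the variable name and the 10 MB default come from the summary above; falling back to the default on unset, non-numeric, or non-positive values is an assumption about the intended behavior, not something the PR states, and `resolveThresholdBytes` is a hypothetical name.

```typescript
const DEFAULT_THRESHOLD_MB = 10
const THRESHOLD_ENV_VAR = "ROO_CHECKPOINTS_LARGE_FILE_THRESHOLD_MB"

// Resolve the large-file threshold in bytes from an environment-like object.
// Callers would normally pass process.env.
function resolveThresholdBytes(env: Record<string, string | undefined>): number {
	const raw = env[THRESHOLD_ENV_VAR]
	const parsed = raw === undefined ? NaN : Number(raw)
	// Assumed policy: ignore unset, non-numeric, or non-positive values.
	const mb = Number.isFinite(parsed) && parsed > 0 ? parsed : DEFAULT_THRESHOLD_MB
	return Math.floor(mb * 1024 * 1024)
}
```

Fractional values such as "0.5" are accepted here; whether the PR allows them is not stated.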

Ready for re-review.


- Modified ShadowCheckpointService to check if the exclude file exists before regenerating it
- Avoids expensive file scanning (1-3 seconds) on every checkpoint initialization
- Only runs the scan on initial shadow repo creation or when the exclude file is missing
- Added refreshExcludePatterns() method for manual refresh if needed
- Maintains backwards compatibility while fixing the performance regression
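The regeneration guard in this commit can be sketched as a function that runs the expensive scan only when the exclude file is missing or a rewrite is forced. File access is injected so the logic can be shown without Node APIs. `ExcludeFileIO`, `ensureExcludeFile` and the `force` flag are hypothetical names; a forced call would correspond to something like refreshExcludePatterns().

```typescript
// Hypothetical minimal file-system surface, injected for clarity and testing.
interface ExcludeFileIO {
	exists(path: string): boolean
	write(path: string, contents: string): void
}

// Write the exclude file only if it is missing or a rewrite is forced.
// Returns true when the (expensive) pattern generation actually ran.
function ensureExcludeFile(
	io: ExcludeFileIO,
	excludePath: string,
	generatePatterns: () => string[],
	force = false,
): boolean {
	if (!force && io.exists(excludePath)) {
		// Reuse the existing file and skip the multi-second scan.
		return false
	}
	io.write(excludePath, generatePatterns().join("\n") + "\n")
	return true
}
```

Note that with an existence check alone, the default exclude file that git creates would be reused as-is, which is why the follow-up commit forces the write on initial creation.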

…etup
- Force write exclude file on initial creation since git creates a default one
- Fix test to delete exclude file before re-initialization to test LFS patterns
- Add verification that exclude patterns are properly applied

dosubot · 2025-08-20T00:10:06Z

Related Documentation

No published documentation to review for changes on this repository.

Copilot

Pull Request Overview

This PR implements automatic exclusion of large non-code files from checkpoints to prevent disk bloat while maintaining code file checkpointing. The feature dynamically scans for large files, respects Git LFS patterns, and provides configurable thresholds with detailed logging.

Adds dynamic large file scanning with configurable size thresholds (default 10MB)
Implements game engine asset pattern exclusions (Unity, Unreal)
Introduces logging for excluded files with error tracking and diagnostics

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File	Description
src/services/checkpoints/excludes.ts	Adds core functionality for large file detection, LFS pre-filtering, and configurable thresholds
src/services/checkpoints/ShadowCheckpointService.ts	Integrates new exclude patterns with stats and adds logging for excluded files
src/services/checkpoints/tests/excludes.spec.ts	Comprehensive tests for new large file exclusion functionality
src/services/checkpoints/tests/ShadowCheckpointService.spec.ts	Updates tests to handle exclude file regeneration behavior


Copilot · 2025-08-20T00:10:52Z

src/services/checkpoints/excludes.ts

+	for (const item of items) {
+		if ((item as any).type !== "file") continue
+
+		const rel = (item as any).path


The type assertion (item as any) bypasses TypeScript's type safety. Consider defining a proper interface for the ripgrep result items or using type guards to safely access the type property.

Suggested change

-		const rel = (item as any).path
+		if (item.type !== "file") continue
+		const rel = item.path
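One way to follow this suggestion without `as any` is to give the ripgrep results an explicit type and narrow them with a type guard. The `RipgrepItem` shape below is an assumption about what `executeRipgrep` returns (the diff only shows `type` and `path` being read), and `isFileItem` and `filePaths` are hypothetical helpers.

```typescript
// Assumed shape of a ripgrep listing entry; only `type` and `path` are used.
interface RipgrepItem {
	type: "file" | "folder"
	path: string
}

// User-defined type guard for values whose type is not known statically.
function isFileItem(value: unknown): value is RipgrepItem & { type: "file" } {
	if (typeof value !== "object" || value === null) return false
	const candidate = value as { type?: unknown; path?: unknown }
	return candidate.type === "file" && typeof candidate.path === "string"
}

// Collect relative file paths without any `as any` casts.
function filePaths(items: unknown[]): string[] {
	const paths: string[] = []
	for (const item of items) {
		if (!isFileItem(item)) continue
		paths.push(item.path)
	}
	return paths
}
```

If executeRipgrep is already typed to return such items, the plain `item.type !== "file"` check in the suggested change is enough and the runtime guard is unnecessary.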

Copilot · 2025-08-20T00:10:53Z

src/services/checkpoints/excludes.ts

+	for (const item of items) {
+		if ((item as any).type !== "file") continue
+
+		const rel = (item as any).path


Another unsafe type assertion (item as any). The same interface or type guard solution would improve type safety here as well.

Suggested change

-		const rel = (item as any).path
+		if (item.type !== "file") continue
+		const rel = item.path

Copilot · 2025-08-20T00:10:53Z

src/services/checkpoints/excludes.ts

+	let fsStatErrors = 0
+
+	try {
+		const rgResult = await executeRipgrep({ args, workspacePath, limit: 50000 })


The hardcoded limit of 50000 should be extracted as a named constant or made configurable. This magic number makes the code harder to maintain and understand.

Suggested change

-		const rgResult = await executeRipgrep({ args, workspacePath, limit: 50000 })
+		const rgResult = await executeRipgrep({ args, workspacePath, limit: getConfiguredRipgrepFileLimit() })
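The suggestion calls a `getConfiguredRipgrepFileLimit()` helper that does not appear elsewhere in this PR. A minimal version might look like the sketch below; the constant name, the optional override parameter and the validation rule are assumptions, and the default simply preserves the current value of 50000.

```typescript
// Default cap on the number of paths requested from ripgrep (the current hard-coded value).
const DEFAULT_RIPGREP_FILE_LIMIT = 50_000

// Hypothetical helper: use an override when it is a positive integer,
// otherwise fall back to the default.
function getConfiguredRipgrepFileLimit(override?: number): number {
	if (override !== undefined && Number.isInteger(override) && override > 0) {
		return override
	}
	return DEFAULT_RIPGREP_FILE_LIMIT
}
```

Even without configurability, naming the constant documents that the listing is capped, so in very large repositories some files are presumably never size-checked.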

Copilot · 2025-08-20T00:10:53Z

src/services/checkpoints/ShadowCheckpointService.ts

+
+		const mb = Math.round(stats.thresholdBytes / (1024 * 1024))
+
+		if (stats?.largeFilesExcluded && stats.largeFilesExcluded > 0) {


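[nitpick] The condition can be simplified to if (stats.largeFilesExcluded > 0). The optional chaining is redundant because stats is already dereferenced without it on the line above, and the > 0 comparison alone rules out 0 (assuming largeFilesExcluded is typed as a number rather than number | undefined).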

Suggested change

-		if (stats?.largeFilesExcluded && stats.largeFilesExcluded > 0) {
+		if (stats.largeFilesExcluded > 0) {
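The logging behavior described in the review summary (a single line only when something was excluded, showing the threshold and up to 10 sample paths) can be sketched as a formatter. The `ExcludeStats` shape, the `sample` field and the message wording are hypothetical; the megabyte rounding mirrors the `mb` line in the diff. Because `stats` is a required parameter here, a plain numeric comparison suffices.

```typescript
// Hypothetical stats shape returned alongside the exclude patterns.
interface ExcludeStats {
	largeFilesExcluded: number
	thresholdBytes: number
	sample: string[]
}

const MAX_SAMPLE = 10

// Return a one-line summary, or undefined when nothing was excluded.
function formatExcludeSummary(stats: ExcludeStats): string | undefined {
	if (stats.largeFilesExcluded <= 0) return undefined
	const mb = Math.round(stats.thresholdBytes / (1024 * 1024))
	const sample = stats.sample.slice(0, MAX_SAMPLE).join(", ")
	return `[checkpoints] auto-excluded ${stats.largeFilesExcluded} file(s) larger than ${mb} MB; sample: ${sample}`
}
```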

feat(checkpoints): auto-exclude large non-code files from checkpoints and log stats

24d887b

roomote bot requested review from mrubens, cte and jr as code owners August 19, 2025 19:01

github-project-automation bot added this to Roo Code Roadmap Aug 19, 2025

github-project-automation bot moved this to New in Roo Code Roadmap Aug 19, 2025

github-project-automation bot moved this to Triage in Roo Code Roadmap Aug 19, 2025


dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Aug 19, 2025

ellipsis-dev bot reviewed Aug 19, 2025

src/services/checkpoints/excludes.ts (outdated)

dosubot bot added the enhancement New feature or request label Aug 19, 2025

roomote bot commented Aug 19, 2025


hannesrudolph added the Issue/PR - Triage New issue. Needs quick review to confirm validity and assign labels. label Aug 19, 2025

daniel-lxs assigned hannesrudolph Aug 19, 2025

daniel-lxs moved this from Triage to PR [Draft / In Progress] in Roo Code Roadmap Aug 19, 2025

daniel-lxs marked this pull request as draft August 19, 2025 19:25

hannesrudolph added PR - Draft / In Progress and removed Issue/PR - Triage New issue. Needs quick review to confirm validity and assign labels. labels Aug 19, 2025

hannesrudolph reviewed Aug 19, 2025

src/services/checkpoints/excludes.ts

src/services/checkpoints/excludes.ts

src/services/checkpoints/excludes.ts (outdated)

src/services/checkpoints/ShadowCheckpointService.ts

checkpoints: make large-file threshold configurable via ROO_CHECKPOINTS_LARGE_FILE_THRESHOLD_MB; improve error reporting for auto-exclude (ripgrep/fs.stat); log diagnostics in ShadowCheckpointService; add tests

8ca0c46

hannesrudolph and others added 2 commits August 19, 2025 17:28

fix(checkpoints): correct Windows Thumbs cache pattern (Thumbs.db) and add unit test (#7228)

7280966

hannesrudolph moved this from PR [Draft / In Progress] to PR [Needs Prelim Review] in Roo Code Roadmap Aug 19, 2025

hannesrudolph moved this from PR [Needs Prelim Review] to PR [Draft / In Progress] in Roo Code Roadmap Aug 19, 2025

hannesrudolph moved this from PR [Draft / In Progress] to PR [Needs Prelim Review] in Roo Code Roadmap Aug 20, 2025

hannesrudolph marked this pull request as ready for review August 20, 2025 00:09

Copilot AI review requested due to automatic review settings August 20, 2025 00:09

Copilot AI reviewed Aug 20, 2025


hannesrudolph added PR - Needs Preliminary Review and removed PR - Draft / In Progress labels Aug 20, 2025

hannesrudolph closed this Aug 26, 2025

github-project-automation bot moved this from New to Done in Roo Code Roadmap Aug 26, 2025

github-project-automation bot moved this from PR [Needs Prelim Review] to Done in Roo Code Roadmap Aug 26, 2025
