-
Notifications
You must be signed in to change notification settings - Fork 2.2k
feat: auto-exclude large non-code files from checkpoints #7228
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewing my own code is like debugging in a mirror - everything looks backwards but the bugs are still mine.
Thanks @adamhill! Great point about leveraging the existing git-lfs filter. I'll update the implementation to pre-filter files using the git-lfs extensions before scanning, which should significantly improve performance by avoiding unnecessary file system operations on already-ignored files. Working on this now. |
- Modified getLargeFileAutoExcludePatterns to accept git-lfs patterns as parameter - Use ripgrep exclusion flags to pre-filter git-lfs managed files before size checking - This avoids unnecessary file system operations on already-ignored files - Added comprehensive unit tests for the optimization As suggested by @adamhill, this leverages the existing git-lfs filter to improve performance
Hi @adamhill! I've implemented your suggestion. Here's what I did: Changes Made:
Performance Improvement:
Testing:
The optimization leverages the existing git-lfs filter exactly as you suggested, preventing redundant file system operations on already-ignored files. CI checks are currently running. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your contribution! I've reviewed the changes and have some suggestions for improvement.
…TS_LARGE_FILE_THRESHOLD_MB; improve error reporting for auto-exclude (ripgrep/fs.stat); log diagnostics in ShadowCheckpointService; add tests
Addressed review comments. Summary of changes:
Ready for re-review. |
…d add unit test (#7228)
- Modified ShadowCheckpointService to check if exclude file exists before regenerating - Avoids expensive file scanning (1-3 seconds) on every checkpoint initialization - Only runs the scan on initial shadow repo creation or when exclude file is missing - Added refreshExcludePatterns() method for manual refresh if needed - Maintains backwards compatibility while fixing performance regression
…etup - Force write exclude file on initial creation since git creates a default one - Fix test to delete exclude file before re-initialization to test LFS patterns - Add verification that exclude patterns are properly applied
Related Documentation No published documentation to review for changes on this repository. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR implements automatic exclusion of large non-code files from checkpoints to prevent disk bloat while maintaining code file checkpointing. The feature dynamically scans for large files, respects Git LFS patterns, and provides configurable thresholds with detailed logging.
- Adds dynamic large file scanning with configurable size thresholds (default 10MB)
- Implements game engine asset pattern exclusions (Unity, Unreal)
- Introduces logging for excluded files with error tracking and diagnostics
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
File | Description |
---|---|
src/services/checkpoints/excludes.ts | Adds core functionality for large file detection, LFS pre-filtering, and configurable thresholds |
src/services/checkpoints/ShadowCheckpointService.ts | Integrates new exclude patterns with stats and adds logging for excluded files |
src/services/checkpoints/tests/excludes.spec.ts | Comprehensive tests for new large file exclusion functionality |
src/services/checkpoints/tests/ShadowCheckpointService.spec.ts | Updates tests to handle exclude file regeneration behavior |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
for (const item of items) { | ||
if ((item as any).type !== "file") continue | ||
|
||
const rel = (item as any).path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The type assertion (item as any)
bypasses TypeScript's type safety. Consider defining a proper interface for the ripgrep result items or using type guards to safely access the type
property.
const rel = (item as any).path | |
if (item.type !== "file") continue | |
const rel = item.path |
Copilot uses AI. Check for mistakes.
for (const item of items) { | ||
if ((item as any).type !== "file") continue | ||
|
||
const rel = (item as any).path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another unsafe type assertion (item as any)
. The same interface or type guard solution would improve type safety here as well.
const rel = (item as any).path | |
if (item.type !== "file") continue | |
const rel = item.path |
Copilot uses AI. Check for mistakes.
let fsStatErrors = 0 | ||
|
||
try { | ||
const rgResult = await executeRipgrep({ args, workspacePath, limit: 50000 }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The hardcoded limit of 50000 should be extracted as a named constant or made configurable. This magic number makes the code harder to maintain and understand.
const rgResult = await executeRipgrep({ args, workspacePath, limit: 50000 }) | |
const rgResult = await executeRipgrep({ args, workspacePath, limit: getConfiguredRipgrepFileLimit() }) |
Copilot uses AI. Check for mistakes.
|
||
const mb = Math.round(stats.thresholdBytes / (1024 * 1024)) | ||
|
||
if (stats?.largeFilesExcluded && stats.largeFilesExcluded > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The condition can be simplified to if (stats.largeFilesExcluded > 0)
since the optional chaining already handles the undefined case and numbers greater than 0 are truthy.
if (stats?.largeFilesExcluded && stats.largeFilesExcluded > 0) { | |
if (stats.largeFilesExcluded > 0) { |
Copilot uses AI. Check for mistakes.
Automatically exclude large non-code files and common game engine assets from checkpoints via dynamic scanning, and log the number of excluded files with a sample. This prevents disk bloat from large assets (e.g., Unity/Unreal, media, archives) while keeping code files checkpointed. Touched files: src/services/checkpoints/excludes.ts, src/services/checkpoints/ShadowCheckpointService.ts
Important
Automatically exclude large non-code files from checkpoints, with logging and configurable size threshold.
getExcludePatternsWithStats
inexcludes.ts
.ShadowCheckpointService.ts
.ROO_CHECKPOINTS_LARGE_FILE_THRESHOLD_MB
to configure file size threshold.getExcludePatternsWithStats
inexcludes.ts
returns patterns and stats for excluded files.getLargeFileAutoExcludePatterns
inexcludes.ts
scans for large files, excluding them based on size and type.ShadowCheckpointService.spec.ts
to verify exclude file creation and behavior.excludes.spec.ts
to verify pattern generation and error handling.This description was created by
for a52dad9. You can customize this summary. It will automatically update as commits are pushed.