forked from git-for-windows/git
Add 'git sparse-checkout clean' #779
Merged
dscho
merged 9 commits into
microsoft:vfs-2.50.1
from
derrickstolee:sparse-checkout-clean-ms
Aug 5, 2025
The logic for the 'git sparse-checkout' builtin uses the_repository all over the place, despite some use of a repository struct in different method parameters. Complete this removal of the_repository by using 'repo' when possible.

In one place, there was already a local variable 'r' that was set to the_repository, so move that to a method parameter.

We cannot remove the USE_THE_REPOSITORY_VARIABLE declaration as we are still using global constants for the state of the sparse-checkout.

Signed-off-by: Derrick Stolee <[email protected]>
When users change their sparse-checkout definitions to add new directories and remove old ones, there may be a few reasons why directories no longer in scope remain (ignored or excluded files still exist, Windows handles are still open, etc.). When these files still exist, the sparse index feature notices that a tracked, but sparse, directory still exists on disk and thus the index expands. This causes a performance hit _and_ the advice printed isn't very helpful. Using 'git clean' isn't enough on its own (generally '-dfx' may be needed), and even that may not be sufficient.

Add a new subcommand to 'git sparse-checkout' that removes these tracked-but-sparse directories. This necessarily removes all files contained within, including tracked and untracked files. Of particular importance are ignored and excluded files, which would normally be ignored even by 'git clean -f' unless the '-x' or '-X' option is provided. This is the most extreme method for doing this, but it works when the sparse-checkout is in cone mode and is expected to rescope based on directories, not files.

The current implementation always deletes these sparse directories without warning. This is unacceptable for a released version, but those features will be added in changes coming immediately after this one.

Note that untracked directories within the sparse-checkout remain. Further, directories that contain staged changes or files in merge conflict states are not deleted. This is a detail that is partly hidden by the implementation, which relies on collapsing the index to a sparse index in-memory and only deleting directories that are listed as sparse in the index. If a staged change exists, then that entry is not stored as a sparse tree entry and thus remains on-disk until committed or reset.

There are some interesting cases around merge conflict resolution, but that will be carefully analyzed in the future.

Signed-off-by: Derrick Stolee <[email protected]>
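To make the directory-based scoping concrete, here is a minimal Python sketch of which directories a cone-mode cleanup would target. It is a simplified model, not the code in this change: the function names `in_cone` and `sparse_dirs_on_disk` are hypothetical, every directory passed in is assumed to be tracked, and staged changes and merge conflicts (which the message above says keep a directory on disk) are ignored.

```python
from pathlib import PurePosixPath


def in_cone(directory, cone_dirs):
    """Return True if `directory` must stay in a cone-mode worktree."""
    d = PurePosixPath(directory)
    for cone in map(PurePosixPath, cone_dirs):
        # Kept if it is a cone directory or lives below one...
        if d == cone or cone in d.parents:
            return True
        # ...or if it is an ancestor that leads to a cone directory.
        if d in cone.parents:
            return True
    return False


def sparse_dirs_on_disk(existing_dirs, cone_dirs):
    """List the top-most existing directories that fall outside the cone.

    Assumption: every entry in `existing_dirs` is a tracked directory.
    """
    outside = sorted(d for d in existing_dirs if not in_cone(d, cone_dirs))
    result = []
    for d in outside:
        # Skip a directory whose ancestor is already scheduled for removal.
        if not any(PurePosixPath(r) in PurePosixPath(d).parents for r in result):
            result.append(d)
    return result
```

With a cone of `src/lib`, the directories `src` and `src/lib` are kept, while `docs` (together with anything beneath it) and `tools` would be reported as leftover sparse directories.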
The 'git sparse-checkout clean' subcommand is somewhat similar to 'git clean' in that it will delete files that should not be in the worktree. The big difference is that it focuses on the directories that should not be in the worktree due to cone-mode sparse-checkout. It also does not discriminate in the kinds of files and focuses on deleting entire directories.

However, there are some restrictions that would be good to bring over from 'git clean', specifically how it refuses to do anything without the '-f'/'--force' or '-n'/'--dry-run' arguments. The 'clean.requireForce' config can be set to 'false' to imply '--force'.

Add this behavior to avoid accidental deletion of files that cannot be recovered from Git.

Signed-off-by: Derrick Stolee <[email protected]>
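The gating rule described above can be sketched as a small decision function. This is an illustrative Python model, not the C implementation; the name `decide_action` and its return strings are hypothetical, and letting '--dry-run' win when both flags are given is an assumption borrowed from how 'git clean' behaves.

```python
def decide_action(force=False, dry_run=False, require_force=True):
    """Model the safety check: refuse unless --force or --dry-run is given.

    `require_force` stands in for the clean.requireForce config value.
    """
    if dry_run:
        # Assumption: a dry run never deletes, even when --force is also set.
        return "list"
    if force or not require_force:
        # clean.requireForce=false behaves as if --force had been passed.
        return "delete"
    return "refuse"
```

Calling `decide_action()` with no arguments yields `"refuse"`, which mirrors the command doing nothing until the user opts in explicitly.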
There is sometimes a need to visit every file within a directory, recursively. The main example is remove_dir_recursively(), though it has some extra flags that make it want to iterate over paths in a custom way. There is also the fill_directory() approach but that involves an index and a pathspec.

This change adds a new for_each_file_in_dir() method that will be helpful in the next change.

Signed-off-by: Derrick Stolee <[email protected]>
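The idea of a recursive "visit every file" helper can be illustrated with a short Python analogue. The signature of the real C function is not shown in this conversation, so the callback shape and the returned count below are assumptions made only for the sketch.

```python
import os


def for_each_file_in_dir(root, fn):
    """Call fn(path) for every non-directory entry below root, recursively.

    Returns the number of entries visited. Symlinked directories are
    treated as plain entries and are not followed.
    """
    count = 0
    with os.scandir(root) as entries:
        for entry in sorted(entries, key=lambda e: e.name):
            if entry.is_dir(follow_symlinks=False):
                count += for_each_file_in_dir(entry.path, fn)
            else:
                fn(entry.path)
                count += 1
    return count
```

A caller that only wants a listing can pass `print` or a list's `append` method as `fn`; no index or pathspec is involved, which is the contrast the message draws with fill_directory().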
The 'git sparse-checkout clean' subcommand is focused on directories, deleting any tracked sparse directories to clean up the worktree and make the sparse index feature work optimally.

However, this directory-focused approach can leave users wondering why those directories exist at all. In my experience, these files are left over due to ignore or exclude patterns, Windows file handles, or possibly merge conflict resolutions.

Add a new '--verbose' option for users to see all the files that are being deleted (with '--force') or would be deleted (with '--dry-run').

Signed-off-by: Derrick Stolee <[email protected]>
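As a rough picture of how a verbose mode could combine with the force and dry-run modes, the following Python sketch reports one directory and, when forced, deletes it. The function name `clean_directory` and the "Would remove" / "Removing" wording are hypothetical; the actual output format of the command is not shown in this conversation.

```python
import os
import shutil


def clean_directory(path, force=False, verbose=False):
    """Report, and with force=True delete, one sparse directory.

    Returns the lines that would be printed. Without force this acts as
    a dry run and leaves the directory untouched.
    """
    verb = "Removing" if force else "Would remove"
    lines = [f"{verb} {path}/"]
    if verbose:
        # Verbose mode also names each contained file, so users can see
        # what kept the directory alive (ignored files, build output, ...).
        for dirpath, _dirnames, filenames in os.walk(path):
            for name in sorted(filenames):
                lines.append(f"{verb} {os.path.join(dirpath, name)}")
    if force:
        shutil.rmtree(path)
    return lines
```

Collecting the file list before deleting means the dry run and the forced run report the same paths, differing only in the verb.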
In my experience, the most common reason that the sparse index must expand to a full one is that there is some leftover file in a tracked directory that is now outside of the sparse-checkout. The new 'git sparse-checkout clean' command will find and delete these directories, so point users to it when they hit the sparse index expansion advice.

Signed-off-by: Derrick Stolee <[email protected]>
With the current implementation of 'git sparse-checkout clean', we notice that a file that was in a conflicted state does not get cleaned up because of some internal details around the SKIP_WORKTREE bit.

This test is documenting the current behavior before we update it in the following change.

Signed-off-by: Derrick Stolee <[email protected]>
The 'git sparse-checkout clean' command is designed to be a one-command way to get the worktree in a state such that a sparse index would operate efficiently. The previous change demonstrated that files outside the sparse-checkout that were committed due to a merge conflict would persist despite attempts to run 'git sparse-checkout clean' and instead a 'git sparse-checkout reapply' would be required.

Instead of requiring users to run both commands, update 'clean' to be more ruthless about tracked sparse directories. The key here is to make sure that the SKIP_WORKTREE bit is removed from more paths in the index using update_sparsity() before compressing the index to a sparse one in-memory.

The tricky part here is that update_sparsity() was previously assuming that it would be in 'update' mode and would change the worktree as it made changes. However, we do not want to make these worktree changes at this point, instead relying on our later logic (that integrates with --dry-run and --verbose options) to perform those steps.

One side-effect here is that we also clear out staged files that exist in the worktree, but they would also appear in the verbose output as part of the dry run.

The final test in t1091 demonstrates that we no longer need the 'reapply' subcommand for merge resolutions. It also fixes an earlier case where 'git add --sparse' clears the SKIP_WORKTREE bit and avoids a directory deletion.

Signed-off-by: Derrick Stolee <[email protected]>
Signed-off-by: Derrick Stolee <[email protected]>
dscho
approved these changes
Aug 5, 2025
Thank you!
dscho
added a commit
that referenced
this pull request
Aug 5, 2025
This is a fast-track of gitgitgadget#1941, so please see that PR for the full context and details.

This is being prioritized as it solves a pain point for Office monorepo developers who get stuck with files outside of their sparse-checkout but no clear guidance as to how to solve the problem. With this change, users are advised to run `git sparse-checkout clean` as a heavy hammer to get into a better state. This will make their sparse index work as intended instead of slowing them down more than it should.

The upstream version got stuck on some minor details that may lead to an adjusted CLI. However, it got blocked on a dependent change due to globals refactoring. This is marked experimental for now.
dscho
added a commit
that referenced
this pull request
Aug 5, 2025
dscho
added a commit
that referenced
this pull request
Aug 8, 2025
dscho
added a commit
that referenced
this pull request
Aug 8, 2025
dscho
added a commit
that referenced
this pull request
Aug 13, 2025
dscho
added a commit
that referenced
this pull request
Aug 19, 2025