
Add 'git sparse-checkout clean' #779

Merged: 9 commits, merged Aug 5, 2025

@derrickstolee commented Aug 5, 2025

This is a fast-track of gitgitgadget#1941, so please see that PR for the full context and details.

This is being prioritized as it solves a pain point for Office monorepo developers who get stuck with files outside of their sparse-checkout but no clear guidance as to how to solve the problem. With this change, users are advised to run `git sparse-checkout clean` as a heavy hammer to get into a better state. This lets their sparse index work as intended instead of slowing them down more than it should.

The upstream version got stuck on some minor details that may lead to an adjusted CLI, and it is also blocked on a dependent change that refactors the use of globals. This is marked experimental for now.

The logic for the 'git sparse-checkout' builtin uses the_repository all
over the place, even though some methods already take a repository
struct as a parameter. Complete the removal of the_repository by using
'repo' when possible.

In one place, there was already a local variable 'r' that was set to
the_repository, so move that to a method parameter.

We cannot remove the USE_THE_REPOSITORY_VARIABLE declaration as we are
still using global constants for the state of the sparse-checkout.
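For readers less familiar with this refactoring pattern, here is a minimal C sketch of the before/after shape under a simplified model. `struct repo_sketch`, `the_repo_sketch`, and both helper functions are hypothetical stand-ins, not Git's real `struct repository` or sparse-checkout code. The only detail taken from Git is that the sparse-checkout file lives at `info/sparse-checkout` under the git directory.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for Git's repository struct. */
struct repo_sketch {
	const char *gitdir;
};

/* Hypothetical stand-in for the process-wide the_repository global. */
static struct repo_sketch the_repo_sketch = { ".git" };

/* Before: the helper silently depends on the global. */
static void sparse_file_before(char *buf, size_t len)
{
	snprintf(buf, len, "%s/info/sparse-checkout", the_repo_sketch.gitdir);
}

/* After: the caller states which repository it means. */
static void sparse_file_after(const struct repo_sketch *repo,
			      char *buf, size_t len)
{
	assert(repo != NULL);
	snprintf(buf, len, "%s/info/sparse-checkout", repo->gitdir);
}
```

The second helper takes the repository explicitly, so a caller can point it at any repository without first swapping out a global.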

Signed-off-by: Derrick Stolee <[email protected]>

When users change their sparse-checkout definitions to add new
directories and remove old ones, directories that are no longer in scope
may remain on disk for several reasons (ignored or excluded files still
exist, Windows handles are still open, etc.). When these files still
exist, the sparse index feature notices that a tracked, but sparse,
directory still exists on disk and thus the index expands to a full
index. This causes a performance hit _and_ the advice printed isn't very
helpful. Plain 'git clean' isn't enough ('-dfx' is generally needed),
and even that may not be sufficient.

Add a new subcommand to 'git sparse-checkout' that removes these
tracked-but-sparse directories. This necessarily removes all files
contained within, including tracked and untracked files. Of particular
importance are ignored and excluded files, which even 'git clean -f'
would leave in place unless the '-x' or '-X' option is provided. This is
the most extreme method for doing this, but it works
when the sparse-checkout is in cone mode and is expected to rescope
based on directories, not files.
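Here is a minimal sketch of the cone-mode rule that decides which tracked directories are candidates for deletion. It assumes the cone is given as a list of recursive directory names without trailing slashes. The function names are hypothetical. Git's own code works from its parsed sparse-checkout patterns instead of a plain list.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if 'path' equals 'dir' or lies beneath it ("a/b" under "a"). */
static int is_same_or_under(const char *path, const char *dir)
{
	size_t n = strlen(dir);
	return strncmp(path, dir, n) == 0 && (path[n] == '\0' || path[n] == '/');
}

/*
 * Hypothetical cone-mode check. A directory is in scope if it is inside
 * some cone directory or is an ancestor of one (its immediate files are
 * included). Otherwise it is entirely outside the sparse-checkout.
 */
static int dir_outside_cone(const char *dir, const char *const *cone, size_t n)
{
	assert(dir != NULL && dir[0] != '\0');
	for (size_t i = 0; i < n; i++) {
		if (is_same_or_under(dir, cone[i]) ||  /* inside the cone */
		    is_same_or_under(cone[i], dir))    /* parent of the cone */
			return 0;
	}
	return 1;
}
```

Take the cone `{ "src/app", "docs" }`. In that case `src` stays, because it is a parent of the cone. `src/lib`, `tools`, and `docsite` are outside and would be deleted.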

The current implementation always deletes these sparse directories
without warning. This is unacceptable for a released version, but
safeguards will be added in the changes immediately following this one.

Note that untracked directories within the sparse-checkout remain.
Further, directories that contain staged changes or files in merge
conflict states are not deleted. This detail follows from the
implementation, which collapses the index to a sparse index in memory
and deletes only the directories that are listed as sparse in the index.

If a staged change exists, then that entry is not stored as a sparse
tree entry and thus remains on-disk until committed or reset.
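The rule in the two paragraphs above can be sketched as a predicate over a simplified index. `struct entry_sketch` and `dir_collapses()` are hypothetical. Real index entries record a staged change implicitly, as an entry whose object differs from HEAD's tree. This sketch models that with a `matches_head` flag.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified index entry; not Git's struct cache_entry. */
struct entry_sketch {
	const char *path;
	int skip_worktree;  /* SKIP_WORKTREE bit */
	int stage;          /* 0 = normal, 1-3 = merge conflict */
	int matches_head;   /* 0 if the entry has a staged change */
};

/*
 * Returns 1 if 'dir' has at least one tracked entry beneath it and every
 * such entry could be folded into a single sparse directory entry (so
 * 'clean' would delete the directory); returns 0 otherwise.
 */
static int dir_collapses(const char *dir, const struct entry_sketch *e,
			 size_t n)
{
	size_t len = strlen(dir);
	int found = 0;

	assert(len > 0);
	for (size_t i = 0; i < n; i++) {
		if (strncmp(e[i].path, dir, len) != 0 || e[i].path[len] != '/')
			continue;  /* not beneath dir */
		found = 1;
		if (!e[i].skip_worktree || e[i].stage != 0 || !e[i].matches_head)
			return 0;  /* staged, conflicted, or not sparse */
	}
	return found;
}
```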

There are some interesting cases around merge conflict resolution, but
those will be analyzed carefully in the future.

Signed-off-by: Derrick Stolee <[email protected]>

The 'git sparse-checkout clean' subcommand is somewhat similar to 'git
clean' in that it will delete files that should not be in the worktree.
The big difference is that it focuses on the directories that should not
be in the worktree due to cone-mode sparse-checkout. It also does not
discriminate between kinds of files, deleting entire directories
instead.

However, there are some restrictions that would be good to bring over
from 'git clean', specifically how it refuses to do anything without the
'-f'/'--force' or '-n'/'--dry-run' arguments. The 'clean.requireForce'
config can be set to 'false' to imply '--force'.

Add this behavior to avoid accidental deletion of files that cannot be
recovered from Git.
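Here is a minimal sketch of this guard. The enum and function names are hypothetical. Letting '--dry-run' take precedence when both flags are given is an assumption modeled on 'git clean'.

```c
#include <assert.h>

/* Hypothetical outcome of the guard. */
enum clean_mode_sketch { CLEAN_REFUSE, CLEAN_DRY_RUN, CLEAN_DELETE };

/*
 * --dry-run only reports; --force deletes; with neither flag, refuse
 * unless clean.requireForce is false, which implies --force.
 */
static enum clean_mode_sketch pick_clean_mode(int force, int dry_run,
					      int require_force)
{
	/* Flags are normalized booleans in this sketch. */
	assert(force == 0 || force == 1);
	if (dry_run)
		return CLEAN_DRY_RUN;
	if (force || !require_force)
		return CLEAN_DELETE;
	return CLEAN_REFUSE;
}
```

When the guard returns `CLEAN_REFUSE`, the command would exit with an error instead of deleting anything.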

Signed-off-by: Derrick Stolee <[email protected]>

There is sometimes a need to visit every file within a directory,
recursively. The main example is remove_dir_recursively(), though its
extra flags make it iterate over paths in a custom way. There is also
the fill_directory() approach, but that requires an index and a
pathspec.

This change adds a new for_each_file_in_dir() method that will be
helpful in the next change.
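Here is a minimal POSIX sketch of such an iterator, assuming a callback interface. The name `for_each_file_in_dir_sketch()`, its signature, and the `count_file()` example callback are hypothetical and are not the helper added by this change. The sketch reports symlinks as files rather than following them, which matters for a tool that deletes what it visits.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical callback type: return nonzero to stop the walk. */
typedef int (*file_fn_sketch)(const char *path, void *data);

/*
 * Calls fn() for every non-directory entry below 'dir', recursing into
 * subdirectories. Symlinks are reported, never followed. Returns the
 * first nonzero fn() result, or -1 if a directory cannot be opened or a
 * path does not fit in the buffer.
 */
static int for_each_file_in_dir_sketch(const char *dir, file_fn_sketch fn,
				       void *data)
{
	DIR *d;
	struct dirent *de;
	int ret = 0;

	assert(dir != NULL && fn != NULL);
	d = opendir(dir);
	if (!d)
		return -1;
	while (!ret && (de = readdir(d)) != NULL) {
		char path[4096];
		struct stat st;
		int len;

		if (!strcmp(de->d_name, ".") || !strcmp(de->d_name, ".."))
			continue;
		len = snprintf(path, sizeof(path), "%s/%s", dir, de->d_name);
		if (len < 0 || (size_t)len >= sizeof(path)) {
			ret = -1;
			break;
		}
		if (lstat(path, &st) < 0)
			continue;  /* e.g. it vanished since readdir() */
		if (S_ISDIR(st.st_mode))
			ret = for_each_file_in_dir_sketch(path, fn, data);
		else
			ret = fn(path, data);
	}
	closedir(d);
	return ret;
}

/* Example callback: counts the files it is shown. */
static int count_file(const char *path, void *data)
{
	(void)path;
	(*(int *)data)++;
	return 0;
}
```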

Signed-off-by: Derrick Stolee <[email protected]>

The 'git sparse-checkout clean' subcommand is focused on directories,
deleting any tracked sparse directories to clean up the worktree and
make the sparse index feature work optimally.

However, this directory-focused approach can leave users wondering why
those directories exist at all. In my experience, these files are left
over due to ignore or exclude patterns, Windows file handles, or
possibly merge conflict resolutions.

Add a new '--verbose' option for users to see all the files that are
being deleted (with '--force') or would be deleted (with '--dry-run').
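Paired with a per-file iterator, the '--verbose' behavior reduces to a callback like the sketch below. `struct clean_opts_sketch` and `report_file_sketch()` are hypothetical. The "Removing"/"Would remove" wording is borrowed from 'git clean' as an assumption and is not taken from the new subcommand.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical option bundle; flag names mirror --dry-run and --verbose. */
struct clean_opts_sketch {
	int dry_run;
	int verbose;
	FILE *out;
};

/*
 * Per-file callback: deletes the file unless this is a dry run and, when
 * verbose, reports what was (or would be) removed.
 */
static int report_file_sketch(const char *path, void *data)
{
	const struct clean_opts_sketch *o = data;

	assert(path != NULL && o != NULL);
	if (!o->dry_run && remove(path) != 0)
		return -1;
	if (o->verbose)
		fprintf(o->out, "%s %s\n",
			o->dry_run ? "Would remove" : "Removing", path);
	return 0;
}
```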

Signed-off-by: Derrick Stolee <[email protected]>

In my experience, the most common reason that the sparse index must
expand to a full one is a leftover file in a tracked directory that is
now outside of the sparse-checkout. The new 'git sparse-checkout clean'
command will find and delete these directories, so point users to it
when they hit the sparse index expansion advice.

Signed-off-by: Derrick Stolee <[email protected]>

With the current implementation of 'git sparse-checkout clean', a file
that was in a conflicted state does not get cleaned up because of some
internal details around the SKIP_WORKTREE bit.

This test documents the current behavior before we update it in the
following change.

Signed-off-by: Derrick Stolee <[email protected]>

The 'git sparse-checkout clean' command is designed to be a one-command
way to get the worktree into a state where a sparse index operates
efficiently. The previous change demonstrated that files outside the
sparse-checkout that were committed while resolving a merge conflict
would persist despite attempts to run 'git sparse-checkout clean', and
that a 'git sparse-checkout reapply' would be required instead.

Instead of requiring users to run both commands, update 'clean' to be
more ruthless about tracked sparse directories. The key here is to make
sure that the SKIP_WORKTREE bit is set on more paths in the index,
using update_sparsity(), before compressing the index to a sparse one
in memory.

The tricky part here is that update_sparsity() previously assumed that
it would run in 'update' mode and modify the worktree as it updated the
index. However, we do not want to make these worktree changes at
this point, instead relying on our later logic (that integrates with
--dry-run and --verbose options) to perform those steps.
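The in-memory step can be sketched as follows. The sketch assumes a simplified index and the cone-mode inclusion rules: root files, files directly inside a parent of a cone directory, and everything under a cone directory. Entries outside the cone get the SKIP_WORKTREE flag, and nothing on disk is touched. The struct and function names are hypothetical. This sketch only ever sets the flag; entries inside the cone are left untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified index entry. */
struct entry_sketch {
	const char *path;
	int skip_worktree;
};

/* Cone-mode membership for a file, given recursive cone directories. */
static int file_in_cone(const char *path, const char *const *cone, size_t n)
{
	const char *slash = strrchr(path, '/');
	size_t plen;

	if (!slash)
		return 1;  /* files at the root are always included */
	plen = (size_t)(slash - path);
	for (size_t i = 0; i < n; i++) {
		size_t clen = strlen(cone[i]);
		/* beneath a cone directory */
		if (!strncmp(path, cone[i], clen) && path[clen] == '/')
			return 1;
		/* immediate child of a parent of a cone directory */
		if (!strncmp(cone[i], path, plen) && cone[i][plen] == '/')
			return 1;
	}
	return 0;
}

/*
 * Sets SKIP_WORKTREE on every entry outside the cone without touching
 * the worktree. Returns how many entries newly became sparse.
 */
static size_t mark_sparse_in_memory(struct entry_sketch *e, size_t n,
				    const char *const *cone, size_t ncone)
{
	size_t newly_sparse = 0;

	assert(e != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (file_in_cone(e[i].path, cone, ncone))
			continue;
		if (!e[i].skip_worktree)
			newly_sparse++;
		e[i].skip_worktree = 1;
	}
	return newly_sparse;
}
```

An entry such as a resolved conflict outside the cone regains the flag here. A later collapse step can then fold its directory into a sparse entry, and the deletion itself stays with the '--dry-run'/'--verbose'-aware logic.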

One side effect is that we also clear out staged files that exist in
the worktree, but they also appear in the '--verbose' output of a
'--dry-run'.

The final test in t1091 demonstrates that we no longer need the
'reapply' subcommand for merge resolutions. It also fixes an earlier
test case where 'git add --sparse' cleared the SKIP_WORKTREE bit and
thereby prevented a directory from being deleted.

Signed-off-by: Derrick Stolee <[email protected]>

@derrickstolee self-assigned this Aug 5, 2025

@dscho (Member) left a comment:

Thank you!

@dscho merged commit 6e437e9 into microsoft:vfs-2.50.1 Aug 5, 2025
120 of 122 checks passed

dscho added a commit that referenced this pull request Aug 5, 2025
dscho added a commit that referenced this pull request Aug 5, 2025
dscho added a commit that referenced this pull request Aug 8, 2025
dscho added a commit that referenced this pull request Aug 8, 2025
dscho added a commit that referenced this pull request Aug 13, 2025
@dscho mentioned this pull request Aug 13, 2025
dscho added a commit that referenced this pull request Aug 19, 2025