Improve performance of status on windows #2547

Open
Special Bread (special-bread) wants to merge 2 commits into GitoxideLabs:main from special-bread:windows-status-performance

Conversation

@special-bread Special Bread (special-bread) commented Apr 27, 2026

This adds a cache of file metadata that is prefilled using Windows API calls, which allows walking per directory instead of querying each file individually. As a result, status is much faster.

The cache approach was chosen to minimise the surface area of the change. It is also Windows-only, so other targets should be unaffected.

Running status on the Linux repo drops from ~1000ms to ~300ms, which puts this roughly on par with libgit2. Faster speeds are possible but would require larger changes, so this is an initial pass that avoids doing too much.
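
As an illustration of the per-directory prefill idea described above, here is a minimal sketch using only the Rust standard library. It is not the code from this PR: the `MetadataCache` type and its methods are hypothetical names, and the sketch relies on the documented behaviour that `DirEntry::metadata` needs no extra system call on Windows, so one directory listing can supply metadata for all of its entries.

```rust
use std::collections::HashMap;
use std::fs::{self, Metadata};
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical cache: worktree-relative path -> metadata captured while listing directories.
#[derive(Default)]
pub struct MetadataCache {
    entries: HashMap<PathBuf, Metadata>,
}

impl MetadataCache {
    /// Walk `root` one directory at a time and remember the metadata of every
    /// non-directory entry, skipping the `.git` directory.
    pub fn prefill(root: &Path) -> io::Result<Self> {
        let mut cache = Self::default();
        let mut pending = vec![root.to_path_buf()];
        while let Some(dir) = pending.pop() {
            for entry in fs::read_dir(&dir)? {
                let entry = entry?;
                if entry.file_name().to_str() == Some(".git") {
                    continue;
                }
                // Does not follow symlinks. On Windows the data comes from the
                // directory listing itself; on Unix this is an lstat per entry.
                let meta = entry.metadata()?;
                let path = entry.path();
                if meta.is_dir() {
                    pending.push(path);
                } else if let Ok(rel) = path.strip_prefix(root) {
                    cache.entries.insert(rel.to_path_buf(), meta);
                }
            }
        }
        Ok(cache)
    }

    /// Exact (case-sensitive) lookup by worktree-relative path.
    pub fn get(&self, rel: &Path) -> Option<&Metadata> {
        self.entries.get(rel)
    }

    /// Number of cached entries.
    pub fn len(&self) -> usize {
        self.entries.len()
    }
}
```

A caller would build the cache once with `MetadataCache::prefill(worktree_root)` and consult `get` for each index entry, querying the file system directly only when the lookup misses.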


Additional things to consider and perhaps discuss:

  1. This has drifted a little, I feel. The cache works, but perhaps it should not be considered a cache, since it is thrown away after every git status, and invalidating it is often equivalent to rebuilding it. Using it as an actual cache across multiple git status runs is up to the caller, and it is complex enough that the caller would need to know a lot to do so, for dubious benefit.
  2. I did leave some room for Linux-based speedups later, but I believe a different implementation would be needed there: lstat on Linux is fast, so a metadata cache wouldn't really speed things up. The only option is a directory-keyed cache that can check for untracked files, so that fewer lstat calls are needed overall. That would be perhaps a 10-20% speedup, not a 300% speedup (with 1000% possible) like on Windows.
  3. For reference, see my custom implementation of git status here: https://github.com/special-bread/tests-git-status. It can run git status on the Linux repo (the test case above) in ~70ms, but it is a complete rewrite and its behaviour differs slightly from git, which is fine for my purposes: for example, how it considers case sensitivity, how it treats some states as clean if git index entries cancel out, and some other details. I think it is possible to reach and beat that time here, but that would require more invasive changes, which I thought would be fairly rough for a PR that touches a piece of core functionality.
  4. See the related issue: "gix status" is slow on Windows #2296
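
The directory-keyed idea from item 2 could look roughly like the following sketch. This is an assumption about how such a comparison might work, not code from this PR: `classify` is a hypothetical helper, it works on plain file names within a single directory, and it ignores `.gitignore` rules entirely.

```rust
use std::collections::BTreeSet;

/// Compare one directory listing against the names the index tracks in that
/// directory (both hypothetical inputs).
///
/// Returns `(untracked, missing)`:
/// - `untracked`: names on disk that the index does not know about,
/// - `missing`: tracked names that are absent from disk.
/// Neither answer needs a per-file lstat; only names present in both sets
/// still need their metadata compared.
pub fn classify(
    listing: &BTreeSet<String>,
    tracked: &BTreeSet<String>,
) -> (Vec<String>, Vec<String>) {
    let untracked = listing.difference(tracked).cloned().collect();
    let missing = tracked.difference(listing).cloned().collect();
    (untracked, missing)
}
```

With one listing per directory, the untracked and deleted sets fall out of two set differences, which is where the fewer-lstats saving would come from.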

Given that this is a common piece of functionality, I would love for someone else to test this too. I have been embarrassingly busy recently, so this PR cooked for a while, and I may have missed some things while working on it on and off.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c27c3aebe0

Comment thread gix-status/src/metadata_cache.rs Outdated
Comment on lines +94 to +96
pub fn normalize_path(path: &[u8]) -> BString {
    use bstr::ByteSlice;
    path.to_str_lossy().to_lowercase().into()
P2: Respect case-sensitive worktrees in metadata keys

Do not lowercase every cache key unconditionally here: on Windows repositories living in case-sensitive directories (e.g. per-directory case sensitivity enabled, typically with core.ignoreCase=false), distinct tracked paths like Foo.txt and foo.txt collapse to the same key and one entry overwrites the other. index_as_worktree then reads the wrong cached stat for at least one file, which can misreport tracked-file status (clean/modified/removed) instead of merely causing a cache miss.
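
The collision described here can be reproduced with a small standalone sketch. It does not use the PR's types: the `u64` value is a hypothetical stand-in for cached stat data, and `build_lowercased` / `build_exact` are illustrative names only.

```rust
use std::collections::HashMap;

/// Key the map by lowercased path, as the reviewed snippet does.
/// Paths that differ only in case collapse into one key, and the later
/// entry silently overwrites the earlier one.
pub fn build_lowercased(entries: &[(&str, u64)]) -> HashMap<String, u64> {
    entries
        .iter()
        .map(|(path, stat)| (path.to_lowercase(), *stat))
        .collect()
}

/// Key the map by the exact path. Paths that differ only in case stay
/// distinct; a casing mismatch on lookup is a miss, never a wrong hit.
pub fn build_exact(entries: &[(&str, u64)]) -> HashMap<String, u64> {
    entries
        .iter()
        .map(|(path, stat)| ((*path).to_owned(), *stat))
        .collect()
}
```

Feeding both builders the pair `Foo.txt` and `foo.txt` leaves one entry in the lowercased map and two in the exact map, which is the difference between a wrong cached stat and a mere cache miss.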

special-bread (Author) replied:

Shameful display on my part; I believe this may be left over from a different solution. A folder on Windows may or may not be case sensitive, and the entries in the git tree may or may not collide, so there are several combinations to handle. Either there is a mapping of what to do in each case, or the cache is case sensitive and any casing mismatch is simply a cache miss, which slows down status a tiny bit but maintains correctness. I believe the worst case is when every path has different casing between tree and disk, which would be strange; in practice I found only a handful of such files across multiple repos, with the typical case being zero. In that worst case, performance should revert to the original, so it would be no slower than it used to be.

I'll make a commit that addresses this here.

Follow-up to the git status performance improvement. This fixes an edge case where a case-sensitive entry in the cache gets lowercased and matches a second case-sensitive entry in the tree, potentially resulting in incorrect git status entries. Skipping lowercasing entirely turns those cases into a cache miss instead, which makes the behaviour more transparent.
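
The miss-then-fall-back behaviour described in this follow-up might be sketched as below. This is an illustration under stated assumptions rather than the PR's code: `stat_size`, `Source`, and the `u64` file size standing in for full stat data are all hypothetical.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Where an answer came from (hypothetical, for illustration only).
#[derive(Debug, PartialEq, Eq)]
pub enum Source {
    Cache,
    Disk,
}

/// Look up `rel` with an exact, case-sensitive key. On a miss, query the
/// file system directly, which is the per-file cost status paid before
/// the cache existed.
pub fn stat_size(
    cache: &HashMap<PathBuf, u64>,
    root: &Path,
    rel: &Path,
) -> io::Result<(u64, Source)> {
    if let Some(len) = cache.get(rel) {
        return Ok((*len, Source::Cache));
    }
    // Cache miss: one direct query that does not follow symlinks.
    let meta = fs::symlink_metadata(root.join(rel))?;
    Ok((meta.len(), Source::Disk))
}
```

If every lookup misses, each path costs one direct query, which matches the expectation that the worst case reverts to the original performance.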