Improve performance of status on windows #2547
Special Bread (special-bread) wants to merge 2 commits into GitoxideLabs:main from
This creates a cache of file metadata that is prefilled by Windows API calls, which allows walking per directory instead of per file. As a result, status is much faster. The cache is designed to minimise the surface area of the change and is Windows-only, so other targets should be unaffected. Testing status on the linux repo improves the time from ~1000ms to ~300ms, putting this roughly on par with libgit2. Faster speeds are possible but would require larger changes, so this is an initial pass that avoids doing too much.
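The per-directory idea can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: `MetadataCache`, `prefill`, and `get` are hypothetical names, and `std::fs::read_dir` stands in for whichever Windows API calls the change actually uses. It relies on `DirEntry::metadata` being answered from the directory listing on Windows, so a walk costs roughly one query per directory rather than one per file.

```rust
use std::collections::HashMap;
use std::fs::{self, Metadata};
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical cache: maps each path seen during the walk to its metadata.
pub struct MetadataCache {
    entries: HashMap<PathBuf, Metadata>,
}

impl MetadataCache {
    /// Fill the cache by listing `root` one directory at a time.
    pub fn prefill(root: &Path) -> io::Result<Self> {
        let mut entries = HashMap::new();
        let mut pending = vec![root.to_path_buf()];
        while let Some(dir) = pending.pop() {
            for entry in fs::read_dir(&dir)? {
                let entry = entry?;
                // Does not follow symlinks, so symlinked directories are not descended into.
                let meta = entry.metadata()?;
                if meta.is_dir() {
                    pending.push(entry.path());
                }
                entries.insert(entry.path(), meta);
            }
        }
        Ok(Self { entries })
    }

    /// Return cached metadata, falling back to a per-file query on a miss.
    pub fn get(&self, path: &Path) -> io::Result<Metadata> {
        match self.entries.get(path) {
            Some(meta) => Ok(meta.clone()),
            None => fs::symlink_metadata(path),
        }
    }

    /// Number of cached entries (files and directories).
    pub fn len(&self) -> usize {
        self.entries.len()
    }
}
```

A caller would build the cache once per status run with `MetadataCache::prefill(worktree_root)` and then call `get` for each index entry. A real implementation would presumably also skip the `.git` directory and honour excludes, which this sketch leaves out.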
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c27c3aebe0
```rust
pub fn normalize_path(path: &[u8]) -> BString {
    use bstr::ByteSlice;
    path.to_str_lossy().to_lowercase().into()
}
```
Respect case-sensitive worktrees in metadata keys
Do not lowercase every cache key unconditionally here: on Windows repositories living in case-sensitive directories (e.g. per-directory case sensitivity enabled, typically with core.ignoreCase=false), distinct tracked paths like Foo.txt and foo.txt collapse to the same key and one entry overwrites the other. index_as_worktree then reads the wrong cached stat for at least one file, which can misreport tracked-file status (clean/modified/removed) instead of merely causing a cache miss.
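The collision described above can be reproduced in isolation. The sketch below is illustrative only: `Stat`, `build_lowercased`, and `build_exact` are hypothetical names, and plain `String` keys stand in for the `BString` keys used in the PR.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for cached stat data; only the size is tracked here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Stat {
    pub size: u64,
}

/// Build a cache with lowercased keys, mirroring the behaviour the review flags.
/// Paths that differ only by case end up sharing one key, and the later entry wins.
pub fn build_lowercased(files: &[(&str, Stat)]) -> HashMap<String, Stat> {
    files
        .iter()
        .map(|(path, stat)| (path.to_lowercase(), *stat))
        .collect()
}

/// Build a cache that keeps keys exactly as they appear on disk.
pub fn build_exact(files: &[(&str, Stat)]) -> HashMap<String, Stat> {
    files
        .iter()
        .map(|(path, stat)| ((*path).to_string(), *stat))
        .collect()
}
```

With the two inputs `Foo.txt` and `foo.txt`, `build_lowercased` yields a single entry holding whichever stat was inserted last, while `build_exact` keeps both. The first behaviour is what lets a lookup return the wrong file's stat.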
Shameful display on my part; I believe this may be left over from a different solution. There are various combinations of the folder on Windows being case-sensitive or not, and of there being collisions with the entries in the git tree or not. So either there is some mapping of what to do in each of those cases, or the cache should be case-sensitive, where any casing mismatch results in a cache miss. That slows down status a tiny bit but maintains correctness. I believe the worst case is every path having different casing between tree and disk, which would be strange; in practice I found only a handful of such files across multiple repos, with the typical case being zero. In that worst case this should revert to the original performance, so it would not be slower than it used to be.
I'll make a commit that addresses this here.
Follow-up to the git status performance improvement. This fixes an edge case where a case-sensitive entry in the cache gets lowercased and matches a second case-sensitive entry in the tree, potentially resulting in incorrect git status entries. Skipping lowercasing entirely turns those cases into cache misses instead, which makes the behaviour more transparent.
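A rough sketch of the resulting lookup behaviour, assuming an exact-key cache and a per-file fallback. `Lookup`, `lookup_size`, and the `slow_stat` callback are hypothetical and only model the idea; they are not the PR's API.

```rust
use std::collections::HashMap;

/// Hypothetical outcome of a lookup: served from the cache, or from the slow path.
#[derive(Debug, PartialEq, Eq)]
pub enum Lookup {
    Hit(u64),
    Miss(u64),
}

/// Look up `index_path` with an exact, case-sensitive key.
/// On a miss, defer to `slow_stat`, which models the original per-file query.
pub fn lookup_size(
    cache: &HashMap<Vec<u8>, u64>,
    index_path: &[u8],
    slow_stat: impl Fn(&[u8]) -> Option<u64>,
) -> Option<Lookup> {
    match cache.get(index_path) {
        Some(size) => Some(Lookup::Hit(*size)),
        None => slow_stat(index_path).map(Lookup::Miss),
    }
}
```

If the index records `readme.md` while the disk has `README.md`, the exact-key lookup misses and the fallback answers. That costs one extra query for the file but never returns another file's stat.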
Additional things to consider and discuss perhaps:
Given that this is a common piece of functionality, I would love for someone else to test this too. I have been embarrassingly busy recently, so this PR cooked for a while, and I may have missed some things while working on it on and off.