[December] The cargo integration has begun!
#694
Byron
announced in
Progress Update
What is the update on the gitoxide integration with cargo?
This month is all about getting started with integrating gitoxide into cargo, and as you can imagine, there are a lot of small (and bigger) things needed in gitoxide to make this happen without anybody noticing any difference, except that everything is faster and better.
Here come the details.
The Cargo Integration has begun
On the first of December, the official integration PR was created with the goal of doing all fetches using gitoxide. For that to work, plenty of improvements were required.
Prodash with IDs
gitoxide uses prodash for hierarchical progress reporting. Each progress can have child-progress to correctly visualize complex tasks with a lot going on concurrently or in lock-step, as a tree of progress. cargo needs to show progress using its own progress bars while providing only select output, like how many compressed bytes were received and how many objects are included.
To enable this, prodash had to learn the concept of an ID, which is a stable way to identify progress information within the progress tree, and gitoxide now provides IDs for all progress it creates. cargo also had to change a little, as it was built around getting callbacks for progress, and now it's observing progress from a separate thread. This has the advantage that progress reporting can be as fast as one can increment an atomic counter, while the display of that progress can happen at its own pace.
Various fixes
With cargo having its own and quite exhaustive test suite, it showed plenty of shortcomings in the way gitoxide was implemented at the time. All it took were about 195 git-specific tests, with a few of them running into errors that needed fixing on gitoxide's side.
Reuse connections of HTTP transport
A test against a stubbed HTTP server showed that gitoxide would not keep the connection alive despite this being the default for HTTP 1.1, causing the test to fail.
The fix was quite simple, as the cause was just an oversight in the implementation, so I am really happy there was a test catching it. And of course, a similar test was added to the gitoxide test suite to prevent regression.
No UTF8 for ls-refs line parsing
When scouring through the git-protocol code, it became evident that a token of tech-debt was still left: references would be forced into UTF-8 Strings before parsing them (as bytes) due to using the standard lines() iterator of a buffered reader. This is a problem as, technically, UTF-8 isn't a requirement for git references at all, so the conversion can fail. Further, the bytes to string conversion is another copy and allocation that is unnecessary.
With the introduction of a new trait, it was possible to expose line-by-line reading that provides buffer-backed bytes instead to fix all prior problems in one go.
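To illustrate the difference, here is a minimal sketch using only the standard library. It is not the trait that was added to gitoxide; the function name `parse_ref_lines` is hypothetical, and the input is simplified to newline-separated `<id> <refname>` lines rather than real packet lines. It reads each line as bytes into one reused buffer via `read_until()`, so a reference name that isn't valid UTF-8 passes through untouched.

```rust
use std::io::{self, BufRead};

/// Hypothetical helper: read `<id> <refname>` lines as raw bytes.
/// Assumes plain newline-separated input, a simplification of the real protocol.
fn parse_ref_lines<R: BufRead>(mut reader: R) -> io::Result<Vec<(Vec<u8>, Vec<u8>)>> {
    let mut buf = Vec::new(); // reused for every line, no per-line String
    let mut refs = Vec::new();
    loop {
        buf.clear();
        if reader.read_until(b'\n', &mut buf)? == 0 {
            return Ok(refs); // end of input
        }
        if buf.last() == Some(&b'\n') {
            buf.pop(); // strip the line terminator
        }
        // Split at the first space: id on the left, reference name on the right.
        match buf.iter().position(|b| *b == b' ') {
            Some(pos) => refs.push((buf[..pos].to_vec(), buf[pos + 1..].to_vec())),
            None => continue, // skip lines that do not have the expected shape
        }
    }
}
```

The standard `lines()` iterator would return an error for the same input as soon as a line contains bytes that are not valid UTF-8, which is the failure mode described above.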
git-url improved handling of Windows paths
git-url, a crate which makes the various kinds of allowed URLs accessible, was quite stable recently, but cargo did manage to throw some new URLs at it that caused it to choke. Fixes were trivial, fortunately, but it remains to be seen what other URLs are able to go beyond any expectation.
Improved discovery of non-typical repositories
Cargo does its best to be resilient against broken git repositories and should never choke on corrupt or unexpected data coming from disk. And of course, there is a test for that which caused the repository-local configuration file to go missing. git2 was fine with that, git was fine with that, but gitoxide really thought a config file has to be there.
Now gitoxide will assume default values, which seems to be equivalent to what git does as well.
auto-tags
When implementing fetch, I intentionally skipped the 'auto-tag' feature to save some time in that moment, thinking that something this 'obscure' probably isn't required from day one. However, it turns out that 'auto-tags' aren't obscure at all and implementing them is needed for correct fetches and clones.
auto-tags as a capability needs to be enabled by the client so the server can automatically send along all tag objects that point to any commit it would normally be sending. That way, the client will have all objects necessary to automatically create tags that point to any object it received. This is a somewhat hidden feature when doing git clone, as git won't inform about the tags it created.
The cargo test, however, was performing a fetch with ref-specs that only refer to branches, but assumed that tag references would also be present automatically.
Now gitoxide behaves like it should and properly keeps track of implicit ref-specs like these, respecting typical git defaults out of the box.
Another one of these eurekas: environment-to-config-mapping
It still amazes me that this far into the project, it's still possible to make changes that make it seem that a big piece of the puzzle finally fell into place. Previously, environment variables, despite already being gated behind a permission flag, could possibly be read lazily in various places of the git-repository crate. For a library, that always troubled me as this effectively adds global state to the equation, littered all over the codebase. It never felt quite right.
However, with git-config rising to incredible importance within gitoxide, there also was an advent of configuration variables prefixed with 'gitoxide.' for configuring gitoxide-specific values. With values being there in the right place, in that moment I connected the dots.
Why not have a single step which transforms environment variables, based on permissions, into respective configuration values? That way there is one place to rule all access to environment variables, and it's done exactly once when the Repository is opened.
Now all code needing these values can just read them from the git configuration, which also makes these values accessible to API users who want to override them, now without having to set environment variables in their process at all.
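The idea can be sketched in a few lines. This is not gitoxide's implementation: the `Permission` type, the `env_to_config` function, the variable names and the configuration keys below are all hypothetical and only serve to show the shape of a single, permission-gated translation step.

```rust
use std::collections::HashMap;

/// Hypothetical permission flag gating all access to environment variables.
#[derive(Clone, Copy, PartialEq, Eq)]
enum Permission {
    Allow,
    Deny,
}

/// Hypothetical mapping from environment variable to configuration key.
const MAPPING: &[(&str, &str)] = &[
    ("EXAMPLE_USER_AGENT", "gitoxide.example.userAgent"),
    ("EXAMPLE_TRACE", "gitoxide.example.trace"),
];

/// Translate environment variables into configuration values in one place.
/// The environment is passed in, so no other code needs to touch global state.
fn env_to_config<I>(env: I, permission: Permission) -> HashMap<String, String>
where
    I: IntoIterator<Item = (String, String)>,
{
    let mut config = HashMap::new();
    if permission == Permission::Deny {
        return config; // nothing is read if access is not permitted
    }
    let env: HashMap<String, String> = env.into_iter().collect();
    for (var, key) in MAPPING {
        if let Some(value) = env.get(*var) {
            config.insert((*key).to_string(), value.clone());
        }
    }
    config
}
```

In this sketch, a caller would pass `std::env::vars()` once when opening a repository, while tests or API users can pass any list of pairs, or write the configuration keys directly and skip the environment entirely.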
gix improvements
Due to the changes to integrate with gitoxide, a few improvements also landed in gix to make them available to a broader audience and to allow additional testing on actual repositories.
odb stats
Having said the above, that everything was about cargo, it's probably fair to start out with an exception. With the publication of noseyparker, I couldn't resist throwing gitoxide at it to see how much faster it can go. It turned out, quite a lot!
However, there was one feature missing to completely remove git2 from there, which was related to quickly enumerating all objects to know the bounds of the pending scan.
For that, repo.objects.header() was added to only decode object headers, as fast as possible, without decoding the whole objects. On average, doing that as opposed to a full decode is about 8 times faster, which is also faster than what git2 can offer.
With it, and full parallelism, an M1 Pro with 8.5 cores can calculate statistics on all objects within the Linux kernel repo in 1.4s, or 6.7 million objects per second. This is too fast for any delta-caches by the way, and no matter what, my cache implementations that work fine when decoding objects wouldn't be able to speed up this computation at all (that is, the cost of the cache outweighed its benefits in all tested workloads) - interesting.
With it, there was also an option added to change the iteration of all objects within an object database to be in pack order, which allows pack-delta caches to be used much better when the entire object is decoded later.
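To give an intuition for why header-only access is so much cheaper, here is a small sketch of decoding the entry header that precedes each object's compressed data in a pack file, following git's documented pack format. It is not gitoxide's code, and the function name `decode_entry_header` is made up for illustration. Note that for delta entries the size found here is that of the delta, so obtaining the final object's type and size takes additional work that this sketch leaves out.

```rust
/// Decode a pack entry header: returns (type id, size, header bytes consumed).
/// The first byte holds a continuation bit, 3 type bits and the low 4 size bits;
/// each following byte contributes 7 more size bits while its high bit is set.
fn decode_entry_header(data: &[u8]) -> Option<(u8, u64, usize)> {
    let mut bytes = data.iter();
    let mut byte = *bytes.next()?;
    let kind = (byte >> 4) & 0b111; // 1 commit, 2 tree, 3 blob, 4 tag, 6/7 deltas
    let mut size = u64::from(byte & 0b1111);
    let mut shift = 4;
    let mut consumed = 1;
    while byte & 0x80 != 0 {
        byte = *bytes.next()?; // truncated input yields None
        if shift >= 64 {
            return None; // size would not fit into 64 bits
        }
        size |= u64::from(byte & 0x7f) << shift;
        shift += 7;
        consumed += 1;
    }
    Some((kind, size, consumed))
}
```

Only a handful of bytes are touched per entry and nothing is decompressed, which matches the observation above that caches have nothing left to speed up.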
All in all, I am happy noseyparker motivated me to finally implement the last missing feature in the object database.
--strict and --no-verbose
For completeness and control, the --strict flag was added to ensure that commands that are 'lenient' on the git configuration by default are forced to be strict. Being strict matches the way git treats configuration, as it will fail to perform any action if the configuration contains malformed or invalid values. gitoxide implements a degree of leniency to allow commands like gix config to always display configuration despite the presence of malformed values.
With gix fetch and gix clone being actually useful, it was important to have them be verbose automatically to show progress information. To provide more control over it, there is now a --no-verbose flag to turn off progress reporting in case of sub-commands that turn it on by default.
clone with auto-tags
Finally, gix clone performs clones that have a similar result as if git clone had been used, due to the implementation of auto-tags. To counter that and allow no tags to be cloned to keep the repo more minimal, one can now specify --no-tags as well.
Community
Worktree configuration support
Sidney contributed the ability to layer worktree-specific configuration on top of the one coming from the shared repository, providing a path to supporting sparse checkouts correctly. Thanks to his work, gitoxide will be supporting sparse checkouts by default as it's aware of all features of the index file from the very start.
A fix for relative SCP URLs
Thanks to a user who employs gitoxide to clone and fetch repositories, a fix was provided to deal with SCP-like git URLs with relative paths correctly.
Various impactful fixes to the git-date crate
Thanks to the repeated engagement of one industrious individual, git-date received a lot of impactful bugfixes. First to mention is a fix for incorrect handling of timezone offsets which made roundtrips impossible. Baseline tests now compare to the results in git more thoroughly as well, which should prevent regressions and help future development.
Lastly, due to a method being entirely untested and due to being dependent on the system's local timezone, it was never run in a place where the timezone offset would become negative. And exactly that wasn't possible, until now.
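As an illustration of what a roundtrip of timezone offsets involves, here is a small sketch that formats an offset in seconds the way git writes it (`+HHMM` or `-HHMM`) and parses it back. It is not the git-date implementation, and both function names are hypothetical. The sign is handled separately from the hour and minute digits, which matters for negative offsets of less than an hour.

```rust
/// Hypothetical helper: format an offset in seconds as `+HHMM` or `-HHMM`.
fn format_offset(offset_secs: i32) -> String {
    let sign = if offset_secs < 0 { '-' } else { '+' };
    let abs = offset_secs.unsigned_abs();
    format!("{}{:02}{:02}", sign, abs / 3600, (abs % 3600) / 60)
}

/// Hypothetical helper: parse `+HHMM` or `-HHMM` back into seconds.
fn parse_offset(s: &str) -> Option<i32> {
    if s.len() != 5 || !s[1..].bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let sign = match &s[..1] {
        "+" => 1,
        "-" => -1,
        _ => return None,
    };
    let hours: i32 = s[1..3].parse().ok()?;
    let minutes: i32 = s[3..5].parse().ok()?;
    // Apply the sign to the whole value so `-0030` stays negative.
    Some(sign * (hours * 3600 + minutes * 60))
}
```

A baseline test in this spirit would check that `parse_offset(&format_offset(x))` returns `x` for positive and negative whole-minute offsets alike.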
Thanks for all your work!
Rust Foundation sponsorship: cargo shallow clones update
The last month on the grant to integrate gitoxide into cargo for shallow clones has begun, and my plan is to make the PR reviewable this year 🎉. Doing mere fetches with gitoxide already provides substantial performance improvements, particularly on Windows, so it's worth having even before shallow clone support lands.
It will definitely take more time to finish the integration and set cargo up to phase out git2 as well.
Merry Christmas, and a happy new year,
Sebastian
PS: The latest timesheets can be found here.