perf(clean): Optimize (legacy) clean with multiple -p specifiers #16264
|
Have you tried out the performance of |
|
Are we really benefiting from all of this filesystem globbing? Should we just walk all of the content upfront, get it into a |
No, I did not try
I don't think we benefit much from globbing, particularly because most of its use cases are simple prefix/suffix matching. I'm in the process of coalescing file walking further. I'll push that up to this branch before marking this PR as ready to review.
If we could make the pattern matching part straightforward (to read/maintain) then sure, that'd be best. I'm clinging onto the existing setup for no good reason; the end goal for me is to have a faster |
|
I am very happy to report that This PR is currently at 2.6s. I think we could reach a similar ballpark with the current build dir layout. But obviously, this code will be removed at some point either way. |
d4a8b8d to 0afdf97
### What does this PR try to resolve?

A follow-up to #16300.

This doesn't split out any functions
- The stuff beforehand is quite significant
- For #16264, we may add more before and some after
- We could pull out only the branches of the `if` into functions, but then we'd just move them back when this is done, which seems like extra churn

### How to test and review this PR?
|
I've rebased the existing implementation on top of #16304 - while we could push the perf further, I think this is good enough (after all, it gets us from 70s down to 3s) for our use case. |
|
r? epage |
In my mind, we would do file walking by
let mut cache: HashMap<PathBuf, Vec<OsString>> = Default::default();
let mut to_remove: Vec<PathBuf> = Default::default();
for ... {
    //...
    // Read each directory at most once; later patterns reuse the cached names.
    let file_names = cache.entry(dir).or_insert_with(|| std::fs::read_dir(dir));
    file_names.retain(|name| {
        let remove = name.starts_with(prefix) && name.ends_with(suffix);
        if remove {
            to_remove.push(dir.join(name));
        }
        // Keep only the names that were not matched.
        !remove
    });
}
for path in to_remove {
    self.rm_rf(path)?;
}
(pseudo code, no preference on the exact details including whether
Benefits
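The pseudo code above can be turned into a compilable sketch along the following lines. This is only an illustration under assumptions, not the code that landed in the PR: the `Pattern` struct and the `collect_matches` function are hypothetical names, a missing directory is treated as empty, and non-UTF-8 file names are simply never matched.

```rust
use std::collections::HashMap;
use std::ffi::OsString;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical request type: "in `dir`, match names with this prefix and suffix".
pub struct Pattern<'a> {
    pub dir: &'a Path,
    pub prefix: &'a str,
    pub suffix: &'a str,
}

/// Collects every path matched by any pattern while reading each directory
/// from disk at most once.
pub fn collect_matches(patterns: &[Pattern<'_>]) -> io::Result<Vec<PathBuf>> {
    let mut cache: HashMap<PathBuf, Vec<OsString>> = HashMap::new();
    let mut to_remove: Vec<PathBuf> = Vec::new();

    for pat in patterns {
        // Populate the cache on first use of this directory.
        if !cache.contains_key(pat.dir) {
            let mut names = Vec::new();
            match fs::read_dir(pat.dir) {
                Ok(entries) => {
                    for entry in entries {
                        names.push(entry?.file_name());
                    }
                }
                // Assumption: a directory that does not exist has nothing to clean.
                Err(e) if e.kind() == io::ErrorKind::NotFound => {}
                Err(e) => return Err(e),
            }
            cache.insert(pat.dir.to_path_buf(), names);
        }

        let names = cache.get_mut(pat.dir).expect("inserted above");
        // Move matched names into `to_remove` and drop them from the cache,
        // so a later pattern cannot report the same path twice.
        names.retain(|name| {
            let matched = name
                .to_str()
                .is_some_and(|s| s.starts_with(pat.prefix) && s.ends_with(pat.suffix));
            if matched {
                to_remove.push(pat.dir.join(name));
            }
            !matched
        });
    }
    Ok(to_remove)
}
```

A caller would then iterate over the returned paths and delete each one, which keeps the directory reads separate from the removals.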
|
src/cargo/ops/cargo_clean.rs
Outdated
{
    let paths = [
        dirs_to_clean.mark_utf(dir, |path| {
            if path.starts_with(&path_dash) {
Why do we do the rsplit_once check earlier but not here?
Earlier call sites were using rm_rf_package_glob_containing_dash, whereas this one used to call rm_rf_prefix_list. We could probably use strip_prefix(crate_name) here followed by a dispatch on the following character.
Eh, let's leave it for now. In a follow-up, we should probably extract a starts_with_pkg_name(name, sep) so we consistently rsplit_once.
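To make the two matching ideas in this thread concrete, here is a rough sketch of what such helpers might look like. Both functions are hypothetical: the thread only names `starts_with_pkg_name(name, sep)`, so the three-argument signature, the `strip_pkg_prefix` helper, and the example file names are assumptions for illustration rather than Cargo's actual code.

```rust
/// Hypothetical helper: true when `file_name` is `<pkg_name><sep><rest>` and
/// `rest` contains no further `sep`. Splitting from the right means the
/// package `foo` does not accidentally match files of a package `foo-bar`.
pub fn starts_with_pkg_name(file_name: &str, pkg_name: &str, sep: char) -> bool {
    match file_name.rsplit_once(sep) {
        Some((stem, _rest)) => stem == pkg_name,
        None => false,
    }
}

/// Hypothetical alternative using `strip_prefix`: returns the character that
/// follows the package name and the remainder, so the caller can dispatch on
/// that character (for example '-' versus '.').
pub fn strip_pkg_prefix<'a>(file_name: &'a str, pkg_name: &str) -> Option<(char, &'a str)> {
    let rest = file_name.strip_prefix(pkg_name)?;
    let mut chars = rest.chars();
    let next = chars.next()?;
    Some((next, chars.as_str()))
}
```

Note that the `strip_prefix` variant alone cannot tell `foo-<hash>` from `foo-bar-<hash>`; the caller would still need to inspect the remainder, which is one reason to prefer splitting from the right consistently.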
f6cf214 to 43b04a2
This commit optimizes the implementation of `cargo clean -p` by reducing the number of directory walks that take place. We now walk each directory at most once and add to the list of files to be cleaned, step by step.

In practice this helps us significantly reduce the runtime for clearing large workspaces (as implemented in rust-lang#16263); for Zed, `cargo clean --workspace` went down from 73 seconds to 3 seconds. We have 216 workspace members.

Co-authored-by: dino <[email protected]>
Co-authored-by: Ed Page <[email protected]>
Thanks!
Update cargo submodule 27 commits in 2c283a9a5c5968eeb9a8f12313f04feb1ff8dfac..c46423de7351e3c4c734b2faa86088a9f5d1302b 2025-12-04 16:47:28 +0000 to 2025-12-12 23:16:12 +0000 - feat(report): cargo report timings HTML replay (rust-lang/cargo#16377) - feat: stabilize `-Zconfig-include` (rust-lang/cargo#16284) - fix(package): Don't verify registry for --list (rust-lang/cargo#16341) - Fixed incorrect locking logic when artifact-dir == build-dir (rust-lang/cargo#16385) - feat(log): make timing messages ready for HTML replay (rust-lang/cargo#16382) - chore(deps): update msrv (1 version) to v1.92 (rust-lang/cargo#16381) - Downgrade curl-sys to 0.4.83 (rust-lang/cargo#16379) - fix(timing): more self-contained timing/log data (rust-lang/cargo#16378) - test: update to `proc_macro::tracked::path` (rust-lang/cargo#16380) - refactor(lint): move lints to separate modules (rust-lang/cargo#16364) - fix(index): Apply feedback from Cargo team (rust-lang/cargo#16369) - fix(lints): handle lints separately at ws pkg level (rust-lang/cargo#16367) - feat(lint): new `implicit_minimum_version_req` lint (rust-lang/cargo#16321) - fix(info): default to local without explicit reg (rust-lang/cargo#16358) - Remove `[no-mentions]` handler in our triagebot config (rust-lang/cargo#16361) - Don't read the config file twice when $CARGO_HOME is a symlink (rust-lang/cargo#16325) - fix(timings): forgot to negate filter (rust-lang/cargo#16352) - fix(doctest): Include all search paths with new build layout (rust-lang/cargo#16348) - fix(layout): Remove hashes from bins in new layout (rust-lang/cargo#16351) - docs(faq): Include an entry on disk space (rust-lang/cargo#16349) - feat(timings): derive concurrency data from unit data (rust-lang/cargo#16350) - perf(layout): Use unit_id, not pkg hash, for bin/lib pkg_dirs for new layout (rust-lang/cargo#16345) - Validate target source paths before compilation with clearer errors (rust-lang/cargo#16338) - test(doc): Remove unused build script 
(rust-lang/cargo#16344) - refactor(timings): store UnitData in RenderContext instead (rust-lang/cargo#16346) - perf(clean): Optimize (legacy) clean with multiple -p specifiers (rust-lang/cargo#16264) - test: Adjust output for out-of-tree build-dir (rust-lang/cargo#16343)
Update cargo submodule 29 commits in 2c283a9a5c5968eeb9a8f12313f04feb1ff8dfac..e91b2baa632c0c7e84216c91ecfe107c37d887c1 2025-12-04 16:47:28 +0000 to 2025-12-13 16:29:21 +0000 - refactor(lints): move from cargo::util::lints to cargo::lints (rust-lang/cargo#16392) - test(lint): redact more due to line got omitted (rust-lang/cargo#16391) - feat(report): cargo report timings HTML replay (rust-lang/cargo#16377) - feat: stabilize `-Zconfig-include` (rust-lang/cargo#16284) - fix(package): Don't verify registry for --list (rust-lang/cargo#16341) - Fixed incorrect locking logic when artifact-dir == build-dir (rust-lang/cargo#16385) - feat(log): make timing messages ready for HTML replay (rust-lang/cargo#16382) - chore(deps): update msrv (1 version) to v1.92 (rust-lang/cargo#16381) - Downgrade curl-sys to 0.4.83 (rust-lang/cargo#16379) - fix(timing): more self-contained timing/log data (rust-lang/cargo#16378) - test: update to `proc_macro::tracked::path` (rust-lang/cargo#16380) - refactor(lint): move lints to separate modules (rust-lang/cargo#16364) - fix(index): Apply feedback from Cargo team (rust-lang/cargo#16369) - fix(lints): handle lints separately at ws pkg level (rust-lang/cargo#16367) - feat(lint): new `implicit_minimum_version_req` lint (rust-lang/cargo#16321) - fix(info): default to local without explicit reg (rust-lang/cargo#16358) - Remove `[no-mentions]` handler in our triagebot config (rust-lang/cargo#16361) - Don't read the config file twice when $CARGO_HOME is a symlink (rust-lang/cargo#16325) - fix(timings): forgot to negate filter (rust-lang/cargo#16352) - fix(doctest): Include all search paths with new build layout (rust-lang/cargo#16348) - fix(layout): Remove hashes from bins in new layout (rust-lang/cargo#16351) - docs(faq): Include an entry on disk space (rust-lang/cargo#16349) - feat(timings): derive concurrency data from unit data (rust-lang/cargo#16350) - perf(layout): Use unit_id, not pkg hash, for bin/lib pkg_dirs for new layout 
(rust-lang/cargo#16345) - Validate target source paths before compilation with clearer errors (rust-lang/cargo#16338) - test(doc): Remove unused build script (rust-lang/cargo#16344) - refactor(timings): store UnitData in RenderContext instead (rust-lang/cargo#16346) - perf(clean): Optimize (legacy) clean with multiple -p specifiers (rust-lang/cargo#16264) - test: Adjust output for out-of-tree build-dir (rust-lang/cargo#16343)
Co-authored-by: dino [email protected]
What does this PR try to resolve?
This commit optimizes the implementation of `cargo clean -p` by reducing the number of directory walks that take place. We now batch calls to `rm_rf_prefix_list`, thus potentially avoiding multiple walks over a single subdirectory. In practice this helps us significantly reduce the runtime for clearing large workspaces (as implemented in #16263); for Zed, `cargo clean --workspace` went down from 73 seconds to 3 seconds. We have 216 workspace members.
How to test and review this PR?
We've tested it by hand, running it against the `regex`, `ruff` and `zed` codebases. This PR is still marked as draft, as I don't love the code. I would also understand if y'all were against merging this, given that the new build directory layout is in flight.