Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix glob performance on large monorepos #1244

Merged
merged 4 commits into from
Jul 5, 2024
Merged

Fix glob performance on large monorepos #1244

merged 4 commits into from
Jul 5, 2024

Conversation

f-f
Copy link
Member

@f-f f-f commented Jul 5, 2024

Fix #1242, using purescript-core as the benchmark.

The final patch goes back to the approach we had before #1234, where we'd compose the glob-matching functions instead of keeping a list of ignores, and recompute the matchers when needed.

The patch in #1234 was not optimised, as the matchers were being recreated for every file encountered, and optimising that in the first two commits of this PR got us down to 2x of the performance pre-1234.
That is unfortunately still not acceptable, so I reintroduced the function-composition approach, which is still prone to blowing the stack - the change here should reduce that risk, since instead of composing every line of gitignores as a separate matcher in the chain, we instead nest them in a single or block. That should dramatically reduce the size of the call chain.

cc @Blugatroff @cakekindel

@f-f
Copy link
Member Author

f-f commented Jul 5, 2024

The current changes bring the runtime down from 3min 41s to 53s.

For reference, 0.93.32 takes less than 6s, so we still have way to go.

@f-f
Copy link
Member Author

f-f commented Jul 5, 2024

Down to 12s now..

@f-f
Copy link
Member Author

f-f commented Jul 5, 2024

And back to 6s.

@cakekindel
Copy link
Contributor

I wonder if it would be possible to stack-safely and performantly accumulate into a regex pattern, digging a bit into micromatch's source and the core implementation picomatch I realize we could avoid reparsing the patterns (which is where I assume the perf cost is)

@f-f
Copy link
Member Author

f-f commented Jul 5, 2024

@cakekindel that would be cool!
I'll now merge&release this so that whoever is affected by the issue is unblocked, let's add improvements in a followup PR

@@ -74,41 +74,63 @@ fsWalk cwd ignorePatterns includePatterns = Aff.makeAff \cb -> do

-- Pattern for directories which can be outright ignored.
-- This will be updated whenver a .gitignore is found.
ignoreMatcherRef :: Ref Glob <- Ref.new { ignore: [], include: ignorePatterns }
ignoreMatcherRef :: Ref (String -> Boolean) <- Ref.new (testGlob { ignore: [], include: ignorePatterns })
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why this never occurred to me before but if we always boolean OR || the matcher functions could we not accumulate into an Array (String -> Boolean) and flatten at the end with ||, avoiding the stack unsafety?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this what we are doing now with the double or (line 103 and line 115)?

Copy link
Contributor

@cakekindel cakekindel Jul 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily, because or (_ :: Array (_ -> Boolean)) still uses HeytingAlgebra (a -> Boolean), which doesn't get TCOd consistently / at all

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trying this out on purescm/purescript-core and seeing a negligible difference. I'll keep it around in case someone opens a stack overflow issue again (or I can open the PR anyway)

https://github.com/purescript/spago/compare/master...cakekindel:fix-glob-perf?expand=1

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see - if the performance is comparable and we can make it stack safe then that's a welcome addition

@f-f f-f merged commit f130b33 into master Jul 5, 2024
3 checks passed
@f-f f-f deleted the fix-glob-perf branch July 5, 2024 14:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Stuck on "Gathering all the spago configs in the tree"
2 participants