-
Notifications
You must be signed in to change notification settings - Fork 131
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix glob performance on large monorepos #1244
Conversation
The current changes bring the runtime down from For reference, 0.93.32 takes less than |
Down to 12s now.. |
And back to 6s. |
I wonder if it would be possible to stack-safely and performantly accumulate into a regex pattern, digging a bit into micromatch's source and the core implementation picomatch I realize we could avoid reparsing the patterns (which is where I assume the perf cost is) |
@cakekindel that would be cool! |
@@ -74,41 +74,63 @@ fsWalk cwd ignorePatterns includePatterns = Aff.makeAff \cb -> do | |||
|
|||
-- Pattern for directories which can be outright ignored. | |||
-- This will be updated whenver a .gitignore is found. | |||
ignoreMatcherRef :: Ref Glob <- Ref.new { ignore: [], include: ignorePatterns } | |||
ignoreMatcherRef :: Ref (String -> Boolean) <- Ref.new (testGlob { ignore: [], include: ignorePatterns }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why this never occurred to me before but if we always boolean OR ||
the matcher functions could we not accumulate into an Array (String -> Boolean)
and flatten at the end with ||
, avoiding the stack unsafety?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this what we are doing now with the double or
(line 103 and line 115)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessarily, because or (_ :: Array (_ -> Boolean))
still uses HeytingAlgebra (a -> Boolean)
, which doesn't get TCOd consistently / at all
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
trying this out on purescm/purescript-core
and seeing a negligible difference. I'll keep it around in case someone opens a stack overflow issue again (or I can open the PR anyway)
https://github.com/purescript/spago/compare/master...cakekindel:fix-glob-perf?expand=1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see - if the performance is comparable and we can make it stack safe then that's a welcome addition
Fix #1242, using purescript-core as the benchmark.
The final patch goes back to the approach we had before #1234, where we'd compose the glob-matching functions instead of keeping a list of ignores, and recompute the matchers when needed.
The patch in #1234 was not optimised, as the matchers were being recreated for every file encountered, and optimising that in the first two commits of this PR got us down to 2x of the performance pre-1234.
That is unfortunately still not acceptable, so I reintroduced the function-composition approach, which is still prone to blowing the stack - the change here should reduce that risk, since instead of composing every line of gitignores as a separate matcher in the chain, we instead nest them in a single
or
block. That should dramatically reduce the size of the call chain.cc @Blugatroff @cakekindel