Fix glob performance on large monorepos #1244

f-f · 2024-07-05T12:20:35Z

Fix #1242, using purescript-core as the benchmark.

The final patch goes back to the approach we had before #1234, where we'd compose the glob-matching functions instead of keeping a list of ignores, and recompute the matchers when needed.

The patch in #1234 was not optimised, as the matchers were being recreated for every file encountered; optimising that in the first two commits of this PR got us to within 2x of the pre-#1234 runtime.
That is unfortunately still not acceptable, so I reintroduced the function-composition approach. It is still prone to blowing the stack, but the change here should reduce that risk: instead of composing every line of each .gitignore as a separate matcher in the chain, we nest them in a single or block. That should dramatically reduce the length of the call chain.
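A minimal sketch of the two shapes, in PureScript. All names here are hypothetical and are not the functions in src/Spago/Glob.purs: `Matcher` is a path predicate, `literal` is an exact-match stand-in for a compiled glob, and `chainPatterns`/`addGitignore` are illustration helpers. Chaining wraps one closure per gitignore line, so testing a path that matches nothing walks through one call per line; grouping a file's lines into a single `any` over a precompiled array adds only one closure per .gitignore file.

```purescript
module GlobChainSketch where

import Prelude

import Data.Foldable (any, foldl)
import Effect (Effect)
import Effect.Console (log)
import Effect.Exception (throw)

-- Hypothetical: a matcher answers "should this path be ignored?"
type Matcher = String -> Boolean

-- Hypothetical stand-in for a compiled glob (exact match only).
literal :: String -> Matcher
literal pattern = \path -> path == pattern

-- Chained: every gitignore line wraps the previous matcher in a new closure,
-- so a path that matches nothing walks through one call per line.
chainPatterns :: Matcher -> Array String -> Matcher
chainPatterns start patterns = foldl step start patterns
  where
  step acc pattern =
    let m = literal pattern
    in \path -> acc path || m path

-- Grouped: all lines of one gitignore are compiled once into an array and
-- tested with a single `any`, so the chain grows by one closure per file.
addGitignore :: Matcher -> Array String -> Matcher
addGitignore acc patterns =
  let matchers = map literal patterns
  in \path -> acc path || any (\m -> m path) matchers

main :: Effect Unit
main = do
  let
    patterns = [ "node_modules", ".spago", "output" ]
    chained = chainPatterns (const false) patterns
    grouped = addGitignore (const false) patterns
    paths = [ "output", "src", ".spago" ]
  -- Both shapes give the same answers; only the depth of the call chain differs.
  when (map chained paths /= map grouped paths) (throw "matchers disagree")
  log "chained and grouped matchers agree"
```

Compiling the patterns once, outside the returned lambda, also mirrors the earlier fix of not recreating matchers for every file.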

cc @Blugatroff @cakekindel

f-f · 2024-07-05T12:20:56Z

The current changes bring the runtime down from 3min 41s to 53s.

For reference, 0.93.32 takes less than 6s, so we still have a way to go.

f-f · 2024-07-05T13:12:54Z

Down to 12s now.

f-f · 2024-07-05T13:35:39Z

And back to 6s.

cakekindel · 2024-07-05T14:13:19Z

I wonder if it would be possible to accumulate into a single regex pattern in a way that is both stack-safe and performant. Digging a bit into the source of micromatch and of picomatch (its core implementation), I realized we could avoid reparsing the patterns, which is where I assume the performance cost is.
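A rough sketch of the regex idea, under stated assumptions: each ignore pattern has already been translated into a regex source (the kind of glob-to-regex translation picomatch performs; that step is not shown here), and `combine` is a hypothetical helper, not an existing Spago function. The sources are joined into one alternation and compiled once, so each path costs a single `test` call with no chain of closures.

```purescript
module RegexAccumulateSketch where

import Prelude

import Data.Either (Either(..))
import Data.String (joinWith)
import Data.String.Regex (regex, test)
import Data.String.Regex.Flags (noFlags)
import Effect (Effect)
import Effect.Console (log)
import Effect.Exception (throw)

-- Hypothetical helper: turn already-translated regex sources into a single
-- matcher. Compilation happens once; the returned function only calls `test`.
-- An empty alternation would match every path, so that case is handled first.
combine :: Array String -> Either String (String -> Boolean)
combine [] = Right (const false)
combine sources = test <$> regex alternation noFlags
  where
  -- Wrap each source in a non-capturing group so its own `|` stays local.
  -- noFlags (no "g") keeps `test` free of lastIndex state between calls.
  alternation = joinWith "|" (map (\s -> "(?:" <> s <> ")") sources)

main :: Effect Unit
main = case combine [ "^node_modules$", "^\\.spago$" ] of
  Left err -> throw err
  Right isIgnored -> do
    unless (isIgnored ".spago") (throw "expected .spago to be ignored")
    when (isIgnored "src") (throw "expected src not to be ignored")
    log "combined regex behaves as expected"
```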

f-f · 2024-07-05T14:19:26Z

@cakekindel that would be cool!
I'll merge and release this now so that anyone affected by the issue is unblocked; let's add improvements in a follow-up PR.

cakekindel · 2024-07-05T14:19:36Z

src/Spago/Glob.purs

@@ -74,41 +74,63 @@ fsWalk cwd ignorePatterns includePatterns = Aff.makeAff \cb -> do

  -- Pattern for directories which can be outright ignored.
  -- This will be updated whenver a .gitignore is found.
-  ignoreMatcherRef :: Ref Glob <- Ref.new { ignore: [], include: ignorePatterns }
+  ignoreMatcherRef :: Ref (String -> Boolean) <- Ref.new (testGlob { ignore: [], include: ignorePatterns })


Not sure why this never occurred to me before, but if we always combine the matcher functions with a boolean OR (||), couldn't we accumulate them into an Array (String -> Boolean) and flatten it at the end with ||, avoiding the stack unsafety?

Isn't this what we are doing now with the double or (line 103 and line 115)?

Not necessarily, because or (_ :: Array (_ -> Boolean)) still goes through the HeytingAlgebra (a -> Boolean) instance, which doesn't get tail-call optimised consistently, if at all.
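A minimal sketch of the distinction drawn above, with hypothetical names (`viaHeytingAlgebra`, `viaArray`) that are not the code in the linked branch. Calling `or` on an Array (String -> Boolean) merges the functions with the function instance's `disj` before any path is seen, producing nested closures; applying each matcher to the path first means `any` only ever combines plain Booleans while walking the array.

```purescript
module MatcherArraySketch where

import Prelude

import Data.Array (range)
import Data.Foldable (any, or)
import Effect (Effect)
import Effect.Console (log)
import Effect.Exception (throw)

type Matcher = String -> Boolean

-- `or` at type Array Matcher uses HeytingAlgebra (String -> Boolean): the
-- matchers are merged with `disj` into nested closures, so testing a path
-- that matches nothing recurses roughly once per matcher.
viaHeytingAlgebra :: Array Matcher -> Matcher
viaHeytingAlgebra = or

-- Applying each matcher to the path first means `any` only combines plain
-- Booleans while walking the array, so no closure chain is built.
viaArray :: Array Matcher -> Matcher
viaArray matchers = \path -> any (\m -> m path) matchers

main :: Effect Unit
main = do
  let
    small = [ eq "node_modules", eq ".spago" ]
    large = map (\i -> eq (show i)) (range 1 100000)
  -- On small inputs both versions agree.
  when (viaHeytingAlgebra small ".spago" /= viaArray small ".spago") (throw "mismatch")
  -- Only the array version is run on the large input.
  unless (viaArray large "100000") (throw "expected a match")
  log "ok"
```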

Trying this out on purescm/purescript-core, I see a negligible difference. I'll keep it around in case someone opens a stack overflow issue again (or I can open the PR anyway).

https://github.com/purescript/spago/compare/master...cakekindel:fix-glob-perf?expand=1

Ah, I see. If the performance is comparable and we can make it stack-safe, then that's a welcome addition.

Recompute the ignore glob only when needed instead of at every path

b5dc3ec

Same but for the deepFilter

25b2ede

Back to composing functions

778d2f4

Fix stack-safety test

a6c54a1

cakekindel reviewed Jul 5, 2024


f-f merged commit f130b33 into master Jul 5, 2024
3 checks passed

f-f deleted the fix-glob-perf branch July 5, 2024 14:21

Blugatroff mentioned this pull request Oct 22, 2024

Significant slow down finding spago.yaml config when building #1295

Closed
