fix: gitignoringGlob stack safety + perf #1234

Merged (7 commits) on Jun 24, 2024

cakekindel
Contributor

Description of the change

fixes #1231

Before, we were eagerly turning string glob patterns into matching functions String -> Boolean and combining each new pattern with ||. The instance (HeytingAlgebra b) => HeytingAlgebra (a -> b) isn't stack safe, so sufficiently large gitignore files overflowed the call stack.

Now, we accumulate the patterns themselves and build them into micromatch patterns when we evaluate them. The actual call to micromatch costs practically nothing, so this flat-out wins on time & space cost (as well as fixing the stack-safety issue).

-- Instead of composing the matcher functions, we could also keep a growing array of
-- patterns and regenerate the matcher on every append. I don't know which option is
-- more performant, but composing functions is more convenient.

Turns out we've learned which option is more performant 😅
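
For readers skimming the description, here is a minimal sketch of the two approaches side by side. It is not the code from this PR: `Matcher`, `toMatcher`, `composed` and `accumulated` are hypothetical names, and `toMatcher` uses plain string equality as a stand-in for a real micromatch call.

```purescript
module Example.Matchers where

import Prelude

import Data.Foldable (any, foldl)

-- Hypothetical alias for a compiled matcher.
type Matcher = String -> Boolean

-- Hypothetical stand-in for micromatch: exact equality instead of glob matching.
toMatcher :: String -> Matcher
toMatcher pattern path = pattern == path

-- Old approach: fold every pattern into one function using (||) from the
-- HeytingAlgebra (a -> b) instance. Each pattern adds one more layer of
-- nested closures that has to be unwound when the result is called.
composed :: Array String -> Matcher
composed = foldl (\acc pattern -> acc || toMatcher pattern) (const false)

-- New approach: keep the patterns as plain data and only consult them
-- when a path is actually checked.
accumulated :: Array String -> Matcher
accumulated patterns path = any (\pattern -> toMatcher pattern path) patterns
```

Both functions answer the same question for a given path; the difference is that `composed` builds a closure chain as long as the pattern list, while `accumulated` just walks an array.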

Checklist:

  • Added the change to the "Unreleased" section of the changelog
  • Added some example of the new feature to the README
  • Added a test for the contribution (if applicable)

P.S.: the above checks are not compulsory to get a change merged, so you may skip them. However, taking care of them will result in less work for the maintainers and will be much appreciated 😊

@@ -66,3 +73,9 @@ spec = Spec.around globTmpDir do
FS.writeTextFile (Path.concat [p, ".gitignore"]) """/fruits\n/src"""
a <- Glob.gitignoringGlob p ["fruits/apple/**"]
a `Assert.shouldEqual` ["fruits/apple"]

Spec.it "is stacksafe" \p -> do
Contributor Author

fails on f0f7901 (tip of master at the moment)

import Effect.Aff as Aff
import Effect.Ref as Ref
import Node.FS.Sync as SyncFS
import Node.Path as Path
import Record as Record
import Type.Proxy (Proxy(..))

- type MicroMatchOptions = { ignore :: Array String }
+ type MicroMatchOptions = { ignore :: Array String, include :: Array String }
Contributor Author

this change allows us to accumulate with appending arrays instead of ||ing functions.

Fun fact for those who may not be aware, if all of a record's fields are Semigroup, the record is Semigroup so you can <> these :)
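
A small sketch of that fun fact, assuming only Prelude. The values `fromRoot`, `fromNested` and `combined` are made up for illustration and do not appear in the PR.

```purescript
module Example.RecordAppend where

import Prelude

type MicroMatchOptions = { ignore :: Array String, include :: Array String }

-- Hypothetical options gathered from a top-level .gitignore.
fromRoot :: MicroMatchOptions
fromRoot = { ignore: [ "/fruits", "/src" ], include: [] }

-- Hypothetical options gathered from a nested .gitignore.
fromNested :: MicroMatchOptions
fromNested = { ignore: [ "*.log" ], include: [ "keep.log" ] }

-- Every field is an Array (a Semigroup), so the whole record is a Semigroup
-- and (<>) appends field by field.
combined :: MicroMatchOptions
combined = fromRoot <> fromNested
-- { ignore: [ "/fruits", "/src", "*.log" ], include: [ "keep.log" ] }
```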

@cakekindel
Contributor Author

is micromatch not cross-platform? 💢

@f-f
Member

f-f commented Jun 18, 2024

I am very confused as well - I went through the code and all the paths seem to be handled properly, so I am not sure why we are seeing the failures

@cakekindel
Contributor Author

I'd imagine it's because there's an assumption that foo/** will match both foo and foo/a, which i'm guessing doesn't hold on windows 😕

Member

@f-f f-f left a comment

Looks great, thanks @cakekindel! 👏

@f-f f-f merged commit 07987af into purescript:master Jun 24, 2024
3 checks passed
f-f pushed a commit that referenced this pull request Jun 24, 2024
f-f added a commit that referenced this pull request Jul 5, 2024
The final patch goes back to the approach we had before #1234, where we'd compose the glob-matching functions, instead of keeping a list of ignores and recomputing the matchers when needed.

The patch in #1234 was not optimised, as the matchers were being recreated for every file encountered. Optimising that in the first two commits of this PR only got us to within 2x of the pre-#1234 performance.
That is unfortunately still not acceptable, so I reintroduced the function-composition approach, which is still prone to blowing the stack. The change here should reduce that risk: instead of composing every line of the gitignores as a separate matcher in the chain, we nest them in a single or block. That should dramatically reduce the size of the call chain.
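
One possible reading of the "single or block" idea, as a rough sketch rather than the code from the follow-up commit. All names here (`Matcher`, `lineMatches`, `fileMatcher`, `allFiles`) are hypothetical, and `lineMatches` uses string equality in place of a real glob match.

```purescript
module Example.OrBlock where

import Prelude

import Data.Foldable (any, foldl)

-- Hypothetical alias for a compiled matcher.
type Matcher = String -> Boolean

-- Hypothetical stand-in for matching one gitignore line against a path.
lineMatches :: String -> Matcher
lineMatches line path = line == path

-- One matcher per gitignore file: its lines are checked together in a
-- single flat "or" instead of one chained function per line.
fileMatcher :: Array String -> Matcher
fileMatcher lines path = any (\line -> lineMatches line path) lines

-- Function composition is still used across files, so the chain grows by
-- one link per gitignore file rather than one link per line.
allFiles :: Array (Array String) -> Matcher
allFiles = foldl (\acc lines -> acc || fileMatcher lines) (const false)
```

Under this reading the chain depth depends on how many gitignore files are found, not on how long any one of them is, which is why the risk is reduced rather than eliminated.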
Development

Successfully merging this pull request may close these issues.

gitignoringGlob exceeding call stack - add explicit package globs to workspace?
2 participants