Path to supporting 100% of grammars in the JS engine #876

Open
slevithan opened this issue Dec 27, 2024 · 4 comments

Comments

@slevithan
Collaborator

slevithan commented Dec 27, 2024

Shiki's JS engine uses Oniguruma-To-ES under the hood to emulate Oniguruma regexes. Grammar support has continually increased with recent updates and now the vast majority of grammars (which contain dozens to thousands of regexes each) are supported.

The following is what's needed to support the remaining grammars.

Mismatched

These are grammars that produce different results in Shiki's JS and WASM engines for the provided grammar samples.

  • Kusto
    • Includes a regex that triggers an Oniguruma bug, so in fact the JS engine (not the WASM engine) is producing the correct highlighting. The Oniguruma bug needs to be worked around in the Kusto grammar. [upstream issue] [PR]
    • Fix landed in upstream grammar. Published in Shiki 1.25.0.
  • NGINX
    • Includes an unsupported use of \G that is ignored based on options Shiki uses, resulting in the mismatch.
    • Support for all \G anchors landed in Oniguruma-To-ES 2.0.0. Published in Shiki 1.27.1.
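
For context on why \G needs special handling: JavaScript regexes have no \G anchor. A rough analogue for the simplest case, a \G at the very start of a pattern, is the sticky flag `y`, which only allows a match to begin at `lastIndex`. The sketch below illustrates just that one case. The helpers `matchAt` and `searchFrom` are hypothetical names made up for this example, and this is not a description of how Oniguruma-To-ES is implemented; \G in other positions needs more involved emulation.

```typescript
// Hypothetical helper: try to match `source` exactly at position `pos`,
// roughly what a leading \G asks for in Oniguruma.
function matchAt(source: string, text: string, pos: number): string | null {
  const re = new RegExp(source, "y"); // sticky: the match must start at lastIndex
  re.lastIndex = pos;
  const m = re.exec(text);
  return m ? m[0] : null;
}

// Hypothetical helper for contrast: without the sticky flag,
// the engine is free to scan ahead from `pos` until it finds a match.
function searchFrom(source: string, text: string, pos: number): string | null {
  const re = new RegExp(source, "g"); // global: the search starts at lastIndex
  re.lastIndex = pos;
  const m = re.exec(text);
  return m ? m[0] : null;
}

const sampleText = "ab12cd";
console.log(matchAt("\\d+", sampleText, 2));    // "12": digits start exactly at index 2
console.log(matchAt("\\d+", sampleText, 0));    // null: no digits at index 0
console.log(searchFrom("\\d+", sampleText, 0)); // "12": scans ahead to find the digits
```

If a \G is dropped rather than emulated, the pattern behaves like `searchFrom` instead of `matchAt`, which is the kind of difference that can show up as mismatched highlighting.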

Unsupported

These are grammars that throw an error for at least one regex in Shiki's JS engine, which, unlike the WASM engine, does not silence errors by default.

  • Ada
    • Includes an invalid Oniguruma regex. [upstream PR]
    • Fix landed in upstream grammar. Published in Shiki 1.25.0.
  • Hack
    • Includes a regex that triggers an Oniguruma bug (that Oniguruma-To-ES chooses to throw for). [upstream PR]
    • Fix landed in upstream grammar. Published in Shiki 1.26.2.
  • CodeQL
    • Includes an invalid Oniguruma regex. [upstream PR]
    • Fix landed in upstream grammar. Will make its way to Shiki soon.
  • Sass
    • Includes two invalid Oniguruma regexes. [upstream PR]
    • Fix landed in upstream grammar. Published in Shiki 2.1.0.
  • Swift
    • Uses conditionals (?(…)…) (the only grammar that does so) in three regexes.
      • Conditionals aren't currently supported for emulation and aren't emulatable in all cases.
      • The regexes need to be refactored to remove the conditionals. [upstream issue]
  • C#
    • Uses absent repeaters (?~…) (the only grammar that does so) in two regexes.
      • Absent repeaters can be supported. [tracking issue]
      • Support for absent repeaters landed in Oniguruma-To-ES 2.3.0. Published in Shiki 2.1.0.
    • Uses multiple overlapping recursions (one of only two grammars that does so) in one regex.
      • It's not feasible to support this for performance and other reasons.
      • The regex needs to be refactored. [upstream issue] [PR]
      • Refactor landed in upstream grammar. Will make its way to Shiki soon.
  • Razor
    • Uses an invalid JS identifier as a group name (the only grammar that does so) in two regexes.
      • Support for invalid JS identifiers as group names landed in Oniguruma-To-ES 2.2.0. Published in Shiki 1.28.0.
    • Additionally shares C#'s errors because it embeds the C# grammar.
      • C# refactor landed. Will make its way to Shiki soon.
  • PureScript
    • Uses multiple overlapping recursions (one of only two grammars that does so) in one regex.
      • See related comments for C#. The regex needs to be refactored. [upstream issue]

Resolving all of the above would result in 100% JS engine support for Shiki's grammar samples. Quite a remarkable feat if you're familiar with the challenges and complexity of getting to this point.

Any help with getting these grammars supported in the JS engine would be very welcome.
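
To illustrate the kind of refactor the Swift item above calls for, here is a sketch using a made-up pattern (not one of the Swift grammar's regexes). The Oniguruma pattern `(\()?\w+(?(1)\))` uses a conditional to require a closing parenthesis only when an opening one was matched. The same intent can be written as a plain alternation, which JavaScript regexes support. Whether a rewrite like this is possible depends on the individual regex.

```typescript
// Oniguruma-only original, for reference (hypothetical example pattern):
//   (\()?\w+(?(1)\))
// Refactored without a conditional: spell out both branches explicitly.
const balancedOrBare = /\(\w+\)|\w+/;

// Hypothetical helper: return the first match in `text`, or null.
function firstMatch(text: string): string | null {
  const m = balancedOrBare.exec(text);
  return m ? m[0] : null;
}

console.log(firstMatch("(abc)")); // "(abc)": the parenthesized form matches as a whole
console.log(firstMatch("abc"));   // "abc": the bare form matches
console.log(firstMatch("(abc"));  // "abc": unbalanced, so only the bare word matches
```

Note that the capture group numbering changes in the rewritten form, so any grammar rule that refers to captures by number would need to be updated along with the regex.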

@slevithan
Collaborator Author

@antfu Would it be a good idea to add a requirement for new grammars added to https://github.com/shikijs/textmate-grammars-themes that they need to support the JS engine? As you can see above, this will rarely be a factor, and when it is, it might be because of a bug in the grammar or Oniguruma itself that the author didn't account for.

@antfu
Member

antfu commented Dec 30, 2024

That's astonishing work you have done. Thanks a lot for all the effort!

> added to shikijs/textmate-grammars-themes that they need to support the JS engine?

Oh yeah, that would be great! We could introduce verification scripts to that repo so we can catch them in CI. We could whitelist the known incompatible ones, so the check always covers new grammars.
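
A verification script along these lines could be sketched as follows. Everything here is hypothetical: the `GrammarPatterns` shape, the `compiles` and `verifyGrammars` functions, and the use of the built-in `RegExp` constructor as a stand-in for the real Oniguruma-to-JS conversion step. The point is only the control flow: flag any grammar with an uncompilable regex unless it is on the list of known incompatible grammars.

```typescript
// Hypothetical shape: grammar name -> regex sources extracted from the grammar.
type GrammarPatterns = Record<string, string[]>;

// Stand-in compile step. A real script would call the actual
// Oniguruma-to-JS conversion here instead of the RegExp constructor.
function compiles(source: string): boolean {
  try {
    new RegExp(source);
    return true;
  } catch (_err) {
    return false;
  }
}

// Returns the names of grammars that fail to compile and are
// NOT already on the list of known incompatible grammars.
function verifyGrammars(grammars: GrammarPatterns, knownIncompatible: string[]): string[] {
  const unexpected: string[] = [];
  for (const name of Object.keys(grammars)) {
    const patterns = grammars[name] || [];
    const ok = patterns.every((source) => compiles(source));
    if (!ok && knownIncompatible.indexOf(name) === -1) {
      unexpected.push(name);
    }
  }
  return unexpected;
}

const sampleGrammars: GrammarPatterns = {
  good: ["\\d+", "[a-z]*"],
  legacy: ["(unclosed"],   // fails, but is listed as known incompatible below
  newcomer: ["*dangling"], // new failure that CI should flag
};
console.log(verifyGrammars(sampleGrammars, ["legacy"])); // [ "newcomer" ]
```

In CI, a non-empty result would fail the build. A stricter variant could also report listed grammars that now pass, so the list shrinks as upstream fixes land.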

@slevithan
Collaborator Author

slevithan commented Jan 8, 2025

oniguruma-to-es v1.0.0 added robust validation of lookbehind contents, causing it to identify and throw for one invalid Oniguruma regex in the CodeQL grammar (it uses lookahead within lookbehind, which is valid in JS but not Oniguruma). As a result, CodeQL has moved to Unsupported. I've added it to the list in the top comment.

Since it's erroring in both the JS and WASM engines (but silently in the WASM engine), using the forgiving option to silence errors in the JS engine leads to it working the same in both.
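
The idea behind silencing errors can be sketched as follows. This illustrates the concept only and is not Shiki's implementation: the `compilePattern` helper and its `forgiving` parameter are hypothetical, and the built-in `RegExp` constructor stands in for the real conversion step. When a pattern fails to compile, strict mode rethrows, while forgiving mode substitutes a regex that can never match, so the rule is effectively skipped.

```typescript
// A regex that can never match: a negative lookahead for the empty string always fails.
const NEVER_MATCHES = /(?!)/;

// Hypothetical helper: compile `source`, optionally swallowing compile errors.
function compilePattern(source: string, forgiving: boolean): RegExp {
  try {
    return new RegExp(source);
  } catch (err) {
    if (!forgiving) throw err; // strict: surface the invalid pattern
    return NEVER_MATCHES;      // forgiving: the rule silently never fires
  }
}

const invalidSource = "(unclosed"; // deliberately malformed (unterminated group)
console.log(compilePattern(invalidSource, true).test("anything")); // false
```

The trade-off is the one described above: forgiving mode keeps highlighting working, but an invalid regex in a grammar goes unnoticed instead of being reported.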

@slevithan
Collaborator Author

slevithan commented Jan 13, 2025

Fixes have now landed for three grammars (Ada, Hack, Kusto). They're now supported by the JS engine, and I've edited the post above.

Shiki 1.26.2 bumped to a version of tm-grammars that removed some legacy preprocessing of grammars. Among other issues, the preprocessing was apparently escaping at least some quantifiers that weren't attached to quantifiable tokens. Removing it revealed that the Sass grammar includes two invalid Oniguruma regexes, and as a result Sass has moved to Unsupported for the JS engine. I've added it to the list in the top comment.
