Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(transformerMetaWordHighlight): use regex instead of substring to find matching indices #909

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

artt
Copy link

@artt artt commented Jan 27, 2025

Description

Partially fixes #908 and add tests.

Additional context

Users still cannot freely choose which instance the highlight should be on (regex will choose whatever comes up first).

It would be nice to allow users to manually specify which character (by index) the highlight should be. For example, `js {1[6:12],3-4} to highlight characters 6 through 12 of the first line, and lines 3+4. But that seems like a lot more work. At least this PR will eliminate the error.

Copy link

netlify bot commented Jan 27, 2025

Deploy Preview for shiki-next ready!

Name Link
🔨 Latest commit 0b9137f
🔍 Latest deploy log https://app.netlify.com/sites/shiki-next/deploys/6797c33bbebcf3000828ff85
😎 Deploy Preview https://deploy-preview-909--shiki-next.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

Copy link

netlify bot commented Jan 27, 2025

Deploy Preview for shiki-matsu ready!

Name Link
🔨 Latest commit 0b9137f
🔍 Latest deploy log https://app.netlify.com/sites/shiki-matsu/deploys/6797c33b458f970008314b99
😎 Deploy Preview https://deploy-preview-909--shiki-matsu.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

// eslint-disable-next-line no-cond-assign
while ((i = str.indexOf(substr, i + 1)) !== -1)
indexes.push(i)
const re = new RegExp(substr, 'g')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we need to properly escape the substr if we want to use that approach.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this is already handled by this function parseMetaHighlightWords. If you could help give some examples that would be awesome.

https://github.com/shikijs/shiki/blob/0b9137f8413eb1f276928707c0ac92de0664a314/packages/transformers/src/transformers/meta-highlight-word.ts#L3C1-L13C2

And, do we want to use this approach?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The safest way to escape regex special characters is to use Regex+ and interpolate the string to escape. Nearly as safe but with noisier output would be to use native RegExp.escape (an ES proposal), which isn't supported by browsers yet. The most popular way to escape regex special characters (based on npm download stats) is to use the unsafe but lightweight escape-string-regexp. Context safety isn't relevant when the entire regex pattern is an escaped string, though, so no worries.

If it was me, I might just add an inline .replace(/[|\\{}()[\]^$+*?.]/g, '\\$&') with no libraries. This doesn't escape chars that might need to be escaped based on context (-, ,, digits, etc.), since no context awareness is needed when the escaped string is being used as the entire regex pattern.

But then, I don't understand why you're moving from string search to regex search in the first place.

parseMetaHighlightWords isn't escaping special characters; it seems to be matching JS regex literals? I also notice it has a couple minor issue issues with that: it will inappropriately match /foo\/ (no unescaped trailing /) and it will not correctly match /[/]/ (unescaped / is allowed in JS character classes, unless the regex uses flag v).

// eslint-disable-next-line no-cond-assign
while ((i = str.indexOf(substr, i + 1)) !== -1)
indexes.push(i)
const re = new RegExp(substr, 'g')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The safest way to escape regex special characters is to use Regex+ and interpolate the string to escape. Nearly as safe but with noisier output would be to use native RegExp.escape (an ES proposal), which isn't supported by browsers yet. The most popular way to escape regex special characters (based on npm download stats) is to use the unsafe but lightweight escape-string-regexp. Context safety isn't relevant when the entire regex pattern is an escaped string, though, so no worries.

If it was me, I might just add an inline .replace(/[|\\{}()[\]^$+*?.]/g, '\\$&') with no libraries. This doesn't escape chars that might need to be escaped based on context (-, ,, digits, etc.), since no context awareness is needed when the escaped string is being used as the entire regex pattern.

But then, I don't understand why you're moving from string search to regex search in the first place.

parseMetaHighlightWords isn't escaping special characters; it seems to be matching JS regex literals? I also notice it has a couple minor issue issues with that: it will inappropriately match /foo\/ (no unescaped trailing /) and it will not correctly match /[/]/ (unescaped / is allowed in JS character classes, unless the regex uses flag v).

indexes.push(i)
const re = new RegExp(substr, 'g')
let match = re.exec(str)
while (match !== null) {
Copy link
Collaborator

@slevithan slevithan Feb 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will loop forever if re matches an empty string. Is that possible here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

transformerMetaWordHighlight fail to highlight overlapping text instances
3 participants