fix(transformerMetaWordHighlight): use regex instead of substring to find matching indices #909

artt · 2025-01-27T17:32:39Z

Description

Partially fixes #908 and adds tests.

Additional context

Users still cannot freely choose which occurrence gets highlighted (the regex picks whichever match comes first).

It would be nice to let users manually specify which characters (by index) should be highlighted. For example, `js {1[6:12],3-4}` to highlight characters 6 through 12 of the first line, and lines 3 and 4. But that seems like a lot more work. At least this PR will eliminate the error.
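A rough sketch of how the proposed `{1[6:12],3-4}` syntax could be parsed, assuming the bracketed numbers are an inclusive, 1-based character range. Shiki does not support this syntax; `parseHighlightSpec` and `LineHighlight` are hypothetical names used only for illustration.

```typescript
// Hypothetical syntax from the proposal above; NOT supported by Shiki.
interface LineHighlight {
  line: number
  // Optional character range within the line (assumed inclusive, 1-based)
  chars?: { start: number, end: number }
}

function parseHighlightSpec(spec: string): LineHighlight[] {
  // Strip the surrounding braces: `{...}` -> `...`
  const body = spec.trim().replace(/^\{|\}$/g, '')
  const result: LineHighlight[] = []
  for (const part of body.split(',').map(p => p.trim()).filter(Boolean)) {
    // `N[a:b]`: characters a through b of line N
    const withChars = /^(\d+)\[(\d+):(\d+)\]$/.exec(part)
    if (withChars) {
      result.push({
        line: Number(withChars[1]),
        chars: { start: Number(withChars[2]), end: Number(withChars[3]) },
      })
      continue
    }
    // `a-b`: whole lines a through b
    const range = /^(\d+)-(\d+)$/.exec(part)
    if (range) {
      for (let n = Number(range[1]); n <= Number(range[2]); n++)
        result.push({ line: n })
      continue
    }
    // `N`: a single whole line
    if (/^\d+$/.test(part)) {
      result.push({ line: Number(part) })
      continue
    }
    throw new Error(`Unrecognized highlight spec: ${part}`)
  }
  return result
}
```

For example, `parseHighlightSpec('{1[6:12],3-4}')` yields line 1 with characters 6 through 12, plus whole lines 3 and 4.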


netlify · 2025-01-27T17:32:56Z

✅ Deploy Preview for shiki-next ready!

Name	Link
🔨 Latest commit	`0b9137f`
🔍 Latest deploy log	https://app.netlify.com/sites/shiki-next/deploys/6797c33bbebcf3000828ff85
😎 Deploy Preview	https://deploy-preview-909--shiki-next.netlify.app

To edit notification comments on pull requests, go to your Netlify site configuration.

netlify · 2025-01-27T17:32:57Z

✅ Deploy Preview for shiki-matsu ready!

Name	Link
🔨 Latest commit	`0b9137f`
🔍 Latest deploy log	https://app.netlify.com/sites/shiki-matsu/deploys/6797c33b458f970008314b99
😎 Deploy Preview	https://deploy-preview-909--shiki-matsu.netlify.app

antfu · 2025-01-30T09:28:06Z

packages/transformers/src/transformers/meta-highlight-word.ts

-  // eslint-disable-next-line no-cond-assign
-  while ((i = str.indexOf(substr, i + 1)) !== -1)
-    indexes.push(i)
+  const re = new RegExp(substr, 'g')


I guess we need to properly escape the substr if we want to use that approach.

Not sure if this is already handled by the `parseMetaHighlightWords` function. If you could give some examples, that would be awesome.

https://github.com/shikijs/shiki/blob/0b9137f8413eb1f276928707c0ac92de0664a314/packages/transformers/src/transformers/meta-highlight-word.ts#L3C1-L13C2

And, do we want to use this approach?


slevithan · 2025-02-05T19:17:00Z

packages/transformers/src/transformers/meta-highlight-word.ts

-  // eslint-disable-next-line no-cond-assign
-  while ((i = str.indexOf(substr, i + 1)) !== -1)
-    indexes.push(i)
+  const re = new RegExp(substr, 'g')


The safest way to escape regex special characters is to use Regex+ and interpolate the string to escape. Nearly as safe but with noisier output would be to use native RegExp.escape (an ES proposal), which isn't supported by browsers yet. The most popular way to escape regex special characters (based on npm download stats) is to use the unsafe but lightweight escape-string-regexp. Context safety isn't relevant when the entire regex pattern is an escaped string, though, so no worries.

If it were me, I might just add an inline `.replace(/[|\\{}()[\]^$+*?.]/g, '\\$&')` with no libraries. This doesn't escape characters that only need escaping in certain contexts (`-`, `,`, digits, etc.), since no context awareness is needed when the escaped string is used as the entire regex pattern.
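A minimal sketch of that suggestion applied to the PR's `new RegExp(substr, 'g')` line. `escapeRegExp` and `findAllIndexes` are hypothetical helper names, not existing Shiki functions; the sketch uses `String.prototype.matchAll`, which advances past empty matches on its own.

```typescript
// Escape every character that is special in a JS regex pattern, using the
// character class suggested above. Only safe when the escaped string is the
// ENTIRE pattern (no context-dependent escaping of `-`, `,`, digits, etc.).
function escapeRegExp(str: string): string {
  return str.replace(/[|\\{}()[\]^$+*?.]/g, '\\$&')
}

// Start index of every non-overlapping occurrence of `substr` in `str`.
function findAllIndexes(str: string, substr: string): number[] {
  const re = new RegExp(escapeRegExp(substr), 'g')
  return Array.from(str.matchAll(re), m => m.index!)
}
```

With escaping, `findAllIndexes('a.b.c', '.')` returns `[1, 3]`; without it, `.` would match every character, and a word like `f(` would make `new RegExp` throw a `SyntaxError`.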

But then, I don't understand why you're moving from string search to regex search in the first place.
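For comparison, a sketch of the string-search version being replaced, reconstructed from the removed lines in the diff. The starting value `i = -1` is an assumption (the hunk doesn't show it), and the empty-string guard is an addition: `indexOf('', n)` clamps `n` to the string length, so without the guard the loop never terminates on an empty search term. Plain string search needs no escaping at all.

```typescript
// String-search version reconstructed from the removed diff lines.
// ASSUMPTION: `i` starts at -1 (not shown in the hunk).
function findAllIndexesBySearch(str: string, substr: string): number[] {
  // Added guard: indexOf('', n) returns str.length for any n > str.length,
  // so an empty `substr` would otherwise loop forever.
  if (substr === '')
    return []
  const indexes: number[] = []
  let i = -1
  // Each search restarts at i + 1, so overlapping occurrences are reported.
  // eslint-disable-next-line no-cond-assign
  while ((i = str.indexOf(substr, i + 1)) !== -1)
    indexes.push(i)
  return indexes
}
```

`findAllIndexesBySearch('aaa', 'aa')` returns `[0, 1]`: because each search restarts at `i + 1`, overlapping occurrences are reported.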

`parseMetaHighlightWords` isn't escaping special characters; it seems to be matching JS regex literals? I also notice it has a couple of minor issues with that: it will inappropriately match `/foo\/` (no unescaped trailing `/`), and it will not correctly match `/[/]/` (an unescaped `/` is allowed in JS character classes unless the regex uses flag `v`).
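A sketch of a scanner that avoids both issues, assuming the meta string marks regex-like words as `/.../`. This is a hypothetical `extractSlashDelimited` helper, not Shiki's `parseMetaHighlightWords`: a backslash escapes the next character, and an unescaped `/` inside `[...]` does not close the literal. Flag `v` rules are not modeled.

```typescript
// Hypothetical scanner (not Shiki's parseMetaHighlightWords): returns the
// bodies of `/.../` segments found in a meta string.
function extractSlashDelimited(meta: string): string[] {
  const results: string[] = []
  let i = 0
  while (i < meta.length) {
    if (meta[i] !== '/') {
      i++
      continue
    }
    let j = i + 1
    let inClass = false
    let closed = false
    while (j < meta.length) {
      const ch = meta[j]
      if (ch === '\\') {
        j += 2 // a backslash escapes the next character, so `\/` never closes
        continue
      }
      if (inClass) {
        if (ch === ']')
          inClass = false // end of character class
      }
      else if (ch === '[') {
        inClass = true // `/` inside `[...]` does not close the literal
      }
      else if (ch === '/') {
        closed = true
        break
      }
      j++
    }
    if (!closed)
      break // unterminated literal such as `/foo\/`: stop scanning
    results.push(meta.slice(i + 1, j))
    i = j + 1
  }
  return results
}
```

`extractSlashDelimited('/[/]/')` returns `['[/]']`, while `extractSlashDelimited('/foo\\/')` (the meta text `/foo\/`) returns `[]`.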

slevithan · 2025-02-05T19:18:11Z

packages/transformers/src/transformers/meta-highlight-word.ts

-    indexes.push(i)
+  const re = new RegExp(substr, 'g')
+  let match = re.exec(str)
+  while (match !== null) {


This will loop forever if re matches an empty string. Is that possible here?
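An empty match is possible whenever the pattern can match zero characters, e.g. `new RegExp('', 'g')` or `a*`. On a match, `exec` sets `lastIndex` to the end of the match, which for an empty match is where it started, so the next call finds the same match again. A minimal sketch of the common guard, with `matchIndexes` as a hypothetical helper name; rejecting empty search terms up front or switching to `String.prototype.matchAll` would also avoid the hang.

```typescript
// exec() loop that always makes progress, even on empty matches.
function matchIndexes(str: string, pattern: string): number[] {
  const re = new RegExp(pattern, 'g')
  const indexes: number[] = []
  let match = re.exec(str)
  while (match !== null) {
    indexes.push(match.index)
    // An empty match leaves lastIndex unchanged; step past it manually.
    if (match[0] === '')
      re.lastIndex++
    match = re.exec(str)
  }
  return indexes
}
```

For example, `matchIndexes('aaa', 'a*')` returns `[0, 3]` (the full run, then the empty match at the end) instead of hanging.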

fix(transformerMetaWordHighlight): use regex instead of substring to find matching indices; add tests

0b9137f

antfu reviewed Jan 30, 2025

slevithan reviewed Feb 5, 2025
