fix(transformerMetaWordHighlight): use regex instead of substring to find matching indices #909
…find matching indices; add tests
// eslint-disable-next-line no-cond-assign
while ((i = str.indexOf(substr, i + 1)) !== -1)
  indexes.push(i)
const re = new RegExp(substr, 'g')
I guess we need to properly escape the `substr` if we want to use that approach.
Not sure if this is already handled by the function `parseMetaHighlightWords`. If you could give some examples, that would be awesome.
And, do we want to use this approach?
The safest way to escape regex special characters is to use Regex+ and interpolate the string to escape. Nearly as safe but with noisier output would be to use native `RegExp.escape` (an ES proposal), which isn't supported by browsers yet. The most popular way to escape regex special characters (based on npm download stats) is to use the unsafe but lightweight escape-string-regexp. Context safety isn't relevant when the entire regex pattern is an escaped string, though, so no worries.
If it was me, I might just add an inline `.replace(/[|\\{}()[\]^$+*?.]/g, '\\$&')` with no libraries. This doesn't escape chars that might need to be escaped based on context (`-`, `,`, digits, etc.), since no context awareness is needed when the escaped string is being used as the entire regex pattern.
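A minimal sketch of the inline escaping suggested above, applied before the string is handed to `RegExp`. The helper names `escapeRegExp` and `findAllIndexes` are hypothetical and are not taken from the PR or from Shiki; only the replacement pattern comes from the comment.

```javascript
// Escape the characters that are special in a regex pattern, using the
// inline replacement quoted in the review comment.
function escapeRegExp(text) {
  return text.replace(/[|\\{}()[\]^$+*?.]/g, '\\$&')
}

// Hypothetical helper: start index of every occurrence of `substr` in `str`,
// treating `substr` as literal text rather than as a pattern.
function findAllIndexes(str, substr) {
  const re = new RegExp(escapeRegExp(substr), 'g')
  const indexes = []
  for (const match of str.matchAll(re))
    indexes.push(match.index)
  return indexes
}

// Without escaping, the `.` in 'a.b' would also match the `+` in 'a+b'.
console.log(findAllIndexes('a.b a+b a.b', 'a.b')) // [ 0, 8 ]
```

Because the escaped string is the whole pattern, context-dependent characters such as `-` or digits are deliberately left alone here.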
But then, I don't understand why you're moving from string search to regex search in the first place.
`parseMetaHighlightWords` isn't escaping special characters; it seems to be matching JS regex literals? I also notice it has a couple of minor issues with that: it will inappropriately match `/foo\/` (no unescaped trailing `/`) and it will not correctly match `/[/]/` (unescaped `/` is allowed in JS character classes, unless the regex uses flag `v`).
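To make the two edge cases concrete, here is a sketch of a stricter pattern for a flagless JS regex literal. It is an illustration only, not the pattern used by `parseMetaHighlightWords`; the names `regexLiteral` and `parseRegexLiteral` are hypothetical, and flags (including `v`) are not handled.

```javascript
// Hypothetical pattern: the literal body is a sequence of
//   \\.                    an escaped character, so `\/` never closes the literal
//   \[(?:\\.|[^\]\\])*\]   a character class, which may hold an unescaped `/`
//   [^\/\\[]               any other character except `/`, `\`, and `[`
const regexLiteral = /^\/((?:\\.|\[(?:\\.|[^\]\\])*\]|[^\/\\[])+)\/$/

// Returns the body of the literal, or null if the text is not a regex literal.
function parseRegexLiteral(text) {
  const match = regexLiteral.exec(text)
  return match ? match[1] : null
}

console.log(parseRegexLiteral('/foo\\/')) // null: the trailing `/` is escaped
console.log(parseRegexLiteral('/[/]/'))   // '[/]'
console.log(parseRegexLiteral('/foo/'))   // 'foo'
```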
indexes.push(i)
const re = new RegExp(substr, 'g')
let match = re.exec(str)
while (match !== null) {
This will loop forever if `re` matches an empty string. Is that possible here?
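For reference, a sketch of how an `exec` loop can be guarded against zero-length matches. The function name `matchIndexes` is hypothetical and this is not the code from the PR; it only illustrates the concern raised above.

```javascript
// Collect match start indexes with a global regex, guarding against
// empty matches. Assumes `re` has the `g` flag; without it, exec() always
// restarts at index 0 and the loop would never terminate.
function matchIndexes(str, re) {
  if (!re.global)
    throw new TypeError('matchIndexes requires a regex with the g flag')
  const indexes = []
  let match = re.exec(str)
  while (match !== null) {
    indexes.push(match.index)
    // A zero-length match leaves lastIndex where it was, so the next exec()
    // would return the same match again; step forward manually.
    if (match[0].length === 0)
      re.lastIndex++
    match = re.exec(str)
  }
  return indexes
}

console.log(matchIndexes('abc', new RegExp('', 'g'))) // [ 0, 1, 2, 3 ]
console.log(matchIndexes('aXbX', /X/g))               // [ 1, 3 ]
```

`String.prototype.matchAll` performs the same advance internally, so iterating over `str.matchAll(re)` is another way to avoid the problem.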
Description
Partially fixes #908 and adds tests.
Additional context
Users still cannot freely choose which instance the highlight should be on (regex will choose whatever comes up first).
It would be nice to allow users to manually specify which characters (by index) the highlight should cover. For example, `js {1[6:12],3-4}` to highlight characters 6 through 12 of the first line, and lines 3 and 4. But that seems like a lot more work. At least this PR will eliminate the error.
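A rough sketch of how the proposed syntax could be parsed, purely as an illustration of the idea. Nothing here exists in Shiki: the function name `parseHighlightMeta` and the returned shape are hypothetical, and the sketch does not decide whether the character range is zero-based or inclusive.

```javascript
// Hypothetical parser for the proposed meta syntax, e.g. 'js {1[6:12],3-4}'.
// Returns one entry per line; `chars` is null when the whole line is meant.
function parseHighlightMeta(meta) {
  const body = /\{([^}]*)\}/.exec(meta)
  if (!body)
    return []
  const result = []
  for (const part of body[1].split(',')) {
    // line, optional "-endLine", optional "[from:to]"
    const m = /^\s*(\d+)(?:-(\d+))?(?:\[(\d+):(\d+)\])?\s*$/.exec(part)
    if (!m)
      continue // malformed entries are ignored in this sketch
    const start = Number(m[1])
    const end = m[2] ? Number(m[2]) : start
    const chars = m[3] ? { from: Number(m[3]), to: Number(m[4]) } : null
    for (let line = start; line <= end; line++)
      result.push({ line, chars })
  }
  return result
}

console.log(parseHighlightMeta('js {1[6:12],3-4}'))
```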