@ahfuzhang ahfuzhang commented Oct 28, 2025

Describe Your Changes

  1. Scan the string only once.
  2. Avoid a separate isAscii() pass; whether the string contains Unicode is detected during the same scan.
  3. Use a table lookup instead of a complex Boolean expression.

@valyala
Sorry, I don't take part in open source projects often, so I may not be following the proper procedures and etiquette. I'd appreciate guidance if I do something inappropriate.

Could you please take a moment to review my PR? You are my idol.

Comment on lines 199 to 211
// Search for the end of the token.
end := len(s)
for i < len(s) {
	c := *(*byte)(unsafe.Add(ptr, uintptr(i)))
	found := lookupTables[curUnicodeFlag][c]
	if found != 0 {
		i++
		continue
	}
	end = i
	i++
	break
}
If the token starts with ASCII bytes and contains Unicode bytes later, this loop may end the token early, because curUnicodeFlag is not updated inside the loop. It should probably be recalculated per byte here.
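
The concern can be reproduced with a small sketch. This is not the PR's code: the table layout is an assumption (index 0 accepts ASCII token bytes only, index 1 additionally accepts every byte >= 0x80), and `tokenEnd` with its `perByte` switch is a hypothetical helper that contrasts a flag fixed at the token start with a flag updated for each byte.

```go
package main

import "fmt"

// lookupTables[0] accepts ASCII token bytes only.
// lookupTables[1] also accepts all bytes >= 0x80 (a simplification:
// real code might decode runes and check their class instead).
var lookupTables = func() (t [2][256]uint8) {
	for c := 0; c < 256; c++ {
		b := byte(c)
		if b >= 'a' && b <= 'z' || b >= 'A' && b <= 'Z' || b >= '0' && b <= '9' || b == '_' {
			t[0][c], t[1][c] = 1, 1
		}
		if b >= 0x80 {
			t[1][c] = 1
		}
	}
	return t
}()

// tokenEnd returns the index just past the token that starts at s[start].
// With perByte == false the Unicode flag is taken from the first byte only;
// with perByte == true it is updated for every byte, as the review suggests.
func tokenEnd(s string, start int, perByte bool) int {
	if start >= len(s) {
		return len(s)
	}
	flag := s[start] >> 7 // 1 if the first byte is non-ASCII, else 0
	for i := start; i < len(s); i++ {
		c := s[i]
		if perByte {
			flag |= c >> 7 // switch to the Unicode table once a non-ASCII byte appears
		}
		if lookupTables[flag][c] == 0 {
			return i
		}
	}
	return len(s)
}

func main() {
	s := "abcé def"
	fmt.Printf("%q\n", s[:tokenEnd(s, 0, false)]) // prints "abc": token cut before é
	fmt.Printf("%q\n", s[:tokenEnd(s, 0, true)])  // prints "abcé"
}
```

Updating the flag with a shift and an OR keeps the loop branch-free on that step, so the per-byte recalculation does not have to give up the table-lookup speedup.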

Fixed.
The updated code is 36.5% faster than the old version.

@ahfuzhang

@func25 could you review this again?
