tokenizeString performance improved by 57% #781

ahfuzhang · 2025-10-28T05:40:18Z

Describe Your Changes

Scan the string only once.
Detect Unicode (non-ASCII) bytes during that scan instead of calling isAscii() as a separate pass.
Use a table lookup instead of a complex Boolean expression.
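
As a rough illustration of the table-lookup idea, here is a minimal sketch. It is not the code from this PR: `tokenCharTable` and `splitASCIITokens` are hypothetical names, and the character class (ASCII letters, digits, underscore) is an assumption made for the example. It only handles ASCII input.

```go
package main

import "fmt"

// tokenCharTable[c] is 1 when byte c belongs to a token.
// Assumption: token bytes are ASCII letters, digits, and '_';
// the table is built once at package initialization.
var tokenCharTable = func() [256]uint8 {
	var t [256]uint8
	for c := 0; c < 256; c++ {
		b := byte(c)
		if b >= 'a' && b <= 'z' || b >= 'A' && b <= 'Z' || b >= '0' && b <= '9' || b == '_' {
			t[c] = 1
		}
	}
	return t
}()

// splitASCIITokens is a hypothetical helper that walks s once and
// classifies every byte with a single table lookup instead of
// re-evaluating the Boolean range checks for each byte.
func splitASCIITokens(s string) []string {
	var tokens []string
	i := 0
	for i < len(s) {
		// Skip bytes that cannot be part of a token.
		for i < len(s) && tokenCharTable[s[i]] == 0 {
			i++
		}
		start := i
		// Consume bytes until the token ends.
		for i < len(s) && tokenCharTable[s[i]] != 0 {
			i++
		}
		if i > start {
			tokens = append(tokens, s[start:i])
		}
	}
	return tokens
}

func main() {
	fmt.Println(splitASCIITokens("foo bar-baz_1")) // prints [foo bar baz_1]
}
```

The range checks run only while the table is built, so the hot loop does one indexed load and one comparison per byte.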

Checklist

The following checks are mandatory:

My change adheres to VictoriaMetrics contributing guidelines.
My change adheres to VictoriaMetrics development goals.

@valyala
I'm sorry, I don't contribute to open source projects often, so I may not be following the proper procedures and etiquette. I'd appreciate some guidance if I'm doing anything inappropriately.

Could you please take a moment to review my PR? My idol

func25 · 2025-10-29T14:08:04Z

lib/logstorage/hash_tokenizer.go

+		// Search for the end of the token.
+		end := len(s)
+		for i < len(s) {
+			c := *(*byte)(unsafe.Add(ptr, uintptr(i)))
+			found := lookupTables[curUnicodeFlag][c]
+			if found != 0 {
+				i++
+				continue
+			}
+			end = i
+			i++
+			break
+		}


If a token starts with ASCII bytes and later contains Unicode bytes, this loop may end the token early. curUnicodeFlag should probably be recalculated per byte here.

ahfuzhang · 2025-10-30

Fixed. The updated version is 36.5% faster than the old version.

ahfuzhang · 2025-11-03T04:34:44Z

@func25 could you review this again?

improve performance for token

b5189d5

func25 reviewed Oct 29, 2025

ahfuzhang added 2 commits October 30, 2025 08:56

bug fix

3ab5509

bug fix: Unicode strings may begin with a non-Unicode character.

cc9648b
