
Panic: WTF-8 character boundary error when source contains multibyte (CJK) string literals #13105

@Taoister39


Bug Report

System Info

  System:
    OS: macOS 26.2
    CPU: (10) arm64 Apple M4
    Memory: 462.08 MB / 32.00 GB
    Shell: 5.9 - /bin/zsh
  Binaries:
    Node: 22.17.1 - /Users/xxx/.nvm/versions/node/v22.17.1/bin/node
    Yarn: 1.22.22 - /Users/xxx/.nvm/versions/node/v22.17.1/bin/yarn
    npm: 10.9.2 - /Users/xxx/.nvm/versions/node/v22.17.1/bin/npm
    pnpm: 10.13.1 - /Users/xxx/.nvm/versions/node/v22.17.1/bin/pnpm
    bun: 1.2.19 - /opt/homebrew/bin/bun
  Browsers:
    Chrome: 144.0.7559.133
    Safari: 26.2

Description

Rspack aborts with SIGABRT when building a project whose source files contain string literals with multibyte (CJK/Chinese) characters. The panic originates in hstr's WTF-8 implementation, where a byte offset that does not fall on a character boundary is used to slice a string.

Panic output

Panic occurred at runtime. Please file an issue on GitHub with the backtrace below:
https://github.com/web-infra-dev/rspack/issues

panicked at index.crates.io-1949cf8c6b5b557f/hstr-3.0.3/src/wtf8/not_quite_std.rs:173:5:
index 0 and/or 14 in "## 环境信息\n+ 版本号:\n+ 账号:\n..." do not lie on character boundary

The string "## 环境信息" has the following byte layout (UTF-8):

Char:    #   #   (space)   环    境    信     息      (CJK: 3 bytes each)
Bytes:   0   1   2         3-5   6-8   9-11   12-14

Byte offset 14 is the last byte of 息 (a 3-byte character), not a character boundary, so Rust panics when slicing at it.
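To make the boundary arithmetic concrete, here is a small TypeScript sketch that lists the UTF-8 character boundaries of a string and checks whether a byte offset lands on one. It is not code from Rspack or hstr; the helpers `utf8Len`, `utf8Boundaries`, and `isCharBoundary` are hypothetical and exist only for illustration.

```typescript
// Sketch only: these helpers are hypothetical, not Rspack or hstr APIs.

// Number of bytes one code point occupies in UTF-8. Lone surrogates
// (0xD800-0xDFFF) also come out as 3 bytes, matching how WTF-8 encodes them.
function utf8Len(codePoint: number): number {
  if (codePoint < 0x80) return 1;
  if (codePoint < 0x800) return 2;
  if (codePoint < 0x10000) return 3;
  return 4;
}

// Byte offsets where each character starts, plus the total byte length.
function utf8Boundaries(s: string): number[] {
  const boundaries: number[] = [0];
  let offset = 0;
  let i = 0;
  while (i < s.length) {
    const cp = s.codePointAt(i)!;
    offset += utf8Len(cp);
    boundaries.push(offset);
    i += cp > 0xffff ? 2 : 1; // surrogate pairs span two UTF-16 code units
  }
  return boundaries;
}

// Same idea as Rust's str::is_char_boundary: is byteOffset a character start
// (or the end of the string)?
function isCharBoundary(s: string, byteOffset: number): boolean {
  return utf8Boundaries(s).indexOf(byteOffset) !== -1;
}

const header = "## 环境信息";
console.log(utf8Boundaries(header).join(" ")); // 0 1 2 3 6 9 12 15
console.log(isCharBoundary(header, 14)); // false: byte 14 is inside 息 (12-14)
console.log(isCharBoundary(header, 12)); // true: 息 starts at byte 12
```

For this string, Rust's `str::is_char_boundary` gives the same answers, which is why a slice ending at byte 14 panics.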

Reproduction

Any TypeScript/JavaScript source file containing a CJK string literal, e.g.:

// src/index.ts
const template = `## 环境信息
+ 版本号:
+ 账号:`;

Build it with Rspack (directly or via Rslib/Rsbuild). The build crashes with SIGABRT.

Environment

@rspack/core: ~1.7.5
hstr: 3.0.3
OS: macOS (Apple Silicon)

Root cause hypothesis

hstr 3.0.3 introduced or changed WTF-8 string interning. Somewhere in the string-processing pipeline (likely during module parsing or stats generation), a byte offset derived from a JS code-unit (UTF-16) index or a regex match appears to be used directly to slice a WTF-8 JsWord / Atom. For ASCII-only strings this happens to work, because every character is one UTF-16 code unit and one UTF-8 byte. For multibyte characters the two counts diverge, so the offset may not fall on a UTF-8 character boundary, and the slice panics.
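The hypothesis hinges on the gap between JavaScript positions and Rust byte offsets: `indexOf`, a `RegExp`'s `lastIndex`, and `.length` count UTF-16 code units, while a Rust `str` (or a WTF-8 buffer) is sliced by byte. The sketch below uses a hypothetical helper, `utf16IndexToUtf8Offset`, which is not part of Rspack, SWC, or hstr, to convert one into the other and show where the two agree and where they diverge.

```typescript
// Sketch only: utf16IndexToUtf8Offset is a hypothetical helper, not an
// Rspack/SWC/hstr API. It maps a JS string position (UTF-16 code units)
// to the UTF-8 byte offset of the same position.
function utf16IndexToUtf8Offset(s: string, utf16Index: number): number {
  if (utf16Index < 0 || utf16Index > s.length) {
    throw new RangeError(`index ${utf16Index} out of range`);
  }
  let bytes = 0;
  let i = 0;
  while (i < utf16Index) {
    const cp = s.codePointAt(i)!;
    // UTF-8 width of this code point (lone surrogates: 3 bytes, as in WTF-8).
    bytes += cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
    // Astral code points occupy two UTF-16 code units (a surrogate pair).
    // If utf16Index splits such a pair, the whole pair is counted.
    i += cp > 0xffff ? 2 : 1;
  }
  return bytes;
}

const ascii = "## env info";
const cjk = "## 环境信息";

// ASCII only: code-unit count and byte length coincide.
console.log(ascii.length, utf16IndexToUtf8Offset(ascii, ascii.length)); // 11 11
// CJK: 7 code units but 15 bytes.
console.log(cjk.length, utf16IndexToUtf8Offset(cjk, cjk.length)); // 7 15
```

On the CJK string, reusing the code-unit count 7 directly as a byte offset would land inside 境 (bytes 6-8), which is exactly the kind of non-boundary offset that makes a Rust slice panic. Whether this is where Rspack actually goes wrong remains a hypothesis.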

Workaround

Moving the problematic string constant out of the .ts source into a separate JSON/text resource file that is loaded at runtime avoids the panic, since Rspack does not parse the string content of non-JS assets the same way it parses JS/TS string literals.

Additional context

Reported via an Rslib issue reproduction. The panic is deterministic and occurs on every build attempt.
