1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -2,6 +2,7 @@

## [next]

+- feat: Add Intl.Segmenter support for Textbox word splitting [#10791](https://github.com/fabricjs/fabric.js/pull/10791)
- chore(): update major version of vitest [#10786](https://github.com/fabricjs/fabric.js/pull/10786)
- fix(): Prototype pollution risk on text char cache [#10782](https://github.com/fabricjs/fabric.js/pull/10782)
- chore(): update playwright [#10780](https://github.com/fabricjs/fabric.js/pull/10780)
3 changes: 2 additions & 1 deletion src/shapes/Textbox.ts
@@ -8,6 +8,7 @@ import type { SerializedITextProps, ITextProps } from './IText/IText';
import type { ITextEvents } from './IText/ITextBehavior';
import type { TextLinesInfo } from './Text/Text';
import type { Control } from '../controls/Control';
+import { wordSplit } from '../util/lang_string';

// @TODO: Many things here are configuration related and shouldn't be on the class nor prototype
// regexes, list of properties that are not suppose to change by instances, magic consts.
@@ -408,7 +409,7 @@ export class Textbox<
* @returns {string[]} array of words
*/
wordSplit(value: string): string[] {
-return value.split(this._wordJoiners);
+return wordSplit(value, this._wordJoiners);
}

/**
49 changes: 41 additions & 8 deletions src/util/lang_string.ts
@@ -26,18 +26,31 @@ export const escapeXml = (string: string): string =>
.replace(/</g, '&lt;')
.replace(/>/g, '&gt;');

-let segmenter: Intl.Segmenter | false;
+let graphemeSegmenter: Intl.Segmenter | false;
+let wordSegmenter: Intl.Segmenter | false;

-const getSegmenter = () => {
-if (!segmenter) {
-segmenter =
+const getGraphemeSegmenter = () => {
+if (!graphemeSegmenter) {
+graphemeSegmenter =
'Intl' in getFabricWindow() &&
'Segmenter' in Intl &&
new Intl.Segmenter(undefined, {
granularity: 'grapheme',
});
}
-return segmenter;
+return graphemeSegmenter;
};

+const getWordSegmenter = () => {
+if (!wordSegmenter) {
+wordSegmenter =
+'Intl' in getFabricWindow() &&
+'Segmenter' in Intl &&
+new Intl.Segmenter(undefined, {
+granularity: 'word',
+});
+}
+return wordSegmenter;
+};

/**
@@ -46,16 +59,36 @@ const getSegmenter = () => {
* @return {Array} array containing the graphemes
*/
export const graphemeSplit = (textstring: string): string[] => {
-segmenter || getSegmenter();
-if (segmenter) {
-const segments = segmenter.segment(textstring);
+graphemeSegmenter || getGraphemeSegmenter();
+if (graphemeSegmenter) {
+const segments = graphemeSegmenter.segment(textstring);
return Array.from(segments).map(({ segment }) => segment);
}

//Fallback
return graphemeSplitImpl(textstring);
};

+/**
+* Divide a string into words
+* @param {String} textstring String to split into words
+* @param {RegExp} splitRegex Regex used for fallback splitting when Intl.Segmenter is unavailable (e.g. /[ \t\r]/)
+* @return {Array} array containing the words
+*/
+export const wordSplit = (
+textstring: string,
+splitRegex: RegExp
+): string[] => {
+wordSegmenter || getWordSegmenter();
+if (wordSegmenter) {
+const segments = wordSegmenter.segment(textstring);
+return Array.from(segments).map(({ segment }) => segment);
+}
+
+// Fallback to regex-based split
+return textstring.split(splitRegex);
+};
Member

Do you know how this would work with languages that do not use spaces to divide words?
We have a naive approach in which we split by wordJoiners and then assume a space was there.
Intl.Segmenter goes beyond that and knows how to split words that have no spaces between them, but we would still put a space back when we render the text.

Did you encounter this issue?
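
The mismatch described above can be sketched roughly as follows. This is only an illustration, not Fabric's actual wrapping code: `naiveRoundTrip` and `segmenterRoundTrip` are hypothetical helpers, the regex is the fallback pattern quoted in this PR's doc comment, and the sketch assumes a runtime that provides `Intl.Segmenter`.

```typescript
// Fallback pattern quoted in the PR's doc comment; assumed here to stand in for _wordJoiners.
const wordJoiners = /[ \t\r]/;

// Hypothetical helper: split on joiners, then put one space back between words
// (the "space trick" described in the comment above).
const naiveRoundTrip = (text: string): string =>
  text.split(wordJoiners).join(' ');

// Hypothetical helper: split with Intl.Segmenter, then apply the same space trick.
const segmenterRoundTrip = (text: string): string => {
  const segmenter = new Intl.Segmenter(undefined, { granularity: 'word' });
  const segments = Array.from(segmenter.segment(text)).map(
    ({ segment }) => segment,
  );
  return segments.join(' ');
};

// A sentence written without spaces. How many segments the segmenter returns
// depends on the runtime's ICU data, so no count is assumed here.
const japanese = '今日は天気がいいですね';

// The regex finds no joiner, so the whole sentence is one "word" and survives unchanged.
console.log(naiveRoundTrip(japanese) === japanese); // true

// Every boundary the segmenter finds becomes a space that was never in the input.
console.log(segmenterRoundTrip(japanese));
```

With the regex, the sentence cannot be wrapped at all; with the segmenter, it can be wrapped, but the renderer would have to stop assuming that a space sits between consecutive words.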

Contributor Author

@jiayihu jiayihu Oct 30, 2025

> Did you encounter this issue?

I think so, but it was a year ago, and with all the code that we have on top of Fabric I can't really tell anymore whether it was Fabric or us.

I can only copy over the explanation I wrote at that time in the PR:

I found an issue with text wrapping in PROD, that removes spaces when wrapping. You can notice this by typing "Hello world" in a textbox and changing the width so that it wraps the 2nd word. You'll notice that the space disappears. This was not a big deal previously, but now it's more important because [...redacted].

An additional bug in PROD is that the removed space is still there in the hidden textarea, so if you press RightArrowKey at the end of the first line, it will do nothing and you'll have to press twice in order to actually move the cursor to the next char.

To solve this I used correct word splitting with Intl.Segmenter, which also improves splitting for non-Latin languages.

Sorry, I'd like to be more helpful, but I realise this kind of thing is tricky and it is better not to change it until you actually face the problem. Feel free to close the PR.
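
For reference, the difference between the two splitting strategies on the "Hello world" case can be seen in isolation. This is a minimal sketch outside of Fabric: the variable names are made up for illustration, the regex is the fallback pattern from this PR's doc comment, and the `'en'` locale is an assumption (the PR passes `undefined`).

```typescript
const helloText = 'Hello world';

// Regex split (the fallback path in this PR): the separator is consumed.
const regexWords = helloText.split(/[ \t\r]/);

// Intl.Segmenter with word granularity: the space comes back as its own segment.
const helloSegmenter = new Intl.Segmenter('en', { granularity: 'word' });
const helloSegments = Array.from(helloSegmenter.segment(helloText)).map(
  ({ segment }) => segment,
);

console.log(regexWords); // [ 'Hello', 'world' ]
console.log(helloSegments); // [ 'Hello', ' ', 'world' ]

// Concatenating the segments reproduces the input exactly, space included.
console.log(helloSegments.join('') === helloText); // true
```

Because the segmenter keeps the whitespace, the two code paths in `wordSplit` return arrays of different shapes for the same input, which any caller that re-inserts spaces needs to account for.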

Contributor Author

> Do you know how this would work with languages that do not use spaces to divide words?

I've never actually tried with non-Latin languages. I think emojis are probably the most common case where Segmenter shines compared to a naive regex approach. You can, for instance, have several emojis with no text in between. The segmenter will correctly return each emoji as a word, whereas the regex will return them all as a single word.
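
A small sketch of the emoji case, assuming a runtime with `Intl.Segmenter`. The variable names and the `'en'` locale are illustrative choices, and the regex is the fallback pattern from this PR's doc comment.

```typescript
const emojiRun = '😀😀😀';

// Regex split: there is no joiner in the string, so it stays one "word".
const emojiByRegex = emojiRun.split(/[ \t\r]/);

// Word-granularity segmentation: each emoji becomes its own segment.
const emojiSegmenter = new Intl.Segmenter('en', { granularity: 'word' });
const emojiBySegmenter = Array.from(emojiSegmenter.segment(emojiRun)).map(
  ({ segment }) => segment,
);

console.log(emojiByRegex.length); // 1
console.log(emojiBySegmenter.length); // 3
```

With the segmenter, a long run of emojis can wrap between any two of them, whereas the regex path treats the whole run as one unbreakable word.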

Member

That is OK. Every piece of evidence that wrapping needs to be rewritten, and text along with it, moves us in that direction.

The problem also exists with very ordinary language. I do not think 'apple,banana' is 3 words for the word splitter; it is still 2, but there is no space between them. In either case, we would render it badly with the space trick.

Contributor Author

> I do not think for the word splitter 'apple,banana' is 3 words

With word granularity, the Segmenter also returns an isWordLike flag on each segment, so punctuation can be distinguished from words if needed. That is an additional benefit of Segmenter.
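
As a rough sketch of how `isWordLike` could be used on the 'apple,banana' example (this is not part of the PR; the variable names and the `'en'` locale are assumptions made for illustration):

```typescript
const punctuated = 'apple,banana';

const punctSegmenter = new Intl.Segmenter('en', { granularity: 'word' });
const punctSegments = Array.from(punctSegmenter.segment(punctuated));

// Every segment, punctuation included.
const allSegments = punctSegments.map(({ segment }) => segment);

// Only the segments that the segmenter flags as word-like.
const wordsOnly = punctSegments
  .filter(({ isWordLike }) => isWordLike)
  .map(({ segment }) => segment);

console.log(allSegments); // [ 'apple', ',', 'banana' ]
console.log(wordsOnly); // [ 'apple', 'banana' ]
```

The segmenter reports three segments but only two words, so a caller can tell the comma apart from the words around it instead of assuming that a space separates every pair of segments.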


const graphemeSplitImpl = (textstring: string): string[] => {
const graphemes: string[] = [];
for (let i = 0, chr; i < textstring.length; i++) {