Skip to content

Conversation

@jiayihu
Copy link
Contributor

@jiayihu jiayihu commented Oct 29, 2025

Description

I saw that support for Intl.Segmenter was added in #10595 so I thought I could copy over our same implementation for wordSplit

@codesandbox
Copy link

codesandbox bot commented Oct 29, 2025

Review or Edit in CodeSandbox

Open the branch in Web EditorVS CodeInsiders

Open Preview

@github-actions
Copy link
Contributor

Build Stats

file / KB (diff) bundled minified
fabric 917.715 (+0.935) 301.930 (+0.219)

@Smrtnyk
Copy link
Contributor

Smrtnyk commented Oct 29, 2025

is this duplicate of #10606?

@jiayihu
Copy link
Contributor Author

jiayihu commented Oct 29, 2025

Mostly yes, although I'm not changing the fallback regex or wrapping behaviour

@asturur
Copy link
Member

asturur commented Oct 30, 2025

the wrapping changes were what made me push that other pr back in the backlog because i do not want to fiddle with them.


// Fallback to regex-based split
return textstring.split(splitRegex);
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may know, how this would work with languages that do not use spaces to divide words?
We have this naive approach in which we split by wordJoiners and then we assume a space was there.
the Intl.segmenter goes beyond that and knows how to split words that have no spaces, but then we are going to put a space back when we render text.

Did you encounter this issue?

Copy link
Contributor Author

@jiayihu jiayihu Oct 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you encounter this issue?

I think so, but it was an year ago and with all the code that we have on top of Fabric I can't really tell anymore whether it was fabric or us.

I can only copy over the explanation I wrote at that time in the PR:

I found an issue with text wrapping in PROD, that removes spaces when wrapping. You can notice this by typing "Hello world" in a textbox and changing the width so that it wraps the 2nd word. You'll notice that the space disappears. This was not a big deal previously, but now it's more important because [...redacted].

An additional bug in PROD is that the removed space is still there in the hidden textarea, so if you press RightArrowKey at the end of the first line, it will do nothing and you'll have to press twice in order to actually move the cursor to the next char.

To solve this I used correct word splitting with Intl.Segmenter, which also improves splitting for non-latin languages.

Sorry I'd like to be more helpful but I realise indeed this kind of stuff is tricky and better to change it until you face the problem. Feel free to close the PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may know, how this would work with languages that do not use spaces to divide words?

I've never tried actually with non-latin languages. I think emojies could probably be the most common case where Segmenter shined compared to a naive regex approach. You can for instance have several emojies without text in the between. The segmenter will correctly return each emoji as word, whereas the regex will return them as single word.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is ok every element that point to the direction that wrapping needs to be rewritten and with it text, it moves un in that direction.

The problem is here also with very normal language. I do not think for the word splitter 'apple,banana' is 3 words. is still 2. but there is no space between. And either case, we would render it bad with the space trick.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think for the word splitter 'apple,banana' is 3 words

The Segmenter will also return the isWordLike value if granularity is word to be able to distinguish punctualisation from words if needed, an additional benefit of Segmenter

@asturur
Copy link
Member

asturur commented Oct 30, 2025

Also hi @jiayihu

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants