-
-
Notifications
You must be signed in to change notification settings - Fork 3.6k
feat: Add Intl.Segmenter support for Textbox word splitting #10791
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Review or Edit in CodeSandboxOpen the branch in Web Editor • VS Code • Insiders |
|
Build Stats
|
|
is this duplicate of #10606? |
|
Mostly yes, although I'm not changing the fallback regex or wrapping behaviour |
|
the wrapping changes were what made me push that other pr back in the backlog because i do not want to fiddle with them. |
|
|
||
| // Fallback to regex-based split | ||
| return textstring.split(splitRegex); | ||
| }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may know, how this would work with languages that do not use spaces to divide words?
We have this naive approach in which we split by wordJoiners and then we assume a space was there.
the Intl.segmenter goes beyond that and knows how to split words that have no spaces, but then we are going to put a space back when we render text.
Did you encounter this issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you encounter this issue?
I think so, but it was an year ago and with all the code that we have on top of Fabric I can't really tell anymore whether it was fabric or us.
I can only copy over the explanation I wrote at that time in the PR:
I found an issue with text wrapping in PROD, that removes spaces when wrapping. You can notice this by typing "Hello world" in a textbox and changing the width so that it wraps the 2nd word. You'll notice that the space disappears. This was not a big deal previously, but now it's more important because [...redacted].
An additional bug in PROD is that the removed space is still there in the hidden textarea, so if you press RightArrowKey at the end of the first line, it will do nothing and you'll have to press twice in order to actually move the cursor to the next char.
To solve this I used correct word splitting with Intl.Segmenter, which also improves splitting for non-latin languages.
Sorry I'd like to be more helpful but I realise indeed this kind of stuff is tricky and better to change it until you face the problem. Feel free to close the PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may know, how this would work with languages that do not use spaces to divide words?
I've never tried actually with non-latin languages. I think emojies could probably be the most common case where Segmenter shined compared to a naive regex approach. You can for instance have several emojies without text in the between. The segmenter will correctly return each emoji as word, whereas the regex will return them as single word.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is ok every element that point to the direction that wrapping needs to be rewritten and with it text, it moves un in that direction.
The problem is here also with very normal language. I do not think for the word splitter 'apple,banana' is 3 words. is still 2. but there is no space between. And either case, we would render it bad with the space trick.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think for the word splitter 'apple,banana' is 3 words
The Segmenter will also return the isWordLike value if granularity is word to be able to distinguish punctualisation from words if needed, an additional benefit of Segmenter
|
Also hi @jiayihu |
Description
I saw that support for
Intl.Segmenterwas added in #10595 so I thought I could copy over our same implementation forwordSplit