More Unicode 17.0 Confusables #1168

Open
wants to merge 21 commits into base: main

josh-hadley
Collaborator

This PR rolls up several individual confusables items:

Each item has 2 commits: one for the confusables-source.txt and a second one containing the generated data from the updated source.

@roozbehp please note that one entry is commented out because it appeared to be a circular reference, which caused generation to fail. Please review.
@roozbehp in the PAG ticket, there was an item marked as "remove" (from a generated file) that seems to result from a difficult tangle of other confusables already listed. I could not easily work out which entries were responsible, so I left it in place. Please review and point me to the line in confusables-source.txt that causes this, and I will remove it and regenerate.
@roozbehp
Contributor

Thanks so much!

Something in this PR is causing U+004F O and U+006F o to become confusable with each other, which is not intended or desired. I will investigate, but just a heads-up that it is not OK to merge this as-is.

@macchiati
Member

The way the code works, if you introduce an intermediary that matches both of those (e.g., something halfway between them), the transitive closure will put all three in the same confusable set.
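
A minimal sketch of that effect, assuming confusable sets are formed as the transitive closure of pairwise mappings (modeled here with a small union-find). The `build_classes` and `same_class` helpers and the private-use placeholder U+E000 are hypothetical illustrations, not the unicodetools implementation or real confusables data.

```python
from collections import defaultdict


def build_classes(pairs):
    """Group code points into sets: the transitive closure of the given pairs."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra  # merge the two sets

    classes = defaultdict(set)
    for cp in list(parent):
        classes[find(cp)].add(cp)
    return [frozenset(c) for c in classes.values()]


def same_class(pairs, a, b):
    """True if a and b end up in the same set after closure."""
    return any(a in c and b in c for c in build_classes(pairs))


# Hypothetical intermediary (private-use U+E000) mapped to both O and o.
INTERMEDIARY = 0xE000
pairs = [(INTERMEDIARY, 0x004F), (INTERMEDIARY, 0x006F)]

print(same_class(pairs, 0x004F, 0x006F))      # both mappings present
print(same_class(pairs[:1], 0x004F, 0x006F))  # only the mapping to O
```

With both mappings present, U+004F and U+006F land in one set even though no line maps one directly to the other; with only one mapping they stay apart.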

@roozbehp
Contributor

> Something in this PR is causing U+004F O and U+006F o to become confusable with each other, which is not intended or desired. I will investigate, but just a heads-up that it is not OK to merge this as-is.

I think the source of the problem is adding these three lines without removing their previous confusables:

09E6 ; 006F
0B66 ; 006F
0CE6 ; 004F

@josh-hadley, please comment these three lines out for now and regenerate. We'll figure out how to add them properly later.
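
One way to disable such entries without deleting them is to prefix them with `#`. The sketch below assumes that `#` starts a comment in the source file and that each mapping is written as `SOURCE ; TARGET` in hex; the `comment_out` helper and the `E000 ; E001` line are hypothetical placeholders, not part of unicodetools or the real data.

```python
def comment_out(text, mappings):
    """Prefix with '# ' each line whose 'SRC ; TGT' fields match a given mapping."""
    targets = {(s.upper(), t.upper()) for s, t in mappings}
    out = []
    for line in text.splitlines():
        data = line.split("#", 1)[0]  # ignore any trailing comment
        fields = [f.strip().upper() for f in data.split(";")]
        if len(fields) >= 2 and (fields[0], fields[1]) in targets:
            line = "# " + line
        out.append(line)
    return "\n".join(out)


# The three lines from this thread, plus a hypothetical unrelated entry.
source = "09E6 ; 006F\n0B66 ; 006F\n0CE6 ; 004F\nE000 ; E001"
blocked = [("09E6", "006F"), ("0B66", "006F"), ("0CE6", "004F")]

result = comment_out(source, blocked)
print(result)
```

Lines that are already commented out are left unchanged, so running the helper twice does not stack `#` prefixes.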

@josh-hadley
Collaborator Author

> comment these three lines out for now and regenerate

@roozbehp looks like that did the trick (for eliminating the test failure at least). Please continue reviewing with the updated data.

roozbehp
roozbehp previously approved these changes Jul 21, 2025
Contributor

LGTM. Will try to figure out how to properly fix the commented-out lines in the next round.

@roozbehp roozbehp self-requested a review July 22, 2025 06:09
@roozbehp
Contributor

I got a new laptop today, so I set up unicodetools and figured out what else needs to change in order to uncomment the new additions. Pushed my changes to the same branch, so they can all land together.

@josh-hadley
Collaborator Author

@markusicu do you want to give this a quick 👀 before merge (squash-and-merge)?
