Unicode 17 and some more improvements #10382
base: master
This is an automated commit created by the Maintenance project (https://github.com/eksperimental/maintenance). Before merging, please read the release notes by visiting <http://www.unicode.org/versions/Unicode17.0.0/> and assess if additional changes are necessary in the code base.
…code-17 * upstream/pr/10181: Update Unicode to version 17.0.0
The specs have been updated and are more descriptive, and the tests have been updated. Update the code generator accordingly.
Export category/1, which is needed for some use cases; it was available
through lookup, but that created an unnecessary map.
Also add some needed helpers:
is_other_id_start/1,
is_other_id_continue/1,
is_letter_not_pattern_syntax/1.
These help define ID_Start and ID_Continue
as described in the Unicode spec.
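As a rough illustration of how such helpers can compose into ID_Start and ID_Continue, here is a minimal sketch. It is not the PR's code: the module name, the two-argument functions, and the convention of passing the General_Category abbreviation as an atom are all assumptions made for the example. The Other_ID_Start list is hard-coded from memory of PropList.txt and should be checked against the 17.0 data, and the subtraction of Pattern_Syntax and Pattern_White_Space as well as Other_ID_Continue are left out for brevity.

```erlang
-module(id_start_sketch).
-export([is_id_start/2, is_id_continue/2]).

%% Sketch only: the caller passes the code point together with its
%% General_Category abbreviation as an atom (e.g. 'Lu'). A real
%% implementation would look the category up itself.

%% Other_ID_Start code points, hard-coded for illustration
%% (assumption: verify against PropList.txt for Unicode 17.0).
is_other_id_start(CP) ->
    lists:member(CP, [16#1885, 16#1886, 16#2118, 16#212E, 16#309B, 16#309C]).

%% ID_Start: letters and letter numbers, plus Other_ID_Start.
%% Removing Pattern_Syntax and Pattern_White_Space is omitted here.
is_id_start(CP, Cat) ->
    lists:member(Cat, ['Lu', 'Ll', 'Lt', 'Lm', 'Lo', 'Nl'])
        orelse is_other_id_start(CP).

%% ID_Continue: ID_Start plus marks, decimal digits and connector
%% punctuation. Other_ID_Continue is left out of this sketch.
is_id_continue(CP, Cat) ->
    is_id_start(CP, Cat)
        orelse lists:member(Cat, ['Mn', 'Mc', 'Nd', 'Pc']).
```

For example, `id_start_sketch:is_id_start($1, 'Nd')` is false while `id_start_sketch:is_id_continue($1, 'Nd')` is true, which matches the usual rule that a digit may continue an identifier but not start one.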
CT Test Results: 2 files, 97 suites, 1h 9m 5s. Results for commit 2ea3f5e. This comment has been updated with latest results.
To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass. See the TESTING and DEVELOPMENT HowTo guides for details about how to run tests locally.
// Erlang/OTP Github Action Bot
Add DerivedCoreProperties.txt for testing purposes; it will be used when testing the is_ID* functionality.
Pull Request Overview
This PR updates the Unicode standard from version 16.0 to 17.0 and introduces improvements to the code generator that handles Unicode data. The changes include fixes to handle new Unicode 17 data, optimize guard clause generation, and export additional utility functions for internal usage.
Key changes:
- Unicode data files updated from 16.0 to 17.0 with new characters and properties
- Code generator refactored to use a custom `is_integer/3` macro for range checks instead of verbose guard expressions
- New functions exported: `category/1`, `is_other_id_start/1`, `is_other_id_continue/1`, `is_letter_not_pattern_syntax/1`
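To make the range-check refactoring concrete, the following sketch shows the general idea of wrapping a bounded integer test in a macro so that generated clauses stay short. The macro name `IS_CP_IN_RANGE` and the module are hypothetical; the macro actually emitted by the generator in this PR may be named and defined differently.

```erlang
-module(range_guard_sketch).
-export([is_ascii_digit/1, is_ascii_digit_verbose/1]).

%% Hypothetical three-argument range macro (not the PR's definition).
%% It expands to a guard-safe expression, so it can be used after `when`.
-define(IS_CP_IN_RANGE(CP, Lo, Hi),
        (is_integer(CP) andalso (Lo) =< (CP) andalso (CP) =< (Hi))).

%% Clause written with the macro.
is_ascii_digit(CP) when ?IS_CP_IN_RANGE(CP, $0, $9) -> true;
is_ascii_digit(_) -> false.

%% The same check spelled out as a plain guard, for comparison.
is_ascii_digit_verbose(CP) when is_integer(CP), $0 =< CP, CP =< $9 -> true;
is_ascii_digit_verbose(_) -> false.
```

Both functions accept the same inputs; the macro form mainly pays off when a generator emits thousands of such clauses, since each range is then a single short token sequence.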
Reviewed Changes
Copilot reviewed 14 out of 18 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| lib/stdlib/uc_spec/gen_unicode_mod.escript | Refactors code generation to use macros for cleaner guard clauses and adds generation of new exported functions |
| lib/stdlib/uc_spec/vendor.info | Adds DerivedCoreProperties.txt to the list of required Unicode data files |
| lib/stdlib/uc_spec/*.txt | Updates Unicode data files from version 16.0 to 17.0 |
| lib/stdlib/test/unicode_util_SUITE.erl | Adds test cases for newly exported functions |
Overrides #10181
Adds code generator fixes to handle Unicode 17 (fixes test cases).
Adds and exports some functions for internal usage.