Prepare mktables for Unicode 15.1 and 16.0 #23133
base: blead
if (defined (my $bmg = property_ref('Bidi_Mirroring_Glyph'))) {
    $bmg->set_to_output_map($EXTERNAL_MAP);
    $bmg->set_range_size_1(1);
}

property_ref('Numeric_Value')->set_to_output_map($OUTPUT_ADJUSTED);

# These two properties have no short names and the file names for them
# clash in DOS 8.3. Work around this by creating shorter file names that
Where are we still limited by 8.3?
On IRC the other day, I asked if we were still limited, and the answer was yes.
For Unicode filenames yes, but for ASCII filenames we aren't, AFAIK.
@@ -871,6 +871,15 @@ push @tables_that_may_be_empty, 'Grapheme_Cluster_Break=Prepend'
push @tables_that_may_be_empty, 'Canonical_Combining_Class=CCC133'
    if $v_version ge v6.2.0;

# These properties of Egyptian hieroglyphs are not handled by Perl. Their
# intended audience is only specialist Egyptologists
push @tables_that_may_be_empty, qw(kEH_Cat kEH_Desc kEH_HG kEH_IFAO
What does it do? And why would we not want to support it?
I think it is important to get the next Perl version shipped with the latest Unicode release, and I think in order to do this, it has to be in the upcoming development release due out in the next day or two. Getting this to work in time is lower priority than getting the rest to work in time. These could be legally fixed in the next development release next month. And since the bus factor for getting it in is 1, I don't think the comments should promise anything.
For information, see https://www.unicode.org/reports/tr57/tr57-3.html
"are not handled by Perl" is ambiguous. It could be read as "are not to be handled" (so don't add them) or "are not handled yet".
Changed to "These properties of Egyptian hieroglyphs are not yet handled by Perl. Their"
Add comments, and rewrap comment lines to fit 80 columns
Unicode 15.1 introduces this new property, which needs the same special handling as plain NFKC_Casefold does.
Unicode 15.1 introduces new line breaking rules for Indic languages, via a new property Indic_Conjunct_Break. mktables works in conjunction with regen/mk_invlists.pl to construct tables and DFAs for handling these. This commit prepares mktables to do its part for Unicode versions that have these new rules.
These files are changed in 15.1 to have @missings lines, whereas they didn't before. This leads to some warning messages, so turn off looking at them, as we do for a number of other files.
Unicode 15.1 changes the rules for line breaking with regard to quotation marks. This prepares for that.
Unicode 15.1 adds new line breaking rules that depend on the dotted circle. This creates a table for that so that mk_invlists.pl doesn't have to have exception code for handling it.
4894f2a to 1f07a91
The commit message for aa6faba has 2 misspellings. "infrastructue" lacks the second "r". In "incoroporated" the second "o" needs removal.
This is handled by ignoring it for now, and letting mktables know that the properties it contains are empty. This file, new in 16.0, gives extra information about Egyptian hieroglyphs newly encoded in 16.0. It is intended only for scholars of these ancient symbols. mktables normally handles new properties automatically, but this file is in a completely different format than previous ones, so mktables would have to be adapted to understand that. That might not be too hard, given that mktables has infrastructure to handle other outliers that have come along over the years from Unicode. But, by ignoring this file, we create empty tables which generate errors in other places in perl. These are real bugs that ought to be fixed, and will be before 16.0 is incorporated into blead. And how many Egyptologists are there in the world, much less how many use the latest Perl? So the perldelta will say that 16.0's support doesn't include these, which are mostly provisional anyway.
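For illustration only, a minimal standalone sketch of the version-gated "may be empty" list described above. The sub name `tables_that_may_be_empty` is hypothetical, the `v16.0.0` gate is an assumption drawn from the statement that the file is new in 16.0 (the PR's actual condition is not visible in the diff shown here), and only the four property names visible in the truncated diff line are used.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of mktables: returns the tables that are
# allowed to be empty when compiling a given Unicode version.
sub tables_that_may_be_empty {
    my ($v_version) = @_;
    my @tables;

    # This entry and its condition appear in the diff context of this PR.
    push @tables, 'Canonical_Combining_Class=CCC133'
        if $v_version ge v6.2.0;

    # ASSUMPTION: gate on 16.0, since the commit message says the file is
    # new in that version. The list is limited to the names visible in the
    # truncated diff line.
    push @tables, qw(kEH_Cat kEH_Desc kEH_HG kEH_IFAO)
        if $v_version ge v16.0.0;

    return @tables;
}

# Show which tables would be tolerated as empty for two versions.
for my $version (v15.1.0, v16.0.0) {
    my @tables = tables_that_may_be_empty($version);
    printf "%vd: %s\n", $version, join(', ', @tables);
}
```

The comparison relies on Perl's string ordering of v-strings, the same idiom the existing `if $v_version ge v6.2.0` line uses.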
These new properties are automatically handled, but there is a problem. They have no short form names. Files are written for them based on their names, and those files are not distinguishable on a DOS 8.3 file system. The solution here is to manually override the automatically generated file names with distinguishable ones.
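A rough sketch of why the clash happens and how a manual override avoids it. Everything here is hypothetical: `stem_8_3` is a stand-in and not the algorithm mktables actually uses, the two property names are illustrative placeholders (the commit message does not name the properties), and the override names are made up.

```perl
use strict;
use warnings;

# Hypothetical stand-in for deriving an 8-character file stem from a name.
# NOT mktables' real algorithm; it only demonstrates the collision.
sub stem_8_3 {
    my ($name) = @_;
    (my $stem = lc $name) =~ s/_//g;    # lowercase, drop underscores
    return substr($stem, 0, 8);         # keep at most 8 characters
}

# Illustrative placeholder names sharing a long common prefix.
my @properties = qw(Long_Property_Name_Start Long_Property_Name_Continue);

# Hypothetical manual overrides chosen to differ within 8 characters.
my %override = (
    Long_Property_Name_Start    => 'LPNStart',
    Long_Property_Name_Continue => 'LPNCont',
);

# Map each stem to the properties that would be written under it.
sub file_stems {
    my ($use_overrides) = @_;
    my %owners;
    for my $prop (@properties) {
        my $source = ($use_overrides && exists $override{$prop})
                   ? $override{$prop}
                   : $prop;
        push @{ $owners{ stem_8_3($source) } }, $prop;
    }
    return \%owners;
}

for my $use_overrides (0, 1) {
    print $use_overrides ? "with overrides:\n" : "automatic names:\n";
    my $owners = file_stems($use_overrides);
    for my $stem (sort keys %$owners) {
        my @props = @{ $owners->{$stem} };
        my $note  = @props > 1 ? '  (clash)' : '';
        print "  $stem <- @props$note\n";
    }
}
```

Without overrides both placeholder names reduce to the same stem; with them, each property gets its own.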
1f07a91 to de01c61
perldelta not needed until the actual releases are incorporated.