Prepare mktables for Unicode 15.1 and 16.0 #23133

Open: wants to merge 8 commits into blead

khwilliamson (Contributor)

perldelta not needed until the actual releases are incorporated.

  • This set of changes does not require a perldelta entry.

if (defined (my $bmg = property_ref('Bidi_Mirroring_Glyph'))) {
$bmg->set_to_output_map($EXTERNAL_MAP);
$bmg->set_range_size_1(1);
}

property_ref('Numeric_Value')->set_to_output_map($OUTPUT_ADJUSTED);

# These two properties have no short names and the file names for them
# clash in DOS 8.3. Work around this by creating shorter file names that

Contributor

Where are we still limited by 8.3?

Contributor Author

On IRC the other day, I asked if we were still limited, and the answer was yes.

Contributor

For Unicode filenames, yes, but for ASCII filenames we aren't, AFAIK.

@@ -871,6 +871,15 @@ push @tables_that_may_be_empty, 'Grapheme_Cluster_Break=Prepend'
push @tables_that_may_be_empty, 'Canonical_Combining_Class=CCC133'
if $v_version ge v6.2.0;

# These properties of Egyptian hieroglyphs are not handled by Perl. Their
# intended audience is only specialist Egyptologists
push @tables_that_may_be_empty, qw(kEH_Cat kEH_Desc kEH_HG kEH_IFAO

Contributor

What does it do? And why would we not want to support it?

Contributor Author

I think it is important to ship the next Perl version with the latest Unicode release, and to do that, I think this has to be in the upcoming development release, due out in the next day or two. Getting this part to work in time is a lower priority than getting the rest to work in time. These could still legally be fixed in next month's development release. And since the bus factor for getting this in is 1, I don't think the comments should promise anything.

For information, see https://www.unicode.org/reports/tr57/tr57-3.html

Contributor

"are not handled by Perl" is ambiguous. It could be read as "are not to be handled" (so don't add them) or "are not handled yet".

Contributor Author

Changed to "These properties of Egyptian hieroglyphs are not yet handled by Perl. Their"

Add comments, and rewrap comment lines to fit 80 columns

Unicode 15.1 introduces this new property, which needs the same special
handling as plain NFKC_Casefold does.

Unicode 15.1 introduces new grapheme cluster breaking rules for Indic
scripts, via a new property, Indic_Conjunct_Break.  mktables works in
conjunction with regen/mk_invlists.pl to construct tables and DFAs for
handling these.  This commit prepares mktables to do its part for
Unicode versions that have these new rules.

These files are changed in 15.1 to have @missing lines, whereas they
didn't before.  This leads to some warning messages, so turn off
looking at them, as we do for a number of other files.
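
For readers unfamiliar with them: UAX #44 defines "@missing" lines as specially formatted comments in the UCD data files that give the default property value for code points not explicitly listed. Below is a minimal sketch of recognizing such a line; it is an illustration only, not mktables' own handling, and the sample line is written in the style of Scripts.txt.

```perl
use strict;
use warnings;

# Parse a UCD "# @missing:" comment line.  Returns (low, high, field, ...)
# with low/high as integers, or an empty list for any other line.
# Sketch only; mktables has its own code for these lines.
sub parse_missing {
    my ($line) = @_;
    return unless $line =~
        /^\#\s*\@missing:\s*([0-9A-Fa-f]{4,6})\.\.([0-9A-Fa-f]{4,6})\s*;\s*(.*?)\s*$/;
    my ($low, $high, $rest) = (hex $1, hex $2, $3);
    # Multi-field files put several ';'-separated values after the range
    return ($low, $high, split /\s*;\s*/, $rest);
}

for my $line ('# @missing: 0000..10FFFF; Unknown', '0041..005A ; Latin') {
    my @parsed = parse_missing($line);
    my $msg = @parsed
        ? sprintf("default for %04X..%04X: %s\n",
                  $parsed[0], $parsed[1], join('; ', @parsed[2 .. $#parsed]))
        : "not an \@missing line: $line\n";
    print $msg;
}
```

The first sample yields a default of "Unknown" for 0000..10FFFF; the second is an ordinary data line and is rejected.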

Unicode 15.1 changes the line breaking rules with regard to quotation
marks.  This prepares for that.

Unicode 15.1 adds new line breaking rules that depend on the dotted
circle (U+25CC).  This creates a table for it so that mk_invlists.pl
doesn't need exception code to handle it.

jkeenan (Contributor) left a comment

The commit message for aa6faba has two misspellings: "infrastructue" lacks the second "r", and in "incoroporated" the second "o" needs removal.

This is handled by ignoring it for now, and letting mktables know that
the properties it contains are empty.  This file, new in 16.0, gives
extra information about the Egyptian hieroglyphs newly encoded in 16.0.
It is intended only for scholars of these ancient symbols.

mktables normally handles new properties automatically, but this file is
in a completely different format than previous ones, so mktables would
have to be adapted to understand that.  That might not be too hard,
given that mktables has infrastructure to handle other outliers that have
come along over the years from Unicode.  But ignoring this file creates
empty tables, which generate errors elsewhere in perl.
These are real bugs that ought to be fixed, and will be before 16.0 is
incorporated into blead.  And how many Egyptologists are there in the
world, much less how many use the latest Perl?

So the perldelta will say that 16.0's support doesn't include these,
which are mostly provisional anyway.
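
The idea of "letting mktables know that the properties are empty" can be illustrated with an allow-list check. This is a hypothetical sketch loosely modeled on the @tables_that_may_be_empty array in the diff; the subroutine and the non-kEH table names are invented for illustration and are not mktables' code.

```perl
use strict;
use warnings;

# Hypothetical allow-list of tables known to be empty (names from the diff)
my @tables_that_may_be_empty = qw(kEH_Cat kEH_Desc kEH_HG kEH_IFAO);
my %may_be_empty = map { $_ => 1 } @tables_that_may_be_empty;

# Given table name => array ref of [low, high] code point ranges, return
# the names of tables that are empty but NOT on the allow-list.
sub unexpected_empty_tables {
    my (%tables) = @_;
    return grep { !@{ $tables{$_} } && !$may_be_empty{$_} } sort keys %tables;
}

my @bad = unexpected_empty_tables(
    kEH_Cat     => [],              # empty, but allow-listed: not reported
    Some_Table  => [],              # hypothetical: empty, not allow-listed
    Other_Table => [[0x41, 0x5A]],  # hypothetical: non-empty
);
warn "unexpectedly empty: $_\n" for @bad;
```

Only Some_Table is reported; listing the kEH_* tables up front keeps the check quiet about them while still catching genuinely unexpected empty tables.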

These new properties are automatically handled, but there is a problem.
They have no short form names.  Files are written for them based on
their names, and those files are not distinguishable on a DOS 8.3 file
system.  The solution here is to manually override the automatically
generated file names with distinguishable ones.
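
To show why long names without a short alias can collide, here is a sketch under stated assumptions: it uses a hypothetical, naive 8.3 scheme (lowercase, drop underscores, keep 8 characters), not mktables' real naming algorithm, and the property names and override values are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical naive 8.3 base name: lowercase, drop underscores, keep 8
# characters.  NOT mktables' real algorithm.
sub naive_83_base {
    my ($name) = @_;
    (my $base = lc $name) =~ s/_//g;
    return substr($base, 0, 8);
}

# Hypothetical manual overrides giving each long name a distinct base
my %override = (
    Example_Property_Start    => 'exprstrt',
    Example_Property_Continue => 'exprcont',
);

# Use the override when there is one, else fall back to the naive name
sub file_base {
    my ($name) = @_;
    return $override{$name} // naive_83_base($name);
}

# List pairs of names that map to the same base.  Compared
# case-insensitively, since DOS file names are not case sensitive.
sub find_clashes {
    my ($namer, @names) = @_;
    my (%seen, @clashes);
    for my $name (@names) {
        my $base = lc $namer->($name);
        if (exists $seen{$base}) {
            push @clashes, "$seen{$base} and $name both map to '$base'";
        }
        else {
            $seen{$base} = $name;
        }
    }
    return @clashes;
}

my @props = qw(Example_Property_Start Example_Property_Continue);
print "naive: $_\n" for find_clashes(\&naive_83_base, @props);
print "with overrides: ", scalar(find_clashes(\&file_base, @props)), " clashes\n";
```

With the naive scheme both names reduce to "examplep"; with the overrides there are no clashes, which is the effect the commit aims for.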