Unicode::UCD Handle properties that have no code points #23134

Open: wants to merge 3 commits into base: blead
2 changes: 1 addition & 1 deletion charclass_invlists.inc
@@ -436006,7 +436006,7 @@ static const U8 WB_table[23][23] = {
#endif /* defined(PERL_IN_REGEXEC_C) */

/* Generated from:
-* 0e8307ab7c654d9c133ea885f5413a4eb5c0123ed2178f7e1cbabed36b67792c lib/Unicode/UCD.pm
+* 92b3b0b73e402a9efee67f10380c390638c080fdde7430665e57abdac2fa976f lib/Unicode/UCD.pm
* eb840f36e0a7446293578c684a54c6d83d249abde7bdd4dfa89794af1d7fe9e9 lib/unicore/ArabicShaping.txt
* 333ae1e99db0504ca8a046a07dc45b5e7aa91869c685e6bf955ebe674804827a lib/unicore/BidiBrackets.txt
* b4b9e1d87d8ea273613880de9d2b2f0b0b696244b42152bfa0a3106e7d983a20 lib/unicore/BidiMirroring.txt
24 changes: 15 additions & 9 deletions lib/Unicode/UCD.pm
@@ -5,7 +5,7 @@ use warnings;
no warnings 'surrogate'; # surrogates can be inputs to this
use charnames ();

-our $VERSION = '0.79';
+our $VERSION = '0.80';

sub DEBUG () { 0 }
$|=1 if DEBUG;
@@ -3554,8 +3554,8 @@ format is the empty string.

is a combination of the C<"al"> type and the C<"ae"> type. Some of
the map array elements have the forms given by C<"al">, and
-the rest are the empty string. The property C<NFKC_Casefold> has this form.
-An example slice is:
+the rest are the empty string. The properties C<NFKC_Casefold> and
+C<NFKC_Simple_Casefold> have this form. An example slice is:

@$ranges_ref @$maps_ref Note
...
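
As a rough illustration of the mixed form described in this POD hunk, the sketch below walks the arrays returned by `prop_invmap` for `NFKC_Casefold` and sorts each map element into one of three shapes: empty string, array reference, or plain scalar. It assumes a Perl whose Unicode::UCD exports `prop_invmap`; `classify_maps` is a hypothetical helper name used only for this example, not part of the module.

```perl
use strict;
use warnings;
use Unicode::UCD qw(prop_invmap);

# Hypothetical helper (not in Unicode::UCD): count the shapes of the
# map elements in a prop_invmap() result.
sub classify_maps {
    my ($maps_ref) = @_;
    my %count = (empty => 0, list => 0, scalar => 0);
    for my $map (@$maps_ref) {
        if    (ref $map)   { $count{list}++ }      # multi-code-point mapping
        elsif ($map eq '') { $count{empty}++ }     # maps to the empty string
        else               { $count{scalar}++ }    # single numeric element
    }
    return \%count;
}

my ($ranges_ref, $maps_ref, $format, $default) = prop_invmap('NFKC_Casefold');
my $count = classify_maps($maps_ref);

printf "format=%s ranges=%d empty=%d list=%d scalar=%d\n",
    $format, scalar @$ranges_ref, @{$count}{qw(empty list scalar)};
```

The two returned arrays are parallel, so element `$i` of `@$maps_ref` describes the range that starts at element `$i` of `@$ranges_ref`.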
@@ -3846,9 +3846,9 @@ RETRY:
# in the new-style, and this routine is supposed to return old-style block
# names. The Name table is valid, but we need to execute the special code
# below to add in the algorithmic-defined name entries.
-# And NFKCCF needs conversion, so handle that here too.
+# And NFKCCF and NFKCSCF need conversion, so handle those here too.
if (ref $swash eq ""
-|| $swash->{'TYPE'} =~ / ^ To (?: Blk | Na | NFKCCF ) $ /x)
+|| $swash->{'TYPE'} =~ / ^ To (?: Blk | Na | NFKCS?CF ) \z /x)
{

# Get the short name of the input property, in standard form
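
The pattern change in this hunk does two things: `S?` lets it accept `ToNFKCSCF` as well as `ToNFKCCF`, and `\z` replaces `$` so that a trailing newline no longer matches. A minimal sketch comparing the two patterns on a few sample strings (the sample strings are chosen for illustration and are not taken from the PR):

```perl
use strict;
use warnings;

# The old and new patterns, copied from the hunk above
my $old = qr/ ^ To (?: Blk | Na | NFKCCF ) $ /x;
my $new = qr/ ^ To (?: Blk | Na | NFKCS?CF ) \z /x;

for my $type ('ToNFKCCF', 'ToNFKCSCF', "ToNFKCCF\n") {
    (my $shown = $type) =~ s/\n/\\n/;    # make the newline visible
    printf "%-12s old=%d new=%d\n",
        $shown,
        ($type =~ $old ? 1 : 0),
        ($type =~ $new ? 1 : 0);
}
```

Without the `/m` flag, `$` also matches just before a final newline, whereas `\z` matches only at the very end of the string, so the third sample matches the old pattern but not the new one.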
@@ -3993,7 +3993,7 @@ RETRY:
$decomps{'TYPE'} = "ToDt";
$SwashInfo{'ToDt'}{'missing'} = "None";
$SwashInfo{'ToDt'}{'format'} = "s";
-} # 'dm' is handled below, with 'nfkccf'
+} # 'dm' is handled below, with 'nfkcs?cf'

$decomps{'LIST'} = "";

@@ -4045,11 +4045,11 @@ RETRY:
}
$swash = \%decomps;
}
-elsif ($second_try ne 'nfkccf') { # Don't know this property. Fail.
+elsif ($second_try !~ /^nfkcs?cf\z/) { # Don't know this property. Fail.
return;
}

-if ($second_try eq 'nfkccf' || $second_try eq 'dm') {
+if ($second_try =~ / ^ (?: nfkcs?cf | dm ) \z /x) {

# The 'nfkccf' property is stored in the old format for backwards
# compatibility for any applications that have read its file
@@ -4180,7 +4180,9 @@ RETRY:
} # End of loop constructing the converted list

# Finish up the data structure for our converted swash
-my $type = ($second_try eq 'nfkccf') ? 'ToNFKCCF' : 'ToDm';
+my $type = ($second_try =~ / ^ ( nfkcs?cf ) \z /x)
+? 'To' . $1

Review comment (Contributor): Should this be 'To' . uc($1)?

+: 'ToDm';
$revised_swash{'LIST'} = $list;
$revised_swash{'TYPE'} = $type;
$revised_swash{'SPECIALS'} = $swash->{'SPECIALS'};
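
To make the review question about `uc($1)` concrete, here is a small stand-alone sketch of the expression as written and with the suggested `uc`. The two subroutine names are hypothetical wrappers invented for this example. Whether other code depends on the exact case of `TYPE` is not visible in this diff, though the replaced line produced `'ToNFKCCF'` and the check in the earlier hunk (`NFKCS?CF`) is case sensitive.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the expression as it appears in the PR
sub type_as_written {
    my ($second_try) = @_;
    return ($second_try =~ / ^ ( nfkcs?cf ) \z /x)
           ? 'To' . $1
           : 'ToDm';
}

# Hypothetical wrapper with the reviewer's suggested uc()
sub type_with_uc {
    my ($second_try) = @_;
    return ($second_try =~ / ^ ( nfkcs?cf ) \z /x)
           ? 'To' . uc($1)
           : 'ToDm';
}

for my $prop (qw(nfkccf nfkcscf dm)) {
    printf "%-8s as written: %-10s with uc: %s\n",
        $prop, type_as_written($prop), type_with_uc($prop);
}
```

Because `$1` captures the lowercase input, the expression as written yields `Tonfkccf` and `Tonfkcscf`, while the `uc` variant yields `ToNFKCCF` and `ToNFKCSCF`; both return `ToDm` for `dm`.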
@@ -4265,6 +4267,10 @@ RETRY:
# assumed to be 'Y'.

foreach my $range (split "\n", $swash->{'LIST'}) {
+
+# No code points matched
+last if $range eq '!Unicode::UCD::All';
+
$range =~ s/ \s* (?: \# .* )? $ //xg; # rmv trailing space, comments

# Find the beginning and end of the range on the line
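
The added check stops the loop when the list consists of the `!Unicode::UCD::All` marker, which is not a range line. A minimal sketch of that control flow in isolation follows. `parse_list` is a hypothetical helper, and the line layout it parses (hex start, tab, hex end, optional `#` comment) is an assumption made for this example rather than something shown in the diff.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Unicode::UCD.  ASSUMPTION: each data
# line is "HEXSTART<TAB>HEXEND", optionally followed by a '#' comment.
sub parse_list {
    my ($list) = @_;
    my @ranges;
    foreach my $range (split "\n", $list) {

        # Marker from the hunk above: no code points matched
        last if $range eq '!Unicode::UCD::All';

        $range =~ s/ \s* (?: \# .* )? $ //xg;   # rmv trailing space, comments
        next if $range eq '';

        my ($begin, $end) = split /\t/, $range;
        $end = $begin unless defined $end && length $end;
        push @ranges, [ hex $begin, hex $end ];
    }
    return @ranges;
}

my @none = parse_list('!Unicode::UCD::All');
my @some = parse_list("0041\t005A\n0061\t007A # lowercase");
printf "none=%d some=%d\n", scalar @none, scalar @some;
```

Testing for the marker before the comment-stripping and range parsing means the marker line is never handed to the hex conversion.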
40 changes: 33 additions & 7 deletions lib/Unicode/UCD.t
@@ -1543,13 +1543,29 @@ foreach my $set_of_tables (\%Unicode::UCD::stricter_to_file_of, \%Unicode::UCD::
chomp $official;
$/ = $input_record_separator;

-# If we are to test against an inverted file, it is easier to invert
-# our array than the file.
if ($invert) {
-if (@tested && $tested[0] == 0) {
-shift @tested;
-} else {
-unshift @tested, 0;
+
+# Special case an inverted empty file
+if (@tested == 0) {
+if ($official ne 'V0') {
+fail_with_diff($mod_table, $official, 'V0',
+"prop_invlist");
+}
+else {
+pass("prop_invlist('$mod_table')");
+}
+
+next;
+}
+else {
+
+# If we are to test against an inverted file, it is easier to
+# invert our array than the file.
+if ($tested[0] == 0) {
+shift @tested;
+} else {
+unshift @tested, 0;
+}
}
}
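
The shift/unshift step in this hunk is the usual way to complement an inversion list: drop a leading 0 if there is one, otherwise prepend one. A minimal stand-alone sketch, where `invert_invlist` is a hypothetical helper name and the sample lists are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: complement an inversion list by toggling a leading 0.
sub invert_invlist {
    my @list = @_;
    if (@list && $list[0] == 0) {
        shift @list;        # list started at 0: complement starts later
    }
    else {
        unshift @list, 0;   # list started above 0: complement starts at 0
    }
    return @list;
}

my @upper    = (0x41, 0x5B);            # matches U+0041 .. U+005A
my @inverted = invert_invlist(@upper);  # everything except that range
printf "inverted: %s\n", join ' ', map { sprintf '%04X', $_ } @inverted;
```

This generic helper maps an empty list to `(0)`. The test code above does not rely on that; it handles an empty `@tested` separately by comparing the file contents against `'V0'`.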

@@ -1602,6 +1618,7 @@ is(@list, 0, "prop_invmap('Is_Is_Any') returns <undef> since two is's");
# applications use them (though such use is deprecated).
my @legacy_file_format = (qw( Bidi_Mirroring_Glyph
NFKC_Casefold
+NFKC_Simple_Casefold
)
);

@@ -2078,9 +2095,18 @@ foreach my $prop (sort(keys %props)) {
# it's an error
my %specials = %$specials_ref if $specials_ref;

+# Special case an expected and gotten empty return
+if ( @$invlist_ref - $upper_limit_subtract == 1
+&& $official =~ / ^ ( V0 | !Unicode::UCD::All ) \z /x)
+{
+pass("prop_invmap('$display_prop')");
+next PROPERTY;
+}
+
# The extra -$upper_limit_subtract is because the final element may
# have been tested above to be for anything above Unicode, in which
-# case the file may not go that high.
+# case the file may not go that high. The upper bound may be changed
+# in the loop, so can't pre-calculate it.
for (my $i = 0; $i < @$invlist_ref - $upper_limit_subtract; $i++) {

# If the map element is a reference, have to stringify it (but
2 changes: 1 addition & 1 deletion lib/unicore/uni_keywords.pl

(Generated file; diff not rendered.)

2 changes: 1 addition & 1 deletion regcharclass.h

(Generated file; diff not rendered.)

2 changes: 1 addition & 1 deletion regexp_constants.h
@@ -29,7 +29,7 @@
#define MAX_FOLD_FROMS 3

/* Generated from:
-* 0e8307ab7c654d9c133ea885f5413a4eb5c0123ed2178f7e1cbabed36b67792c lib/Unicode/UCD.pm
+* 92b3b0b73e402a9efee67f10380c390638c080fdde7430665e57abdac2fa976f lib/Unicode/UCD.pm
* eb840f36e0a7446293578c684a54c6d83d249abde7bdd4dfa89794af1d7fe9e9 lib/unicore/ArabicShaping.txt
* 333ae1e99db0504ca8a046a07dc45b5e7aa91869c685e6bf955ebe674804827a lib/unicore/BidiBrackets.txt
* b4b9e1d87d8ea273613880de9d2b2f0b0b696244b42152bfa0a3106e7d983a20 lib/unicore/BidiMirroring.txt
2 changes: 1 addition & 1 deletion uni_keywords.h

(Generated file; diff not rendered.)
