Commit 4894f2a

mktables: Handle Unicode 16.0 new \d ranges
mktables does a lot of sanity checks on the data it gets fed. One of those is to make sure that any \d group of code points is 10 long. This verifies that Unicode has given us enough code points to form 0-9. It assumes that if Unicode got this much right, the numeric values are also 0-9. This check has uncovered issues with the Unicode Standard in the past. Nowadays, they've cleaned up their act, and it's been many releases since there have been problems. But our checks remain, and I think they should.

What happened in Unicode 16.0 is that a range of \d characters was added that contains two consecutive groups of 0-9 values. The check could be changed to verify that the count is divisible by 10, but checking for this particular range is a bit safer.
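To make the trade-off in the last sentence concrete, here is a minimal sketch of the two policies side by side. It is not the mktables code: the `[start, end]` pairs stand in for mktables' range objects, and `strict_ok` and `loose_ok` are hypothetical names invented for this illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for mktables' range objects: [start, end] pairs.
my @digit_ranges = (
    [ 0x0030,  0x0039  ],    # ASCII digits: 10 code points
    [ 0x0660,  0x0669  ],    # Arabic-Indic digits: 10 code points
    [ 0x116D0, 0x116E3 ],    # the 16.0 range named in this commit: 20 code points
);

# Strict policy (the one this commit keeps): exactly 10 code points,
# plus an explicit exemption for the one known exception.
sub strict_ok {
    my ($start, $end) = @_;
    return 1 if $start == 0x116D0 && $end == 0x116E3;
    return $end - $start + 1 == 10;
}

# Looser policy (the alternative the message mentions): any positive
# multiple of 10 is accepted.
sub loose_ok {
    my ($start, $end) = @_;
    my $count = $end - $start + 1;
    return $count > 0 && $count % 10 == 0;
}

for my $r (@digit_ranges) {
    printf "%05X..%05X strict=%d loose=%d\n",
        @$r, strict_ok(@$r) ? 1 : 0, loose_ok(@$r) ? 1 : 0;
}
```

All three ranges above pass both policies. The difference only shows up for a range nobody has looked at yet: a made-up 30-long range such as `0x30..0x4D` would pass `loose_ok` silently but fail `strict_ok`, so under the strict policy it would still trigger the warning and get a human review.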
1 parent 688f6ba commit 4894f2a

6 files changed, +9 -5 lines changed

charclass_invlists.inc

+1-1
@@ -436055,7 +436055,7 @@ static const U8 WB_table[23][23] = {
  * 3f4f32ed2a577344a508114527e721d7a8b633d32f38945d47fe0c743650c585 lib/unicore/extracted/DLineBreak.txt
  * 710abf2d581ac9c57f244c0834f9d9969d9781e0396adccd330eaae658ac7d6b lib/unicore/extracted/DNumType.txt
  * 6bd30f385f3baf3ab5d5308c111a81de87bea5f494ba0ba69e8ab45263b8c34d lib/unicore/extracted/DNumValues.txt
- * 2851ec4057abad0019e802bee35d17b13a95b75f7b72651edd27c6e31d527fac lib/unicore/mktables
+ * ee8db4095bbf47197cb3460d73ed5c1a9532d2363bedbf6e2ba1a727395a7f1f lib/unicore/mktables
  * 55d90fdc3f902e5c0b16b3378f9eaa36e970a1c09723c33de7d47d0370044012 lib/unicore/version
  * 0a6b5ab33bb1026531f816efe81aea1a8ffcd34a27cbea37dd6a70a63d73c844 regen/charset_translations.pl
  * c7ff8e0d207d3538c7feb4a1a152b159e5e902d20293b303569ea8323e84633e regen/mk_PL_charclass.pl

lib/unicore/mktables

+4
@@ -13756,6 +13756,10 @@ END
     next if $range->start == 0x1D7CE; # This whole range was added in 3.1
     next if $range->end == 0x19DA && $v_version eq v5.2.0;
     next if $range->end - $range->start < 9 && $v_version le 4.0.0;
+
+    # 2 sequential series of 10 each were added in 16.0
+    next if $range->start == 0x116D0 && $range->end == 0x116E3;
+
     Carp::my_carp("Range $range unexpectedly doesn't contain 10"
                 . " decimal digits. Code in regcomp.c assumes it does,"
                 . " and will have to be fixed. Proceeding anyway.");
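As a quick check of the arithmetic behind the new exemption, the sketch below splits the range 0x116D0..0x116E3 into consecutive groups of 10 and maps an offset to its assumed digit value. It only illustrates the "two sequential series of 10" reading from the commit message. It is not the logic in mktables or regcomp.c, and `decades` and `assumed_digit` are hypothetical helper names.

```perl
use strict;
use warnings;

# Hypothetical helper: split an inclusive code point range into
# consecutive groups of at most 10 code points each.
sub decades {
    my ($start, $end) = @_;
    my @groups;
    for (my $zero = $start; $zero <= $end; $zero += 10) {
        my $last = $zero + 9;
        $last = $end if $last > $end;    # clamp a short final group
        push @groups, [ $zero, $last ];
    }
    return @groups;
}

# Assumption taken from the commit message: each group of 10 carries the
# values 0-9 in order, so the value is the offset into the range modulo 10.
sub assumed_digit {
    my ($cp, $start) = @_;
    return ($cp - $start) % 10;
}

# Prints the two groups, 116D0..116D9 and 116DA..116E3.
printf "%05X..%05X\n", @$_ for decades(0x116D0, 0x116E3);
```

The range holds 0x116E3 - 0x116D0 + 1 = 20 code points, so it splits cleanly into two groups with nothing left over, which is what the exemption relies on.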

lib/unicore/uni_keywords.pl

+1-1
Generated file; diff not rendered.

regcharclass.h

+1-1
Generated file; diff not rendered.

regexp_constants.h

+1-1
@@ -78,7 +78,7 @@
  * 3f4f32ed2a577344a508114527e721d7a8b633d32f38945d47fe0c743650c585 lib/unicore/extracted/DLineBreak.txt
  * 710abf2d581ac9c57f244c0834f9d9969d9781e0396adccd330eaae658ac7d6b lib/unicore/extracted/DNumType.txt
  * 6bd30f385f3baf3ab5d5308c111a81de87bea5f494ba0ba69e8ab45263b8c34d lib/unicore/extracted/DNumValues.txt
- * 2851ec4057abad0019e802bee35d17b13a95b75f7b72651edd27c6e31d527fac lib/unicore/mktables
+ * ee8db4095bbf47197cb3460d73ed5c1a9532d2363bedbf6e2ba1a727395a7f1f lib/unicore/mktables
  * 55d90fdc3f902e5c0b16b3378f9eaa36e970a1c09723c33de7d47d0370044012 lib/unicore/version
  * 0a6b5ab33bb1026531f816efe81aea1a8ffcd34a27cbea37dd6a70a63d73c844 regen/charset_translations.pl
  * c7ff8e0d207d3538c7feb4a1a152b159e5e902d20293b303569ea8323e84633e regen/mk_PL_charclass.pl

uni_keywords.h

+1-1
Generated file; diff not rendered.
