BBC: Blead breaks JSON::XS #23053

Closed
eserte opened this issue Mar 2, 2025 · 8 comments · Fixed by #23094
Labels
BBC Blead Breaks CPAN - changes in blead broke a cpan module(s)

Comments

@eserte
Contributor

eserte commented Mar 2, 2025

The test suite of MLEHMANN/JSON-XS-4.03.tar.gz has been failing on all systems since perl 5.41.9: http://matrix.cpantesters.org/?dist=JSON-XS+4.03

I looked into the t/01_utf8.t failure and it seems that the test code expects that

JSON::XS->new->allow_nonref (1)->utf8 (1)->decode ('"ü"');

fails with an exception, something like

malformed UTF-8 character in JSON string, at character offset 1 (before "\x{fffd}") at t/01_utf8.t line 17.

Since 5.41.9 this exception does not happen anymore.
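
The expected exception follows from the bytes involved. The sketch below assumes the test source stores `ü` as the single Latin-1 byte 0xFC, as the `\xfc\x22` in the diagnostics quoted later in this thread suggests. It uses core Encode's strict decoder as a stand-in for the module's own check, so `is_wellformed_utf8` is a hypothetical helper and not part of JSON::XS:

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not part of JSON::XS): true if $octets is well-formed UTF-8.
sub is_wellformed_utf8 {
    my ($octets) = @_;
    my $copy = $octets;    # decode() with a CHECK value may modify its argument
    return eval { Encode::decode('UTF-8', $copy, Encode::FB_CROAK); 1 } ? 1 : 0;
}

my $latin1_json = qq{"\xFC"};        # '"ü"' as Latin-1: bytes 22 FC 22
my $utf8_json   = qq{"\xC3\xBC"};    # '"ü"' as UTF-8:   bytes 22 C3 BC 22

printf "latin1: %s\n", is_wellformed_utf8($latin1_json) ? 'well-formed' : 'malformed';
printf "utf8:   %s\n", is_wellformed_utf8($utf8_json)   ? 'well-formed' : 'malformed';
```

With `utf8 (1)` enabled the decoder is asked to treat its input as UTF-8 octets, so the Latin-1 form is the one the test expects to be rejected.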

There are also two failing test cases in t/02_error.t which I did not look at thoroughly, but I think it's the same issue.

eserte added the "BBC Blead Breaks CPAN - changes in blead broke a cpan module(s)" and "Needs Triage" labels on Mar 2, 2025
@eserte
Contributor Author

eserte commented Mar 2, 2025

Another possible victim: CHANSEN/Unicode-UTF8-0.62.tar.gz, see http://matrix.cpantesters.org/?dist=Unicode-UTF8%200.62

It also seems to expect an exception, which does not happen anymore. E.g. in file t/090_non_shortest_form.t the following code snippet started to fail:

        throws_ok { 
            encode_utf8($sequence);
        } qr/Can't decode ill-formed UTF-X octet sequence/, $name;
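
For context, "non-shortest form" means an over-long encoding of a code point, which a strict decoder has to reject. The sketch below uses core Encode rather than Unicode::UTF8, so the error text differs from the one the test matches. The byte sequence is illustrative and not taken from the test file, and `strict_decode` is a hypothetical helper:

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: decode strictly, returning undef instead of dying
# when the input is ill-formed.
sub strict_decode {
    my ($octets) = @_;
    my $copy = $octets;    # decode() with a CHECK value may modify its argument
    return eval { Encode::decode('UTF-8', $copy, Encode::FB_CROAK) };
}

my $overlong = "\xC0\xAF";    # two-byte (non-shortest) encoding of U+002F '/'
my $shortest = "\x2F";        # shortest form of the same code point

print "overlong: ", (defined strict_decode($overlong) ? "accepted" : "rejected"), "\n";
print "shortest: ", (defined strict_decode($shortest) ? "accepted" : "rejected"), "\n";
```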

@karenetheridge
Member

FWIW, Cpanel::JSON::XS continues to fail on this input as expected.

@eserte
Contributor Author

eserte commented Mar 2, 2025

For Unicode-UTF8 a BBC already exists, see #22977

@jkeenan
Contributor

jkeenan commented Mar 4, 2025

Bisecting with the following invocation (on Linux unthreaded):

perl Porting/bisect.pl \
--module=JSON::XS \
--start=2c387e63aed4fdb32777f74fdbac6ec6a8e6683b \
--end=8d462f6a9e96adf5dac36c7e778468b78a47be96

... pointed to this breaking commit:

149bea6edf8c49a1faf4fac124567101172d96bd is the first bad commit
commit 149bea6edf8c49a1faf4fac124567101172d96bd
Author: Tony Cook <[email protected]>
Date:   Thu Jul 18 15:39:14 2024 +1000
Commit:     Tony Cook <[email protected]>
CommitDate: Thu Aug 15 10:56:18 2024 +1000

    switch removal: remove the feature from feature.pm

(JSON::XS was mentioned in #22517 (comment), but that post did not fully identify the cause of the breakage of different modules.)

@tonycoz, can you take a look?

@tonycoz
Contributor

tonycoz commented Mar 4, 2025

149bea6 was reverted in 9a10079 but it doesn't seem likely to be related - it doesn't touch anything related to utf8.

The recent utf8 decoding churn seems more likely.

@jkeenan
Contributor

jkeenan commented Mar 6, 2025

> FWIW, Cpanel::JSON::XS continues to fail on this input as expected.

And this poses a major impediment to our BBC handling. Task-CPAN-Reporter has a dependency on Cpanel::JSON::XS. So if I want to submit a CPANtesters report using T-C-R, I'm stymied because its installation fails during testing of Cpanel::JSON::XS. Examples of those test failures:

PERL_DL_NONLAZY=1 "/home/jkeenan/testing/blead/bin/perl" "-MExtUtils::Command::MM" "-MTest::Harness" "-e" "undef *Test::Harness::Switches; test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
t/00_load.t ................ ok
Malformed UTF-8 character: \xfc\x22 (too short; 2 bytes available, need 6) in subroutine entry at t/01_utf8.t line 12.
Malformed UTF-8 character: \xfc\x22 (unexpected non-continuation byte 0x22, immediately after start byte 0xfc; need 6 bytes, got 1) in subroutine entry at t/01_utf8.t line 12.

#   Failed test at t/01_utf8.t line 13.
#                   'Malformed UTF-8 character (fatal) at t/01_utf8.t line 12.
# '
#     doesn't match '(?^:malformed UTF-8)'

#   Failed test 'ill-formed utf8 <80> throws error'
#   at t/01_utf8.t line 114.
#                   'Malformed UTF-8 character (fatal) at t/01_utf8.t line 111.
# '
#     doesn't match '(?^:malformed UTF-8 character)'

#   Failed test 'ill-formed utf8 <bf> throws error'
#   at t/01_utf8.t line 114.
#                   'Malformed UTF-8 character (fatal) at t/01_utf8.t line 111.
# '
#     doesn't match '(?^:malformed UTF-8 character)'

#   Failed test 'ill-formed utf8 <80 bf> throws error'
#   at t/01_utf8.t line 114.
#                   'Malformed UTF-8 character (fatal) at t/01_utf8.t line 111.
# '
#     doesn't match '(?^:malformed UTF-8 character)'

#   Failed test 'ill-formed utf8 <80 bf 80> throws error'
#   at t/01_utf8.t line 114.
#                   'Malformed UTF-8 character (fatal) at t/01_utf8.t line 111.
# '
#     doesn't match '(?^:malformed UTF-8 character)'

#   Failed test 'ill-formed utf8 <80 bf 80 bf> throws error'
#   at t/01_utf8.t line 114.
#                   'Malformed UTF-8 character (fatal) at t/01_utf8.t line 111.
# '
#     doesn't match '(?^:malformed UTF-8 character)'

#   Failed test 'ill-formed utf8 <80 bf 80 bf 80> throws error'
#   at t/01_utf8.t line 114.
#                   'Malformed UTF-8 character (fatal) at t/01_utf8.t line 111.
# '
#     doesn't match '(?^:malformed UTF-8 character)'

#   Failed test 'ill-formed utf8 <80 bf 80 bf 80 bf> throws error'
#   at t/01_utf8.t line 114.
#                   'Malformed UTF-8 character (fatal) at t/01_utf8.t line 111.
# '
#     doesn't match '(?^:malformed UTF-8 character)'

#   Failed test 'ill-formed utf8 <80 bf 80 bf 80 bf 80> throws error'
#   at t/01_utf8.t line 114.
#                   'Malformed UTF-8 character (fatal) at t/01_utf8.t line 111.
# '
#     doesn't match '(?^:malformed UTF-8 character)'

#   Failed test 'ill-formed utf8 <80 81 82 83 84 85 86 87 88 89 8a 8b 8c 8d 8e 8f 90 91 92 93 94 95 96 97 98 99 9a 9b 9c 9d 9e 9f a0 a1 a2 a3 a4 a5 a6 a7 a8 a9 aa ab ac ad ae af b0 b1 b2 b3 b4 b5 b6 b7 b8 b9 ba bb bc bd be bf> throws error'
#   at t/01_utf8.t line 114.
#                   'Malformed UTF-8 character (fatal) at t/01_utf8.t line 111.
# '
#     doesn't match '(?^:malformed UTF-8 character)'

#   Failed test 'ill-formed utf8 <c0 20 c1 20 c2 20 c3 20 c4 20 c5 20 c6 20 c7 20 c8 20 c9 20 ca 20 cb 20 cc 20 cd 20 ce 20 cf 20 d0 20 d1 20 d2 20 d3 20 d4 20 d5 20 d6 20 d7 20 d8 20 d9 20 da 20 db 20 dc 20 dd 20 de 20 df 20> throws error'
#   at t/01_utf8.t line 114.
#                   'Malformed UTF-8 character (fatal) at t/01_utf8.t line 111.
# '
#     doesn't match '(?^:malformed UTF-8 character)'
[ etc. etc.]
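
One detail visible in the output above: an exception is still thrown, but its text is Perl's own "Malformed UTF-8 character (fatal)" with a capital M, while the test patterns are case-sensitive and expect a lowercase "malformed UTF-8". A small sketch of that mismatch follows. The actual message and the pattern are taken from the output above, and the expected message is shortened from the one quoted earlier in this thread:

```perl
use strict;
use warnings;

my $pattern  = qr/malformed UTF-8/;    # stringifies as (?^:malformed UTF-8)
my $expected = 'malformed UTF-8 character in JSON string, at character offset 1';
my $actual   = 'Malformed UTF-8 character (fatal) at t/01_utf8.t line 12.';

print "expected message matches: ", ($expected =~ $pattern ? "yes" : "no"), "\n";
print "actual message matches:   ", ($actual   =~ $pattern ? "yes" : "no"), "\n";
```

So these particular failures report a changed error message, not a silently accepted input.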

@cjg-cguevara

I'm able to submit reports again with Task::CPAN::Reporter after:

cpan JSON::PP Test::Reporter::Transport::Metabase

@karenetheridge
Member

karenetheridge commented Mar 7, 2025

> > FWIW, Cpanel::JSON::XS continues to fail on this input as expected.
>
> And this poses a major impediment to our BBC handling.

No, I meant "Cpanel::JSON::XS does not accept this input, which is the expected and correct result."

Cpanel::JSON::XS passes all its tests on 5.41.9 on my system (macos 15.3).

The error string I see in t/01_utf8.t is "malformed UTF-8 character in JSON string, at character offset 1 (before "\x{fffd}"") at t/01_utf8.t line 12."

khwilliamson added a commit to khwilliamson/perl5 that referenced this issue Mar 9, 2025
This reverts commit d62b9fa.
That commit turns out to be ill advised.  I forgot that the function
call it replaced also calls utf8_to_uv() and then munges its return value.
This commit to succeed would have needed the same munging.  But rather
than repeat that logic, just call the original function, by simply
reverting this commit.

This fixes Perl#23053.  I haven't tested them, but it likely also takes care
of Perl#22820, Perl#23086, and Perl#22977