Skip to content

Fix #6727 #7042

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

saimeunt
Copy link
Contributor

@saimeunt saimeunt commented Mar 26, 2025

Description

This PR fixes a bug where hex literals have their optional suffix counted as part of their digits.
The root cause of the bug it that the lexer wrongfully interpret the suffix as additional hex digits as "0xb256" happens to be a valid hex string.
This won't occur with other numeric literal constants such as "0x0u256" as the lexer will stop upon encountering "u" which is not a valid hex digit, the lexer will then extract the type suffix to help decoding the numeric literal which the suffix is stripped from before being further processed.
As far as binary b256 literals are concerned, the suffixed version will correctly stop interpreting characters as bits until the suffix is found, but it will fail with a lexer "InvalidIntSuffix" error because b256 is an unknown suffix in the first place (see https://github.com/FuelLabs/sway/blob/master/sway-parse/src/token.rs#L750-L764).
The suffix-free version is not impacted and correctly compiles (already was before this fix).

The initial proposed fix is to patch the literal_to_literal function which is responsible for transforming a sway_ast literal into a sway language literal: the suffix is removed from the hex literal only if it's preceded by 64 hex digits to form a correct b256 literal. The hex digits are parsed again because we can't use the parsed BigUint provided by the lexer as it may include the optional suffix wrongfully interpreted as "0xb256".

This fix feels like a band-aid and a deeper fix should be implemented at the lexer level to properly parse b256 literals.

Here's a test summarizing the bug and validating the fix:

#[test]
fn test_b256_literal_suffix() {
    // 64 zeros followed by the b256 suffix
    // the lexer will wrongfully interpret this as a 68 long hex string with the suffix adding 2 extra bytes
    // in the literal_to_literal function we have to strip the suffix and reparse it to get the correct b256
    let foo = 0x0000000000000000000000000000000000000000000000000000000000000000b256;
    assert(foo == b256::zero());
    // 60 zeroes followed by 4 hex digits "b256"
    // the trailing b256 is not a suffix but the last 2 bytes of an hex literal ending with "0xb256": it must be 
    // preserved
    let bar = 0x000000000000000000000000000000000000000000000000000000000000b256;
    assert(bar.as_u256() == 0xb256);
    // this will not compile at the moment because the lexer will throw an InvalidIntSuffix error as b256 is not a
    // known suffix
    // let foo_bin = 0b0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000b256;
    // assert(foo_bin == b256::zero());
    // this is parsed correctly
    let bar_bin = 0b0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001011001001010110;
    assert(bar_bin.as_u256() == 0xb256);
}

Closes #6727

Checklist

  • I have linked to any relevant issues.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have updated the documentation where relevant (API docs, the reference, and the Sway book).
  • I have added tests that prove my fix is effective or that my feature works.
  • I have added (or requested a maintainer to add) the necessary Breaking* or New Feature labels where relevant.
  • I have done my best to ensure that my PR adheres to the Fuel Labs Code Review Standards.
  • I have requested a review from the relevant team or maintainers.

@saimeunt saimeunt requested review from a team as code owners March 26, 2025 00:06
@fuel-cla-bot
Copy link

fuel-cla-bot bot commented Mar 26, 2025

Thanks for the contribution! Before we can merge this, we need @saimeunt to sign the Fuel Labs Contributor License Agreement.

@IGI-111
Copy link
Contributor

IGI-111 commented Mar 26, 2025

That test should be added to the codebase.

Copy link

codspeed-hq bot commented Mar 26, 2025

CodSpeed Performance Report

Merging #7042 will not alter performance

Comparing saimeunt:fix/sway-hex-literals-b256-suffix-6727 (39c8ff3) with master (df50ca8)

Summary

✅ 22 untouched benchmarks

Copy link
Contributor

@IGI-111 IGI-111 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The whole logic seems fine but this is breaking separation of concerns.

This kind of handling should happen in sway-parse if at all possible not in the parse tree conversion.

@saimeunt saimeunt requested a review from a team as a code owner March 27, 2025 13:48
@saimeunt
Copy link
Contributor Author

@IGI-111 Thanks for your review, it helped me getting in the right direction.

I removed my fix from the literal_to_literal function and fixed the parser to correctly handle b256 literals having their optional suffix set.

The fix in the parser is to introduce a new variant of the LitIntType to support the b256 prefix.
https://github.com/FuelLabs/sway/pull/7042/files#diff-2f9f0abeb9bc800234c4324920a706eb238b46e0e33de7752f64a0f9f9360d8cR101

Then in the parge_digits function we introduce a limit over the maximum digits that should be parsed (eg. no more than 64 hex digits should be parsed when radix is 16) so we don't end up mistaking the b256 prefix for overflowing hex data.
https://github.com/FuelLabs/sway/pull/7042/files#diff-2f9f0abeb9bc800234c4324920a706eb238b46e0e33de7752f64a0f9f9360d8cR101

I have added a test validating this fix and made sure all other tests are passing.

Copy link
Contributor

@IGI-111 IGI-111 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Appreciate the work on this, but considering the logic further here, using the existing integer facilities of the parser may not actually be the best way to get good behavior here.

What the original issue complains about is that the following:
0x0000000000000000000000000000000000000000000000000000000000000000b256
is not accepted despite being unambiguous (since it's 64 bytes followed by "b256").

However, just accepting any integer literal with a hex suffix isn't acceptable because this:

0x01b256 is inherently ambiguous, the compiler can't know if you intended a shortform hex literal for 00 b2 56 or if you meant 01 as a b256 type. Given the critical uses of b256 types in Sway logic, that's unacceptable.

Hence the original introduction of the rule that all b256 literals should be longform, since that makes them unambiguous.

I think what is probably best here to keep this property and change the special handling of the parser for b256 types to strip a potential, unambiguous, suffix; as was suggested in the original issue.

You're on the right track in the sense that you're looking at the right parts of the code, but this still needs some changes.

@@ -0,0 +1,19 @@
script;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more thing is that this test should have more diverse sets of values to make sure that things like endianess works correctly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Sway hex literals have their suffix counted as digits
2 participants