Skip to content

Conversation

hardfist
Copy link
Contributor

@hardfist hardfist commented Sep 4, 2025

Summary

There's a subtle difference between old lexer and new lexer,

  • the old lexer will carry context which makes itself can using lookahead to deal with ambiguous syntax(but it's still not 100% right)
  • new lexer will carry less context which makes it faster but also make it's hard to deal with ambiguous syntax out of parser right.

That's why we met #11555 when migrate from old lexer to new lexer.

The key problem is that the new lexer can not be used out of parser now(but it's good for performance), so we need to collect tokens during parse phase other than use the old lexer(which not only slow but also increase code size).

we can easily tell the difference between lexer + collect and lexer +paser in following example:

/a/g
  • lexer + collect
 [
    TokenAndSpan {
        token: /,
        had_line_break: true,
        span: 1..2,
    },
    TokenAndSpan {
        token: a,
        had_line_break: false,
        span: 2..3,
    },
    TokenAndSpan {
        token: /,
        had_line_break: false,
        span: 3..4,
    },
    TokenAndSpan {
        token: g,
        had_line_break: false,
        span: 4..5,
    },
    TokenAndSpan {
        token: ;,
        had_line_break: false,
        span: 5..6,
    },
]
  • lexer + parser
 [
    TokenAndSpan {
        token: /,
        had_line_break: true,
        span: 1..2,
    },
    TokenAndSpan {
        token: regexp literal (a, g),
        had_line_break: true,
        span: 2..5,
    },
    TokenAndSpan {
        token: ;,
        had_line_break: false,
        span: 5..6,
    },
]

Let's see some real world case to show the different behaviors of different lexers.
code:

`${a}b`;
  • old lexer + collect
[
    TokenAndSpan {
        token: `,
        had_line_break: true,
        span: 1..2,
    },
    TokenAndSpan {
        token: template token (),
        had_line_break: false,
        span: 2..2,
    },
    TokenAndSpan {
        token: ${,
        had_line_break: false,
        span: 2..4,
    },
    TokenAndSpan {
        token: a,
        had_line_break: false,
        span: 4..5,
    },
    TokenAndSpan {
        token: },
        had_line_break: false,
        span: 5..6,
    },
    TokenAndSpan {
        token: template token (b),
        had_line_break: false,
        span: 6..7,
    },
    TokenAndSpan {
        token: `,
        had_line_break: false,
        span: 7..8,
    },
    TokenAndSpan {
        token: ;,
        had_line_break: false,
        span: 8..9,
    },
]
  • old lexer + parser
[
    TokenAndSpan {
        token: `,
        had_line_break: true,
        span: 1..2,
    },
    TokenAndSpan {
        token: ${,
        had_line_break: false,
        span: 2..4,
    },
    TokenAndSpan {
        token: a,
        had_line_break: false,
        span: 4..5,
    },
    TokenAndSpan {
        token: },
        had_line_break: false,
        span: 5..6,
    },
    TokenAndSpan {
        token: template token (b),
        had_line_break: false,
        span: 6..7,
    },
    TokenAndSpan {
        token: `,
        had_line_break: false,
        span: 7..8,
    },
    TokenAndSpan {
        token: ;,
        had_line_break: false,
        span: 8..9,
    },
]
  • new lexer + collect(which will generate TokenError)
[
    TokenAndSpan {
        token: None,
        had_line_break: true,
        span: 1..4,
    },
    TokenAndSpan {
        token: <identifier>,
        had_line_break: false,
        span: 4..5,
    },
    TokenAndSpan {
        token: },
        had_line_break: false,
        span: 5..6,
    },
    TokenAndSpan {
        token: <identifier>,
        had_line_break: false,
        span: 6..7,
    },
    TokenAndSpan {
        token: token:error,
        had_line_break: false,
        span: 7..9,
    },
]
  • new lexer + parser
[
    TokenAndSpan {
        token: <template head>,
        had_line_break: true,
        span: 1..4,
    },
    TokenAndSpan {
        token: <identifier>,
        had_line_break: false,
        span: 4..5,
    },
    TokenAndSpan {
        token: },
        had_line_break: false,
        span: 5..6,
    },
    TokenAndSpan {
        token: ;,
        had_line_break: false,
        span: 8..9,
    },
]

Follow up

  • remove collect support for new lexer since it doesn't make sense now
  • collect asi directly using a asicapturing to avoid collect tokens

Related links

related to #11555 microsoft/typescript-go#1554

Checklist

  • Tests updated (or not required).
  • Documentation updated (or not required).

Copy link

netlify bot commented Sep 4, 2025

Deploy Preview for rspack canceled.

Name Link
🔨 Latest commit e9f7125
🔍 Latest deploy log https://app.netlify.com/projects/rspack/deploys/68c26684799d4a00087a86c4

@github-actions github-actions bot added release: bug fix release: bug related release(mr only) team The issue/pr is created by the member of Rspack. labels Sep 4, 2025
Copy link
Contributor

github-actions bot commented Sep 4, 2025

📦 Binary Size-limit

Comparing e9f7125 to fix: persistent cache update issuer is updated modules (#11633) by jinrui

❌ Size increased by 206.50KB from 47.23MB to 47.43MB (⬆️0.43%)

Copy link

codspeed-hq bot commented Sep 4, 2025

CodSpeed Performance Report

Merging #11577 will not alter performance

Comparing yj/fix-token (e9f7125) with main (d005492)1

🎉 Hooray! codspeed-node just leveled up to 4.0.1!

A heads-up, this is a breaking change and it might affect your current performance baseline a bit. But here's the exciting part - it's packed with new, cool features and promises improved result stability 🥳!
Curious about what's new? Visit our releases page to delve into all the awesome details about this new version.

Summary

✅ 17 untouched benchmarks

Footnotes

  1. No successful run was found on main (195ce01) during the generation of this report, so d005492 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

Copy link
Contributor

github-actions bot commented Sep 4, 2025

📝 Benchmark detail: Open

Name Base (2025-09-04 37589c4) Current Change
10000_big_production-mode_disable-minimize + exec 29.1 s ± 568 ms 28.5 s ± 1.28 s -2.30 %
10000_development-mode + exec 1.65 s ± 22 ms 1.64 s ± 39 ms -0.94 %
10000_development-mode_hmr + exec 688 ms ± 5 ms 693 ms ± 24 ms +0.76 %
10000_development-mode_noop-loader + exec 2.59 s ± 74 ms 2.59 s ± 54 ms +0.15 %
10000_production-mode + exec 1.69 s ± 63 ms 1.71 s ± 25 ms +0.83 %
10000_production-mode_persistent-cold + exec 1.87 s ± 77 ms 1.87 s ± 36 ms +0.12 %
10000_production-mode_persistent-hot + exec 1.35 s ± 58 ms 1.37 s ± 41 ms +1.14 %
arco-pro_development-mode + exec 1.76 s ± 95 ms 1.75 s ± 98 ms -0.50 %
arco-pro_development-mode_hmr + exec 367 ms ± 1.6 ms 369 ms ± 6.9 ms +0.64 %
arco-pro_production-mode + exec 3.18 s ± 114 ms 3.29 s ± 132 ms +3.42 %
arco-pro_production-mode_generate-package-json-webpack-plugin + exec 3.3 s ± 113 ms 3.38 s ± 280 ms +2.37 %
arco-pro_production-mode_persistent-cold + exec 3.26 s ± 108 ms 3.37 s ± 108 ms +3.24 %
arco-pro_production-mode_persistent-hot + exec 2.02 s ± 75 ms 2.07 s ± 261 ms +2.66 %
arco-pro_production-mode_traverse-chunk-modules + exec 3.25 s ± 67 ms 3.25 s ± 117 ms -0.08 %
large-dyn-imports_development-mode + exec 1.87 s ± 26 ms 1.86 s ± 36 ms -0.44 %
large-dyn-imports_production-mode + exec 1.88 s ± 11 ms 1.93 s ± 226 ms +2.54 %
threejs_development-mode_10x + exec 1.54 s ± 99 ms 1.48 s ± 16 ms -3.69 %
threejs_development-mode_10x_hmr + exec 970 ms ± 18 ms 937 ms ± 27 ms -3.42 %
threejs_production-mode_10x + exec 4.57 s ± 274 ms 4.61 s ± 263 ms +0.88 %
threejs_production-mode_10x_persistent-cold + exec 4.63 s ± 49 ms 4.73 s ± 315 ms +2.20 %
threejs_production-mode_10x_persistent-hot + exec 4.08 s ± 44 ms 4.15 s ± 13 ms +1.71 %
10000_big_production-mode_disable-minimize + rss memory 9944 MiB ± 416 MiB 9682 MiB ± 217 MiB -2.64 %
10000_development-mode + rss memory 691 MiB ± 19 MiB 695 MiB ± 26.3 MiB +0.52 %
10000_development-mode_hmr + rss memory 834 MiB ± 35.5 MiB 828 MiB ± 46.6 MiB -0.67 %
10000_development-mode_noop-loader + rss memory 986 MiB ± 20.9 MiB 975 MiB ± 11.9 MiB -1.06 %
10000_production-mode + rss memory 665 MiB ± 39.1 MiB 666 MiB ± 41.8 MiB +0.11 %
10000_production-mode_persistent-cold + rss memory 777 MiB ± 31.6 MiB 778 MiB ± 39.8 MiB +0.09 %
10000_production-mode_persistent-hot + rss memory 732 MiB ± 48.8 MiB 753 MiB ± 37.8 MiB +2.88 %
arco-pro_development-mode + rss memory 595 MiB ± 42.5 MiB 581 MiB ± 28.2 MiB -2.41 %
arco-pro_development-mode_hmr + rss memory 496 MiB ± 27 MiB 476 MiB ± 11.1 MiB -4.10 %
arco-pro_production-mode + rss memory 695 MiB ± 112 MiB 694 MiB ± 65.8 MiB -0.11 %
arco-pro_production-mode_generate-package-json-webpack-plugin + rss memory 750 MiB ± 41.2 MiB 719 MiB ± 110 MiB -4.09 %
arco-pro_production-mode_persistent-cold + rss memory 830 MiB ± 76.2 MiB 786 MiB ± 34.3 MiB -5.24 %
arco-pro_production-mode_persistent-hot + rss memory 653 MiB ± 53.1 MiB 646 MiB ± 85.1 MiB -1.13 %
arco-pro_production-mode_traverse-chunk-modules + rss memory 701 MiB ± 64.2 MiB 689 MiB ± 70.8 MiB -1.68 %
large-dyn-imports_development-mode + rss memory 712 MiB ± 6.51 MiB 704 MiB ± 1.26 MiB -1.16 %
large-dyn-imports_production-mode + rss memory 613 MiB ± 8.31 MiB 601 MiB ± 3.05 MiB -2.07 %
threejs_development-mode_10x + rss memory 621 MiB ± 15.6 MiB 602 MiB ± 13.8 MiB -3.01 %
threejs_development-mode_10x_hmr + rss memory 856 MiB ± 56.6 MiB 858 MiB ± 67.3 MiB +0.17 %
threejs_production-mode_10x + rss memory 822 MiB ± 213 MiB 850 MiB ± 179 MiB +3.49 %
threejs_production-mode_10x_persistent-cold + rss memory 841 MiB ± 25.6 MiB 825 MiB ± 24.5 MiB -1.88 %
threejs_production-mode_10x_persistent-hot + rss memory 737 MiB ± 33.7 MiB 696 MiB ± 36.1 MiB -5.62 %

@hardfist hardfist marked this pull request as ready for review September 11, 2025 06:05
@hardfist hardfist requested a review from quininer as a code owner September 11, 2025 06:05
@Copilot Copilot AI review requested due to automatic review settings September 11, 2025 06:05
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes automatic semicolon insertion (ASI) handling by switching from using a separate lexer to collect tokens to obtaining tokens directly from the parser during the parsing phase. This addresses compatibility issues that arose when migrating from an old context-aware lexer to a newer, faster lexer that has less context but struggles with ambiguous syntax outside of the parser.

  • Removes the separate lexer instantiation and token collection approach
  • Updates the parser to return both AST and tokens when requested
  • Switches token imports from swc_ecma_lexer to swc_core::ecma::parser::unstable

Reviewed Changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
crates/rspack_plugin_javascript/src/visitors/semicolon.rs Updates token imports to use parser's unstable module
crates/rspack_plugin_javascript/src/parser_and_generator/mod.rs Removes separate lexer, updates parser call to collect tokens
crates/rspack_plugin_javascript/Cargo.toml Removes swc_ecma_lexer dependency
crates/rspack_javascript_compiler/src/compiler/parse.rs Adds token collection capability to parser methods
crates/rspack_javascript_compiler/Cargo.toml Adds ecma_parser_unstable feature and swc_ecma_lexer dependency

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@hardfist hardfist merged commit da5778e into main Sep 11, 2025
42 of 44 checks passed
@hardfist hardfist deleted the yj/fix-token branch September 11, 2025 07:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release: bug fix release: bug related release(mr only) team The issue/pr is created by the member of Rspack.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants