Parsing Fuzzer Fixes #7672

lukewilliamboswell · 2025-03-07T07:11:27Z

This PR

adds tokenise and parser diagnostics to ModuleEnv as Problems
fixes an out-of-bounds issue in parse.zig from line numbers
fixes a memory leak by freeing parser errors in deinit
adds eleven (11) new Diagnostic errors for the Parser
removes panics and handles malformed parsed AST nodes
implements a significant number of SExpr nodes for the Parser AST
improves reliability for handling newlines in Parser
upgrades the snapshot tool with new PROBLEMS, TOKENS, and FORMATTED sections
adds eleven (11) new snapshots, discovered from fuzzing the parser

src/coordinate.zig

lukewilliamboswell · 2025-03-09T05:41:46Z

src/check/parse/IR.zig

@@ -23,6 +23,7 @@ errors: []const Diagnostic,
 pub fn deinit(self: *IR) void {
    defer self.tokens.deinit();
    defer self.store.deinit();
+    self.store.gpa.free(self.errors);


This fixed our memory leak... not sure it's 100% correct

Probably correct for now, but might be wrong in the long term. Long term, we may want errors to outlive the rest of parse. So just like tokenize has finishAndDeinit, we may need something similar that returns the errors for later use but frees everything else. Not a now problem, but just context.

Yeah, now you mention it... the errors should be pushed into the ModuleEnv as problems, not in the IR.

Yeah, I'd rather not do this in deinit; errors should be owned by the owner of the parse result.

Can I leave this for you @gamebox to change in a follow up? I've left it here as it keeps the fuzzer happy.

I thought the PR moved this into ModuleEnv?

Also, the more proper short-term fix would be to pull this out of deinit and instead add it next to all current callsites of parse_ast.deinit(). That gives the caller control of when to free the error list.

I'd like to leave this for a follow up, where we focus on unifying the approach to handling Problems across the compiler.
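A minimal sketch of the caller-owned alternative discussed in this thread, assuming Zig 0.14. `ParseResult`, `Diagnostic`, and `parse` are hypothetical stand-ins, not the real `IR` types; `finishAndDeinit` borrows the name mentioned for tokenize, but its signature here is an assumption.

```zig
const std = @import("std");

// Hypothetical stand-in for IR.Diagnostic.
const Diagnostic = struct {
    tag: Tag,

    const Tag = enum { unexpected_token, missing_header };
};

// Hypothetical stand-in for the parse result; `nodes` represents everything
// that can be freed as soon as parsing is done.
const ParseResult = struct {
    gpa: std.mem.Allocator,
    nodes: []u32,
    errors: []const Diagnostic,

    /// Frees everything except the errors and hands the error slice to the
    /// caller, who is then responsible for freeing it.
    pub fn finishAndDeinit(self: *ParseResult) []const Diagnostic {
        const errors = self.errors;
        self.gpa.free(self.nodes);
        self.* = undefined;
        return errors;
    }
};

// Toy "parser" that allocates some nodes and records one diagnostic.
fn parse(gpa: std.mem.Allocator) !ParseResult {
    const nodes = try gpa.alloc(u32, 4);
    errdefer gpa.free(nodes);
    const errors = try gpa.alloc(Diagnostic, 1);
    errors[0] = .{ .tag = .unexpected_token };
    return ParseResult{ .gpa = gpa, .nodes = nodes, .errors = errors };
}
```

With this shape the caller decides when the error list dies, for example `const errors = result.finishAndDeinit(); defer gpa.free(errors);`, so the errors can outlive the rest of the parse state.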

src/snapshots/003.txt

src/check/parse/Parser.zig

src/snapshots/header_unexpected_token.txt


src/check/parse.zig

gamebox · 2025-03-10T22:23:08Z

src/check/parse/IR.zig

@@ -23,6 +23,7 @@ errors: []const Diagnostic,
 pub fn deinit(self: *IR) void {
    defer self.tokens.deinit();
    defer self.store.deinit();
+    self.store.gpa.free(self.errors);


Yeah, I'd rather not do this in deinit; errors should be owned by the owner of the parse result.

src/check/parse/IR.zig

gamebox · 2025-03-10T22:25:27Z

src/check/parse/IR.zig

@@ -596,6 +608,11 @@ pub const NodeStore = struct {
                node.data.lhs = mod.exposes.span.start;
                node.data.rhs = mod.exposes.span.len;
            },
+            .malformed => |a| {


Thank you! I just said this morning that I need to do exactly this. Looks great for the most part.

Having region will help with this quite a bit; I will be adding that to node soon.

Yes, region will mean we can provide much nicer formatted error messages.

gamebox · 2025-03-10T22:26:50Z

src/check/parse/Parser.zig

@@ -117,7 +117,9 @@ pub fn pushDiagnostic(self: *Parser, tag: IR.Diagnostic.Tag, region: IR.Region)
 /// add a malformed token
 pub fn pushMalformed(self: *Parser, comptime t: type, tag: IR.Diagnostic.Tag) t {
    const pos = self.pos;
-    self.advanceOne(); // TODO: find a better point to advance to
+    if (self.peek() != .EndOfFile) {


I want pushMalformed to include an advance_to Token.Tag so that it takes care of this. (It will also need a start position in order to create a real Region.)

Sounds good. 👍
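A rough sketch of what an `advance_to` parameter plus a start position could look like, assuming Zig 0.14. The `Tag`, `Region`, and `Parser` types below are simplified stand-ins, and `skipMalformed` is a hypothetical helper name, not the actual `pushMalformed` signature. It assumes the token list always ends with `EndOfFile`.

```zig
const std = @import("std");

// Simplified stand-ins for Token.Tag and IR.Region.
const Tag = enum { LowerIdent, OpPlus, Newline, EndOfFile };
const Region = struct { start: usize, end: usize };

const Parser = struct {
    tags: []const Tag, // assumed to end with .EndOfFile
    pos: usize = 0,

    fn peek(self: *const Parser) Tag {
        return self.tags[self.pos];
    }

    /// Skips tokens until `advance_to` or EndOfFile is the current token and
    /// returns the region covered. It never moves past EndOfFile, so later
    /// peeks stay in bounds.
    fn skipMalformed(self: *Parser, start: usize, advance_to: Tag) Region {
        while (self.peek() != advance_to and self.peek() != .EndOfFile) {
            self.pos += 1;
        }
        return .{ .start = start, .end = self.pos };
    }
};
```

The returned region spans from the caller-supplied start to the recovery point, which is the information a malformed node would need for error reporting.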

src/check/parse/Parser.zig

lukewilliamboswell · 2025-03-10T23:02:59Z

(removed)

lukewilliamboswell · 2025-03-11T03:31:51Z

src/fmt.zig

+            // TODO -- this is a hack to avoid ambiguity with no arguments,
+            // if we parse it again without the space it will be parsed as
+            // a logical OR `||` instead
+            //
+            // desired behaviour described here https://roc.zulipchat.com/#narrow/channel/395097-compiler-development/topic/zig.20compiler.20-.20spike/near/504453049


Leaving this for someone to do in a follow-up PR; for now we're keeping the fuzzer happy.

Yeah sounds like the correct fix is minor and in the tokenizer. Just remove || and && parsing. Then it is not ambiguous.

We've merged the changes into the tokeniser. My preference is to revert this in a follow up PR and not add more to this.


bhansconnect

Gave a quick review. Leaving final review to @gamebox or @joshuawarner32

bhansconnect · 2025-03-11T05:06:45Z

crates/compiler/builtins/bitcode/build.zig

+            "obj"
+        else
+            "o";


This change is in the old compiler and probably wrong?

I would guess that zig 0.13.0 filters this differently than zig 0.14.0

Yeah, it was zig 0.14.0 that made this change. But the old workflows have run over this and been happy, so I left it.

bhansconnect · 2025-03-11T05:07:43Z

src/check/parse/IR.zig

@@ -23,6 +23,7 @@ errors: []const Diagnostic,
 pub fn deinit(self: *IR) void {
    defer self.tokens.deinit();
    defer self.store.deinit();
+    self.store.gpa.free(self.errors);


I thought the PR moved this into ModuleEnv?

bhansconnect · 2025-03-11T05:08:45Z

src/check/parse/IR.zig

@@ -23,6 +23,7 @@ errors: []const Diagnostic,
 pub fn deinit(self: *IR) void {
    defer self.tokens.deinit();
    defer self.store.deinit();
+    self.store.gpa.free(self.errors);


Also, the more proper short-term fix would be to pull this out of deinit and instead add it next to all current callsites of parse_ast.deinit(). That gives the caller control of when to free the error list.

src/check/parse/Parser.zig

bhansconnect · 2025-03-11T05:11:45Z

src/fmt.zig

+            // TODO -- this is a hack to avoid ambiguity with no arguments,
+            // if we parse it again without the space it will be parsed as
+            // a logical OR `||` instead
+            //
+            // desired behaviour described here https://roc.zulipchat.com/#narrow/channel/395097-compiler-development/topic/zig.20compiler.20-.20spike/near/504453049


Yeah sounds like the correct fix is minor and in the tokenizer. Just remove || and && parsing. Then it is not ambiguous.

gamebox · 2025-03-11T10:56:02Z

src/snapshots/fuzz_crash_002.txt

+    (malformed_expr "unexpected_token")
+    (malformed_expr "unexpected_token")
+    (malformed_expr "unexpected_token")
+    (ident "" "le")


I would expect here something like:

(statement (expr (ident "" "le")))

bhansconnect

I definitely think there are things to deal with in follow up, but I think it is fine to merge as is.

bhansconnect · 2025-03-11T22:26:43Z

src/check/parse.zig

+    for (result.messages) |msg| {
+        _ = env.problems.append(env.gpa, .{ .tokenize = msg });
    }


Just to clarify here, the tokenizer and parser are building a local list of errors and we are later adding them to the global list?

I would expect that the tokenizer and parser just add directly to the global list. Is there any specific reason for forcing the caller to move everything into the global list?

It's just that the Parser was built independently of the ModuleEnv design, and we haven't yet converged on a final form. I think we should just add to the module environment, which helps with reporting and various other things that cut across the whole compiler. But that change can happen in a future PR.
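As an illustration of the "add directly to the global list" direction, here is a small sketch assuming Zig 0.14. The `ModuleEnv`, `Problem`, and diagnostic types are hypothetical simplifications, and the list is a plain `std.ArrayListUnmanaged` rather than whatever collection the real `ModuleEnv` uses.

```zig
const std = @import("std");

// Hypothetical diagnostics; the real ones carry tags and regions.
const TokenizeDiagnostic = struct { offset: usize };
const ParseDiagnostic = struct { token_index: usize };

const Problem = union(enum) {
    tokenize: TokenizeDiagnostic,
    parser: ParseDiagnostic,
};

const ModuleEnv = struct {
    gpa: std.mem.Allocator,
    problems: std.ArrayListUnmanaged(Problem) = .empty,

    fn deinit(self: *ModuleEnv) void {
        self.problems.deinit(self.gpa);
    }
};

// Toy tokenizer: reports every NUL byte straight into env.problems, so the
// caller never has to copy a local message list afterwards.
fn tokenize(env: *ModuleEnv, source: []const u8) !void {
    for (source, 0..) |byte, offset| {
        if (byte == 0) {
            try env.problems.append(env.gpa, .{ .tokenize = .{ .offset = offset } });
        }
    }
}
```

Because every stage writes into the same list, the list's owner (`ModuleEnv` here) is also the single place where problems are freed.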

bhansconnect · 2025-03-11T22:29:16Z

src/check/parse/IR.zig

+                // disabled because it was hit by a fuzz test
+                // for a repro see src/snapshots/fuzz_crash_012.txt
+                // std.debug.assert(a.patterns.span.len > 1);


Not for this PR, but we should think about ways to leave in asserts while still documenting fuzzer failures.

Because I don't think the correct answer to a fuzzer failure on an assert is to disable the assert. I don't want to make this a general practice; I think it sets a bad precedent.

bhansconnect · 2025-03-11T22:31:36Z

src/check/parse/Parser.zig

            } else {
-                // If not a decl
-                const expr = self.parseExpr();
-                const statement_idx = self.store.addStatement(.{ .expr = .{
-                    .expr = expr,
-                    .region = .{ .start = start, .end = start },
-                } });
-                if (self.peek() == .Newline) {
-                    self.advance();
-                }
-                return statement_idx;
+                // continue to parse final expression
            }


can we remove the empty else? Maybe put a comment before the if for clarity if it is needed?

Alternatively, make parsing the final expression a function and return parseFinalExpr(...) in each else here.

I thought including the else and a comment was more explicit that it was falling through to the last part.

Happy to look at this in a follow-up PR. I was removing some duplicate code here, but it might be better to revert my change and leave it as it was.

To be fair, it might have just looked odd due to the GitHub PR review diff. I would lean toward pulling it out into a function.

Ah yeah, looking at the raw source, this looks good. I think the diff view just made it look really spread out and unrelated.

That said, I think I would still prefer the final expression be a separate function. So it would be else { return parseFinalExpr(...); } same with the else => {...} branch of the switch. Then the switch will truly handle all cases with no fallthrough. Makes sure that someone doesn't accidentally fall through to the final expression parsing when they didn't mean to.
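The no-fallthrough shape suggested above can be sketched roughly as follows, assuming Zig 0.14. The token tags, `StatementKind`, and the bodies of `parseStmt` and `parseFinalExpr` are simplified, hypothetical stand-ins for the real Parser code; only the control-flow pattern is the point.

```zig
const std = @import("std");

// Simplified, hypothetical token tags and statement kinds.
const Tag = enum { LowerIdent, OpAssign, KwImport, Int, EndOfFile };
const StatementKind = enum { decl, import_stmt, final_expr };

const Parser = struct {
    tags: []const Tag, // assumed to end with .EndOfFile
    pos: usize = 0,

    fn peek(self: *const Parser) Tag {
        return self.tags[self.pos];
    }

    fn peekNext(self: *const Parser) Tag {
        const next = @min(self.pos + 1, self.tags.len - 1);
        return self.tags[next];
    }

    // Stand-in for expression parsing: consumes one token unless at EOF.
    fn parseFinalExpr(self: *Parser) StatementKind {
        if (self.peek() != .EndOfFile) self.pos += 1;
        return .final_expr;
    }

    // Every branch returns explicitly, so nothing can accidentally fall
    // through to the final-expression path.
    fn parseStmt(self: *Parser) StatementKind {
        switch (self.peek()) {
            .KwImport => {
                self.pos += 1;
                return .import_stmt;
            },
            .LowerIdent => {
                if (self.peekNext() == .OpAssign) {
                    self.pos += 2;
                    return .decl;
                } else {
                    return self.parseFinalExpr();
                }
            },
            else => return self.parseFinalExpr(),
        }
    }
};
```

Since each arm ends in a `return`, adding a new arm that forgets to return becomes a compile error instead of a silent fallthrough.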

bhansconnect · 2025-03-11T22:31:47Z

src/check/parse/Parser.zig

+            } else {
+                // continue to parse final expression
            }


bhansconnect reviewed Mar 8, 2025

src/coordinate.zig (outdated, resolved)

lukewilliamboswell added 2 commits March 9, 2025 15:44

WIP malformed node in header

abbcd74

fix memory leak, add tag to malformed node

d970230

lukewilliamboswell force-pushed the fuzz-fixes branch from c13fdb3 to d970230 March 9, 2025 05:41

lukewilliamboswell commented Mar 9, 2025

src/snapshots/003.txt (outdated, resolved)

lukewilliamboswell added 2 commits March 9, 2025 20:27

add error handling for header issues

b103e0b

add parse errors to snapshots

9d2c811

lukewilliamboswell commented Mar 9, 2025

src/check/parse/Parser.zig (outdated, resolved)

lukewilliamboswell commented Mar 9, 2025

src/snapshots/header_unexpected_token.txt (outdated, resolved)

lukewilliamboswell added 2 commits March 10, 2025 08:48

handle unexpected token in Pattern

f6ae028

fix fuzz crash ty_anno_unexpected_token, handle EOF more gracefully

35cd1e8

lukewilliamboswell changed the title ~~Fix malformed node in header~~ Parsing Fuzzer Fixes Mar 9, 2025

lukewilliamboswell added 17 commits March 10, 2025 09:30

fix crash in the middle of string parsing

eeae3cd

fix crash in the middle of string parsing

62d208d

fix crash for file with just a single uppercase character

f6b94b7

fix crash and memory leak in tokenize reporting

8c75277

dont add a newline if we have a malformed header

0ef4b16

fix out of bounds in parse.lineNum() and tokenize reporting

3299a3e

fix crash for malformed lambda

383ad56

fix fuzz crash for expr_no_space_dot_int

138975a

only generate FORMATTED section if expected

6121f2a

simplify snapshot logic for writing to file

cc4a451

add TOKENS section to snapshots

ea83e5b

only add FORMAT section if it is not the same as source

0f17521

format lambda expressions with no args using a space as a workaround for ambiguity

dbc2daf

unify Problems across stages for snapshot reporting

c74f0a5

minor cleanup

9472052

minor cleanup

5755303

snapshot and parser fixes

0961098

lukewilliamboswell added 2 commits March 10, 2025 17:01

Merge remote-tracking branch 'remote/main' into fuzz-fixes

dca8aff

don't change the old builtins

634dfb1

gamebox reviewed Mar 10, 2025


Merge remote-tracking branch 'remote/main' into fuzz-fixes

c8f7ac6

lukewilliamboswell changed the title ~~Parsing Fuzzer Fixes~~ [WIP] Parsing Fuzzer Fixes Mar 10, 2025

lukewilliamboswell added 11 commits March 11, 2025 10:39

use addMalformed for header_expected_open_bracket

7706299

cleanup use of malformed

edee4ba

malformed for expr_if_missing_else

22d0a13

malformed for expr_no_space_dot_int

b5bbf77

malformed for ty_anno_unexpected_token

bb455e2

remove file accidentally added

b43d9f0

remove old debug assert

f66261f

add more SExpr's for Pattern

18593b5

fix tests, remove SExpr unit test in favour of snapshots

f09af53

move tokenize diagnostic to method

365a9d5

improve formatting for Problems in snapshots

39e1596

lukewilliamboswell commented Mar 11, 2025


lukewilliamboswell changed the title ~~[WIP] Parsing Fuzzer Fixes~~ Parsing Fuzzer Fixes Mar 11, 2025

lukewilliamboswell added 3 commits March 11, 2025 14:42

use double quotes for sexpr strings, fix caret position in tokenize problem formatting

2fc3cd9

fix broken sexpr test

000b9c2

simplify tokenize problem formatting, use helpers and add unit test to correct caret positioning

58da88d

bhansconnect reviewed Mar 11, 2025


lukewilliamboswell added 2 commits March 11, 2025 21:01

use pushMalformed

680c434

disable assertion causing crash

bbf5465

gamebox reviewed Mar 11, 2025


Merge remote-tracking branch 'remote/main' into fuzz-fixes

fefd3a8

bhansconnect approved these changes Mar 11, 2025


lukewilliamboswell merged commit 13c5152 into main Mar 11, 2025
30 of 32 checks passed

lukewilliamboswell deleted the fuzz-fixes branch March 11, 2025 22:42
