Parsing Fuzzer Fixes #7672

Merged
43 commits merged Mar 11, 2025

Changes from all commits
43 commits
abbcd74
WIP malformed node in header
lukewilliamboswell Mar 7, 2025
d970230
fix memory leak, add tag to malformed node
lukewilliamboswell Mar 9, 2025
b103e0b
add error handling for header issues
lukewilliamboswell Mar 9, 2025
9d2c811
add parse errors to snapshots
lukewilliamboswell Mar 9, 2025
f6ae028
handle unexpected token in Pattern
lukewilliamboswell Mar 9, 2025
35cd1e8
fix fuzz crash ty_anno_unexpected_token, handle EOF more gracefully
lukewilliamboswell Mar 9, 2025
eeae3cd
fix crash in the middle of string parsing
lukewilliamboswell Mar 9, 2025
62d208d
fix crash in the middle of string parsing
lukewilliamboswell Mar 9, 2025
f6b94b7
fix crash for file with just a single uppercase character
lukewilliamboswell Mar 9, 2025
8c75277
fix crash and memory leak in tokenize reporting
lukewilliamboswell Mar 9, 2025
0ef4b16
dont add a newline if we have a malformed header
lukewilliamboswell Mar 9, 2025
3299a3e
fix out of bounds in parse.lineNum() and tokenize reporting
lukewilliamboswell Mar 9, 2025
383ad56
fix crash for malformed lambda
lukewilliamboswell Mar 9, 2025
138975a
fix fuzz crash for expr_no_space_dot_int
lukewilliamboswell Mar 9, 2025
6121f2a
only generate FORMATTED section if expected
lukewilliamboswell Mar 10, 2025
cc4a451
simplify snapshot logic for writing to file
lukewilliamboswell Mar 10, 2025
ea83e5b
add TOKENS section to snapshots
lukewilliamboswell Mar 10, 2025
0f17521
only add FORMAT section if it is not the same as source
lukewilliamboswell Mar 10, 2025
dbc2daf
format lambda expressions with no args using a space as a workaround …
lukewilliamboswell Mar 10, 2025
c74f0a5
unify Problems across stages for snapshot reporting
lukewilliamboswell Mar 10, 2025
9472052
minor cleanup
lukewilliamboswell Mar 10, 2025
5755303
minor cleanup
lukewilliamboswell Mar 10, 2025
0961098
snapshot and parser fixes
lukewilliamboswell Mar 10, 2025
dca8aff
Merge remote-tracking branch 'remote/main' into fuzz-fixes
lukewilliamboswell Mar 10, 2025
634dfb1
don't change the old builtins
lukewilliamboswell Mar 10, 2025
c8f7ac6
Merge remote-tracking branch 'remote/main' into fuzz-fixes
lukewilliamboswell Mar 10, 2025
7706299
use addMalformed for header_expected_open_bracket
lukewilliamboswell Mar 10, 2025
edee4ba
cleanup use of malformed
lukewilliamboswell Mar 10, 2025
22d0a13
malformed for expr_if_missing_else
lukewilliamboswell Mar 11, 2025
b5bbf77
malformed for expr_no_space_dot_int
lukewilliamboswell Mar 11, 2025
bb455e2
malformed for ty_anno_unexpected_token
lukewilliamboswell Mar 11, 2025
b43d9f0
remove file accidentally added
lukewilliamboswell Mar 11, 2025
f66261f
remove old debug assert
lukewilliamboswell Mar 11, 2025
18593b5
add more SExpr's for Pattern
lukewilliamboswell Mar 11, 2025
f09af53
fix tests, remove SExpr unit test in favour of snapshots
lukewilliamboswell Mar 11, 2025
365a9d5
move tokenize diagnostic to method
lukewilliamboswell Mar 11, 2025
39e1596
improve formatting for Problems in snapshots
lukewilliamboswell Mar 11, 2025
2fc3cd9
use double quotes for sexpr strings, fix caret position in tokenize p…
lukewilliamboswell Mar 11, 2025
000b9c2
fix broken sexpr test
lukewilliamboswell Mar 11, 2025
58da88d
simplify tokenize problem formatting, use helpers and add unit test t…
lukewilliamboswell Mar 11, 2025
680c434
use pushMalformed
lukewilliamboswell Mar 11, 2025
bbf5465
disable assertion causing crash
lukewilliamboswell Mar 11, 2025
fefd3a8
Merge remote-tracking branch 'remote/main' into fuzz-fixes
lukewilliamboswell Mar 11, 2025
6 changes: 3 additions & 3 deletions crates/compiler/builtins/bitcode/build.zig
@@ -119,9 +119,9 @@ fn generateObjectFile(

const suffix =
if (target.result.os.tag == std.Target.Os.Tag.windows)
"obj"
else
"o";
"obj"
else
"o";
Comment on lines +122 to +124

Member:
This change is in the old compiler and probably wrong?

I would guess that zig 0.13.0 formats this differently than zig 0.14.0.

Collaborator (Author):
Yeah, it was zig 0.14.0 that made this change. But the old workflows have run over this and been happy, so I left it.
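
For reference, the change above is indentation-only; roughly, the two layouts in question look like the sketch below. Which layout each zig fmt version emits is exactly the open question here, and `is_windows` is just a placeholder for the real `target.result.os.tag` check in build.zig.

```zig
// Sketch only: `is_windows` stands in for the real target check.

// Layout A: branch bodies indented one level past the `if`.
fn suffixA(is_windows: bool) []const u8 {
    const suffix =
        if (is_windows)
            "obj"
        else
            "o";
    return suffix;
}

// Layout B: branch bodies flush with the `if`.
fn suffixB(is_windows: bool) []const u8 {
    const suffix =
        if (is_windows)
        "obj"
        else
        "o";
    return suffix;
}
```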

const install = b.addInstallFile(obj_file, b.fmt("{s}.{s}", .{ object_name, suffix }));

const obj_step = b.step(step_name, "Build object file for linking");
4 changes: 2 additions & 2 deletions src/base/sexpr.zig
@@ -129,7 +129,7 @@ pub const Expr = union(enum) {

try writer.print(")", .{});
},
.string => |s| try writer.print("'{s}'", .{s}),
.string => |s| try writer.print("\"{s}\"", .{s}),
.signed_int => |i| try writer.print("{d}", .{i}),
.unsigned_int => |u| try writer.print("{d}", .{u}),
.float => |f| try writer.print("{any}", .{f}),
@@ -191,7 +191,7 @@ test "s-expression" {
foo.toStringPretty(buf.writer().any());
const expected =
\\(foo
\\ 'bar'
\\ "bar"
\\ -123
\\ (baz 456 7.89e2))
;
111 changes: 6 additions & 105 deletions src/check/parse.zig
@@ -22,15 +22,19 @@ pub fn parse(env: *base.ModuleEnv, source: []const u8) IR {
tokenizer.tokenize();
const result = tokenizer.finishAndDeinit();

if (result.messages.len > 0) {
tokenizeReport(env.gpa, source, result.messages);
for (result.messages) |msg| {
_ = env.problems.append(env.gpa, .{ .tokenize = msg });
}
Comment on lines +25 to 27

Member:
Just to clarify here, the tokenizer and parser are building a local list of errors and we are later adding them to the global list?

I would expect that the tokenizer and parser just add directly to the global list. Any specific reason for forcing the caller to move everything into the global list?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just that the Parser was built independently of the ModuleEnv design, and we haven't yet converged on a final form. I think eventually we will just add directly to the module environment, which helps with reporting and various other things that cut across the whole compiler. But that change can happen in a future PR.
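
For reference, here is a rough sketch of the two shapes being discussed — the copy-afterwards pattern this PR uses versus appending straight to the shared list. The types below (`Diagnostic`, `Problem`, `ModuleEnv`) are simplified stand-ins, not the real ones from the compiler:

```zig
const std = @import("std");

// Simplified stand-ins for the real tokenize.Diagnostic / Problem / ModuleEnv types.
const Diagnostic = struct {
    tag: enum { MismatchedBrace, UnexpectedToken },
    begin: u32,
    end: u32,
};

const Problem = union(enum) { tokenize: Diagnostic };

const ModuleEnv = struct {
    gpa: std.mem.Allocator,
    problems: std.ArrayListUnmanaged(Problem) = .{},

    fn deinit(self: *ModuleEnv) void {
        self.problems.deinit(self.gpa);
    }
};

// Pattern in this PR: the stage keeps its own diagnostics list and the caller
// copies it into the module-wide list afterwards.
fn copyIntoEnv(env: *ModuleEnv, local: []const Diagnostic) !void {
    for (local) |msg| {
        try env.problems.append(env.gpa, .{ .tokenize = msg });
    }
}

// Direction suggested above: hand the stage the env (or just the list) so it
// appends directly and no copy step is needed.
fn reportDirect(env: *ModuleEnv, diagnostic: Diagnostic) !void {
    try env.problems.append(env.gpa, .{ .tokenize = diagnostic });
}

test "both patterns land in env.problems" {
    var env = ModuleEnv{ .gpa = std.testing.allocator };
    defer env.deinit();

    const local = [_]Diagnostic{.{ .tag = .UnexpectedToken, .begin = 3, .end = 4 }};
    try copyIntoEnv(&env, &local);
    try reportDirect(&env, .{ .tag = .MismatchedBrace, .begin = 0, .end = 1 });

    try std.testing.expectEqual(@as(usize, 2), env.problems.items.len);
}
```

Appending directly avoids the intermediate copy, but it couples every stage to ModuleEnv, so settling that in a follow-up PR once the design converges seems reasonable.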


var parser = Parser.init(result.tokens);
defer parser.deinit();

parser.parseFile();

for (parser.diagnostics.items) |msg| {
_ = env.problems.append(env.gpa, .{ .parser = msg });
}

const errors = parser.diagnostics.toOwnedSlice(env.gpa) catch |err| exitOnOom(err);

return .{
@@ -40,106 +44,3 @@ pub fn parse(env: *base.ModuleEnv, source: []const u8) IR {
.errors = errors,
};
}

fn lineNum(newlines: std.ArrayList(usize), pos: u32) u32 {
const pos_usize = @as(usize, @intCast(pos));
var lineno: u32 = 0;
while (lineno < newlines.items.len) {
if (newlines.items[lineno + 1] > pos_usize) {
return lineno;
}
lineno += 1;
}
return lineno;
}

fn tokenizeReport(allocator: std.mem.Allocator, source: []const u8, msgs: []const tokenize.Diagnostic) void {
std.debug.print("Found the {d} following issues while tokenizing:\n", .{msgs.len});
var newlines = std.ArrayList(usize).init(allocator);
defer newlines.deinit();
newlines.append(0) catch |err| exitOnOom(err);
var pos: usize = 0;
for (source) |c| {
if (c == '\n') {
newlines.append(pos) catch |err| exitOnOom(err);
}
pos += 1;
}
for (msgs) |message| {
switch (message.tag) {
.MismatchedBrace => {
const start_line_num = lineNum(newlines, message.begin);
const start_col = message.begin - newlines.items[start_line_num];
const end_line_num = lineNum(newlines, message.end);
const end_col = message.end - newlines.items[end_line_num];

const src = source[newlines.items[start_line_num]..newlines.items[end_line_num + 1]];
var spaces = std.ArrayList(u8).init(allocator);
defer spaces.deinit();
for (0..start_col) |_| {
spaces.append(' ') catch |err| exitOnOom(err);
}

std.debug.print(
"({d}:{d}-{d}:{d}) Expected the correct closing brace here:\n{s}\n{s}^\n",
.{ start_line_num, start_col, end_line_num, end_col, src, spaces.toOwnedSlice() catch |err| exitOnOom(err) },
);
},
else => {
std.debug.print("MSG: {any}\n", .{message});
},
}
}
}

// TODO move this somewhere better, for now it's here to keep it simple.
fn testSExprHelper(source: []const u8, expected: []const u8) !void {
var env = base.ModuleEnv.init(testing.allocator);
defer env.deinit();

// parse our source
var parse_ast = parse(&env, source);
defer parse_ast.deinit();
std.testing.expectEqualSlices(IR.Diagnostic, &[_]IR.Diagnostic{}, parse_ast.errors) catch {
std.debug.print("Tokens:\n{any}", .{parse_ast.tokens.tokens.items(.tag)});
std.debug.panic("Test failed with parse errors", .{});
};

// shouldn't be required in future
parse_ast.store.emptyScratch();

// buffer to write our SExpr to
var buf = std.ArrayList(u8).init(testing.allocator);
defer buf.deinit();

// convert the AST to our SExpr
try parse_ast.toSExprStr(&env, buf.writer().any());

// TODO in future we should just write the SExpr to a file and snapshot it
// for now we are comparing strings to keep it simple
try testing.expectEqualStrings(expected, buf.items[0..]);
}

test "example s-expr" {
const source =
\\module [foo, bar]
\\
\\foo = "hey"
\\bar = "yo"
;

const expected =
\\(file
\\ (header
\\ (exposed_item (lower_ident 'foo'))
\\ (exposed_item (lower_ident 'bar')))
\\ (decl
\\ (ident 'foo')
\\ (string 'hey'))
\\ (decl
\\ (ident 'bar')
\\ (string 'yo')))
;

try testSExprHelper(source, expected);
}