-
-
Notifications
You must be signed in to change notification settings - Fork 329
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parsing Fuzzer Fixes #7672
Parsing Fuzzer Fixes #7672
Conversation
c13fdb3
to
d970230
Compare
@@ -23,6 +23,7 @@ errors: []const Diagnostic, | |||
pub fn deinit(self: *IR) void { | |||
defer self.tokens.deinit(); | |||
defer self.store.deinit(); | |||
self.store.gpa.free(self.errors); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fixed our memory leak... not sure it's 100% correct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably correct for now, but might be wrong in the long term. Long term, we may want errors to outlive the rest of parse. So just like tokenize has finishAndDeinit, we may need something similar that returns the errors for later use but frees everything else. Not a now problem, but just context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, now you mention it... the errors should be pushed into the ModuleEnv as problems, not in the IR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'd rather not do this in deinit, errors should be owned by the owner of the parse result
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can I leave this for you @gamebox to change in a follow up? I've left it here as it keeps the fuzzer happy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the PR moved this into ModuleEnv?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, the more proper short term fix would be to pull this out of deinit and instead add it next to all current callsites of parse_ast.deint()
. That gives the caller control of when to free the error list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to leave this for a follow up, where we focus on unifying the approach to handling Problems across the compiler.
@@ -23,6 +23,7 @@ errors: []const Diagnostic, | |||
pub fn deinit(self: *IR) void { | |||
defer self.tokens.deinit(); | |||
defer self.store.deinit(); | |||
self.store.gpa.free(self.errors); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'd rather not do this in deinit, errors should be owned by the owner of the parse result
src/check/parse/IR.zig
Outdated
@@ -596,6 +608,11 @@ pub const NodeStore = struct { | |||
node.data.lhs = mod.exposes.span.start; | |||
node.data.rhs = mod.exposes.span.len; | |||
}, | |||
.malformed => |a| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! I just said this morning that I need to do exactly this. Looks great for the most part.
Having region will help with this quite a bit, will be adding that to node soon
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, region will mean we can provide much nicer formatted error messages
@@ -117,7 +117,9 @@ pub fn pushDiagnostic(self: *Parser, tag: IR.Diagnostic.Tag, region: IR.Region) | |||
/// add a malformed token | |||
pub fn pushMalformed(self: *Parser, comptime t: type, tag: IR.Diagnostic.Tag) t { | |||
const pos = self.pos; | |||
self.advanceOne(); // TODO: find a better point to advance to | |||
if (self.peek() != .EndOfFile) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want pushMalformed to include a advance_to Token.Tag so that it takes care of this (it will also need a start position in order to create a real Region)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. 👍
(removed) |
// TODO -- this is a hack to avoid ambiguity with no arguments, | ||
// if we parse it again without the space it will be parsed as | ||
// a logical OR `||` instead | ||
// | ||
// desired behaviour described here https://roc.zulipchat.com/#narrow/channel/395097-compiler-development/topic/zig.20compiler.20-.20spike/near/504453049 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leaving this for someone to do in a follow up PR, for now we're keeping the fuzzer happy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah sounds like the correct fix is minor and in the tokenizer. Just remove ||
and &&
parsing. Then it is not ambiguous.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've merged the changes into the tokeniser. My preference is to revert this in a follow up PR and not add more to this.
…roblem formatting
…o correct caret positioning
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gave a quick review. Leaving final review to @gamebox or @joshuawarner32
"obj" | ||
else | ||
"o"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is in the old compiler and probably wrong?
I would guess that zig 0.13.0 filters this differently than zig 0.14.0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it was zig 0.14.0 that made this change. But old-workflows have ran this over and been happy so I left it.
@@ -23,6 +23,7 @@ errors: []const Diagnostic, | |||
pub fn deinit(self: *IR) void { | |||
defer self.tokens.deinit(); | |||
defer self.store.deinit(); | |||
self.store.gpa.free(self.errors); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the PR moved this into ModuleEnv?
@@ -23,6 +23,7 @@ errors: []const Diagnostic, | |||
pub fn deinit(self: *IR) void { | |||
defer self.tokens.deinit(); | |||
defer self.store.deinit(); | |||
self.store.gpa.free(self.errors); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, the more proper short term fix would be to pull this out of deinit and instead add it next to all current callsites of parse_ast.deint()
. That gives the caller control of when to free the error list.
// TODO -- this is a hack to avoid ambiguity with no arguments, | ||
// if we parse it again without the space it will be parsed as | ||
// a logical OR `||` instead | ||
// | ||
// desired behaviour described here https://roc.zulipchat.com/#narrow/channel/395097-compiler-development/topic/zig.20compiler.20-.20spike/near/504453049 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah sounds like the correct fix is minor and in the tokenizer. Just remove ||
and &&
parsing. Then it is not ambiguous.
(malformed_expr "unexpected_token") | ||
(malformed_expr "unexpected_token") | ||
(malformed_expr "unexpected_token") | ||
(ident "" "le") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would expect here something like:
(statement
(expr
(ident "" "le")))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I definitely think there are things to deal with in follow up, but I think it is fine to merge as is.
for (result.messages) |msg| { | ||
_ = env.problems.append(env.gpa, .{ .tokenize = msg }); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to clarify here, the tokenizer and parser are building a local list of errors and we are later adding them to the global list?
I would expect that the tokenzer and parser just add directly to the global list. Any specific reason for forcing the caller to move everything into the global list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's just that the Parser was built independently of the ModuleEnv design, and we haven't yet converged on a final form. I think we should we will just add to the module environment, which helps with reporting and various other things that cut across the whole compiler. But that change can happen in a future PR.
// disabled because it was hit by a fuzz test | ||
// for a repro see src/snapshots/fuzz_crash_012.txt | ||
// std.debug.assert(a.patterns.span.len > 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not for this PR, but we should think about ways to leave in asserts while still documenting fuzzer failures.
Cause I don't think the correct answer to a fuzzer failure on an assert is to disable the assert. Like I don't want to make this a general practice. I think it is a bad precedence.
} else { | ||
// If not a decl | ||
const expr = self.parseExpr(); | ||
const statement_idx = self.store.addStatement(.{ .expr = .{ | ||
.expr = expr, | ||
.region = .{ .start = start, .end = start }, | ||
} }); | ||
if (self.peek() == .Newline) { | ||
self.advance(); | ||
} | ||
return statement_idx; | ||
// continue to parse final expression | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we remove the empty else? Maybe put a comment before the if for clarity if it is needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, make parsing the final expression a function and return parseFinalExpr(...)
in each else here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought including the else and a comment was more explicit that it was falling through to the last part.
Happy to look at this in a follow up PR. I was removing some duplicate code here, but it's might be better to revert my change and leave it as it was.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be fair, might have just looked odd due to github pr review diff. I would lead to pulling it out into a function.
Ah yeah, looking at the raw source, this looks good. I think the diff view just made it look really spread out and unrelated.
That said, I think I would still prefer the final expression be a separate function. So it would be else { return parseFinalExpr(...); }
same with the else => {...}
branch of the switch. Then the switch will truly handle all cases with no fallthrough. Makes sure that someone doesn't accidentally fall through to the final expression parsing when they didn't mean to.
} else { | ||
// continue to parse final expression | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
This PR
parse.zig
from line numbersPROBLEMS
,TOKENS
, andFORMATTED
sections