Skip to content

Conversation

06wj
Copy link
Contributor

@06wj 06wj commented Aug 20, 2025

No description provided.

@AndrewRayCode
Copy link
Collaborator

AndrewRayCode commented Aug 21, 2025

Ah yikes, I didn't know this was a valid preprocessor case.

Thank you for the fix and the tests. I think a different strategy is needed here unfortunately. The grammar might need to instead be updated to support parsing the right hand side of a #define as a full expression.

This library will likely be run on raw user input (it is in the case of Shaderfrog), so the eval() in here would open up any consumer of the parser to vulnerabilities. I also think ideally that the identifier visitor shouldn't need to do the heavy lifting of evaluating an expression. The binary visitor handles expression evaluation, it would be nice if somehow a define expression could go through a similar path.

Or maybe a different strategy is needed entirely, like parse -> expand all macros -> reparse the new lexeme, to handle expanded preproecssor expressions, that are then properly picked up by the peg grammar.

"Preprocessing Directives" begins on page 157 of this nightmare https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf which I need to read through more closely. I see there are other behaviors I missed like #if is only true if the expression evaluates to non-zero, and defined(x) evaluates to 1/0. I also reference https://godbolt.org/ for testing against a reference preprocessor

@AndrewRayCode
Copy link
Collaborator

AndrewRayCode commented Aug 22, 2025

Ah yeah, in the grammar, the body of a define is not only a string. The peg grammar currently treats the body as simply a string. The doc linked above has (this is condensed from a few different pages):

# define identifier replacement-list new-line

replacement-list:
    pp-tokens

pp-tokens:
    preprocessing-token
    pp-tokens preprocessing-token

preprocessing-token:
    header-name
    identifier
    pp-number
    character-constant
    string-literal
    punctuator
    each non-white-space character that cannot be one of the above

I included your commit in #58 (wip) and removed the eval so you're still in the commit history. I think this new PR is a better approach - treat the body of a define as token sequence, not a single string. I wasn't sure if I could push to this PR so I went with opening a new one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants