Parser recursion limit#24810

Open
samuelcolvin wants to merge 6 commits into astral-sh:main from samuelcolvin:parse-recursion-limit

Conversation

@samuelcolvin

Summary

fix #22930.

Without this, malicious or machine-generated code could cause a stack overflow with something as simple as '(' * 5000 + '1' + ')' * 5000.

I decided to do the simplest thing and have a limit that's always applied, with a reasonable default, since:

  • the overhead of this check will be tiny
  • it seems inconceivable that anyone will want to have no limit
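
For readers unfamiliar with the technique, here is a minimal sketch of an always-on recursion limit in a toy recursive-descent parser. It is not the code from this PR: the grammar, the `Parser` type, `ParseError`, and the `MAX_DEPTH` value of 200 are all assumptions made for illustration.

```rust
/// Hypothetical limit chosen for this sketch; the PR's actual default is not stated here.
const MAX_DEPTH: usize = 200;

#[derive(Debug, PartialEq)]
pub enum ParseError {
    /// Nesting exceeded MAX_DEPTH.
    RecursionLimit,
    /// Unexpected byte (or end of input) at this offset.
    Unexpected(usize),
}

pub struct Parser<'a> {
    src: &'a [u8],
    pos: usize,
    depth: usize,
}

impl<'a> Parser<'a> {
    pub fn new(src: &'a str) -> Self {
        Parser { src: src.as_bytes(), pos: 0, depth: 0 }
    }

    /// Toy grammar: expr := '(' expr ')' | '1'
    /// Returns the number of parentheses wrapped around the atom.
    pub fn parse_expr(&mut self) -> Result<usize, ParseError> {
        // The check is one comparison per nested expression.
        if self.depth >= MAX_DEPTH {
            return Err(ParseError::RecursionLimit);
        }
        self.depth += 1;
        let result = self.parse_expr_inner();
        // Restore the depth on both the success and the error path.
        self.depth -= 1;
        result
    }

    fn parse_expr_inner(&mut self) -> Result<usize, ParseError> {
        match self.src.get(self.pos) {
            Some(b'(') => {
                self.pos += 1;
                let inner = self.parse_expr()?;
                if self.src.get(self.pos) == Some(&b')') {
                    self.pos += 1;
                    Ok(inner + 1)
                } else {
                    Err(ParseError::Unexpected(self.pos))
                }
            }
            Some(b'1') => {
                self.pos += 1;
                Ok(0)
            }
            _ => Err(ParseError::Unexpected(self.pos)),
        }
    }
}

/// Parses a whole input string and rejects trailing bytes.
pub fn parse(src: &str) -> Result<usize, ParseError> {
    let mut parser = Parser::new(src);
    let nesting = parser.parse_expr()?;
    if parser.pos != src.len() {
        return Err(ParseError::Unexpected(parser.pos));
    }
    Ok(nesting)
}
```

With this sketch, an input built like the one above (5000 opening parentheses, `1`, 5000 closing parentheses) returns `Err(ParseError::RecursionLimit)` instead of exhausting the stack, while `parse("((1))")` returns `Ok(2)`.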

Test Plan

PR includes tests.

@samuelcolvin
Author

samuelcolvin commented Apr 24, 2026

Hey, could someone please kick off CI for this?

Also, FWIW I have this working with monty and avoiding stack overflows both in AST parsing for the bytecode compiler and type checking in pydantic/monty#391.

@astral-sh-bot

astral-sh-bot Bot commented Apr 24, 2026

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

@MichaReiser MichaReiser added the parser Related to the parser label Apr 24, 2026
@MichaReiser
Member

Thank you.

This is an improvement, but I'm not convinced it is the proper fix; it only shifts which programs cause the parser to abort. It isn't sufficient to protect against, e.g., allocation failures because the program is too large.

I also checked, and neither TypeScript nor Rust implements the same treatment. Instead, the common approach across parsers is to:

  • Rewrite the recursion to a loop. This achieves the same degree of protection as what's proposed in this PR, but without arbitrarily truncating the AST.
  • Use a library to dynamically grow the stack by spilling to the heap, in places where rewriting to a loop isn't possible.
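
As a rough illustration of the first option, the sketch below handles arbitrarily deep nesting with a loop and a heap-allocated `Vec` instead of the call stack. The toy grammar and the names `parse_nested` and `ParseError` are assumptions for this example, not anything from Ruff's parser.

```rust
#[derive(Debug, PartialEq)]
pub enum ParseError {
    /// Unexpected byte (or end of input) at this offset.
    Unexpected(usize),
}

/// Toy grammar: expr := '(' expr ')' | '[' expr ']' | '1'
/// Returns the nesting depth of the atom. No recursion is used, so the
/// depth is bounded by available heap memory rather than by the stack.
pub fn parse_nested(src: &str) -> Result<usize, ParseError> {
    let bytes = src.as_bytes();
    let mut pos = 0;
    // Closing delimiters we still expect, innermost last.
    let mut pending: Vec<u8> = Vec::new();

    // Consume opening delimiters in a loop instead of one call per level.
    loop {
        match bytes.get(pos) {
            Some(b'(') => pending.push(b')'),
            Some(b'[') => pending.push(b']'),
            _ => break,
        }
        pos += 1;
    }
    let depth = pending.len();

    // The atom in the middle.
    if bytes.get(pos) != Some(&b'1') {
        return Err(ParseError::Unexpected(pos));
    }
    pos += 1;

    // Pop the expected closers, innermost first.
    while let Some(expected) = pending.pop() {
        if bytes.get(pos) != Some(&expected) {
            return Err(ParseError::Unexpected(pos));
        }
        pos += 1;
    }

    // Reject trailing input.
    if pos != bytes.len() {
        return Err(ParseError::Unexpected(pos));
    }
    Ok(depth)
}
```

Here the 5000-level input from the PR summary parses successfully and reports a depth of 5000, and nothing is truncated.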

In the end, protecting against denial-of-service attacks isn't specific to stack overflows. The same protection must be in place to handle the exploitation of bugs (in the parser or elsewhere). That is why I wouldn't consider this a security bug (the limit certainly adds a few more guardrails, but it doesn't prevent such attacks).

@samuelcolvin
Author

I get where you're coming from, but the fact is if you limit the code length, stack overflow is one of the only DOS risks in the parser.

@zanieb suggested you don't have the bandwidth to rewrite the recursion to a loop, and I certainly don't - so the choice is between adding this improvement, and not adding this improvement.

I'd therefore really appreciate it if you accepted this improvement. But I get it if you're not willing to merge; I'll just use ruff crates from my branch and attempt to keep it up to date.

(If you are considering rewriting the parser to a loop, please consider making it available as an iterator so we can avoid the overhead of allocating before the first IR)

@MichaReiser
Member

> (If you are considering rewriting the parser to a loop, please consider making it available as an iterator so we can avoid the overhead of allocating before the first IR)

I think there was some misunderstanding of what "rewriting" to a loop means. I'm not suggesting that we rewrite the whole parser as a loop. Instead, the idea is to unroll the recursion by using a loop, similar to what we do in parse_binary_expression_or_higher_recursive. We should be able to rewrite the recursive functions one by one, starting with expression_lhs, which is probably the most important in terms of handling "real world code". However, we'd have to rewrite all of them to mitigate the DoS concerns (although there's no guarantee that the parser won't OOM when parsing a 4GB file that mainly consists of statements).
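
To make "unrolling the recursion by using a loop" concrete, here is a small sketch for a left-associative operator chain. It is a toy under stated assumptions (single-digit operands, only `+`, hypothetical names `parse_sum` and `digit`) and does not mirror how parse_binary_expression_or_higher_recursive is actually written.

```rust
/// Converts an ASCII digit byte to its numeric value.
fn digit(b: u8) -> Option<u64> {
    b.is_ascii_digit().then(|| u64::from(b - b'0'))
}

/// Toy grammar: sum := digit ('+' digit)*
/// A naive recursive version would call itself once per operand; this one
/// folds each operand into `total` inside a loop, so stack use stays
/// constant no matter how long the chain is.
pub fn parse_sum(src: &str) -> Option<u64> {
    let mut bytes = src.bytes();
    let mut total = digit(bytes.next()?)?;
    loop {
        match bytes.next() {
            // End of input: the chain is complete.
            None => return Some(total),
            // Another operator: consume its right-hand operand.
            Some(b'+') => total += digit(bytes.next()?)?,
            // Anything else is a syntax error in this toy grammar.
            Some(_) => return None,
        }
    }
}
```

A chain with 100,000 operators is handled without growing the stack, which is the property a per-function rewrite would be aiming for.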

@MichaReiser
Member

I'm fine going ahead with this if we address the following issues:

  • @dhruvmanila mentioned that CPython has a similar limit for binary expressions. We should align our cut-off point with CPython's, or at least ensure it's not lower than CPython's.
  • Instead of using ..., we should use our normal error recovery node. For expressions, it's an empty identifier with the Invalid context.
  • We should safeguard against failing to restore the recursion depth. What I'd do is to change enter_recursion to return a RecursionScope struct that holds a DropBomb (a debug-only bomb seems fine?). The bomb needs to be defused by explicitly calling RecursionScope::exit(parser) (consumes self).
  • This PR does not fix "Handle parser stack overflows more gracefully" (#22930). Instead, we should document that the recursion limit is temporary and the proper solution is to unroll the recursion by using a loop.
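
The RecursionScope idea in the third point could look roughly like the sketch below. Everything here is an assumption for illustration: the `Parser` struct is a stand-in, `MAX_DEPTH` is an arbitrary value, and the bomb is hand-rolled with a `defused` flag rather than using a DropBomb type.

```rust
/// Hypothetical limit for this sketch.
const MAX_DEPTH: usize = 200;

/// Stand-in for the real parser; only the depth counter matters here.
pub struct Parser {
    depth: usize,
}

/// Guard returned by `enter_recursion`. In debug builds it panics when
/// dropped unless `exit` was called, so a forgotten exit fails loudly.
#[must_use]
pub struct RecursionScope {
    defused: bool,
}

impl RecursionScope {
    /// Consumes the scope, restores the parser's depth, and defuses the bomb.
    pub fn exit(mut self, parser: &mut Parser) {
        parser.depth -= 1;
        self.defused = true;
    }
}

impl Drop for RecursionScope {
    fn drop(&mut self) {
        // Debug-only check; skipped during unwinding to avoid a double panic.
        if cfg!(debug_assertions) && !self.defused && !std::thread::panicking() {
            panic!("RecursionScope dropped without calling exit()");
        }
    }
}

impl Parser {
    pub fn new() -> Self {
        Parser { depth: 0 }
    }

    pub fn depth(&self) -> usize {
        self.depth
    }

    /// Returns `None` once the limit is reached; otherwise increments the
    /// depth and hands back a scope that must be exited explicitly.
    pub fn enter_recursion(&mut self) -> Option<RecursionScope> {
        if self.depth >= MAX_DEPTH {
            return None;
        }
        self.depth += 1;
        Some(RecursionScope { defused: false })
    }
}
```

A caller would write `let scope = parser.enter_recursion()?;` (or emit the error recovery node on `None`), parse the nested expression, and finish with `scope.exit(parser)`. Because `exit` takes `self` by value, the scope cannot be exited twice.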

@samuelcolvin
Author

great, I'll get those things fixed as soon as I have time.

Labels

parser Related to the parser

Development

Successfully merging this pull request may close these issues.

Handle parser stack overflows more gracefully