fix(source/cockroachdb): detect existing LIMIT regardless of whitespace#3554
fix(source/cockroachdb): detect existing LIMIT regardless of whitespace#3554anxkhn wants to merge 1 commit into
Conversation
|
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
There was a problem hiding this comment.
Code Review
This pull request refactors the CockroachDB source to use a case-insensitive regular expression with word boundaries for detecting existing LIMIT clauses in SQL queries, allowing it to handle multiline and tab-separated queries, and adds corresponding test cases. The review feedback correctly identifies a security risk where the LIMIT keyword check could be bypassed if it appears inside SQL comments or string literals. To address this, the reviewer suggests stripping comments and string literals before performing the check, and provides additional test cases to verify this robust behavior.
| // limitClauseRegexp matches a LIMIT keyword regardless of surrounding | ||
| // whitespace, so multiline or tab-formatted queries are detected too. | ||
| var limitClauseRegexp = regexp.MustCompile(`(?i)\bLIMIT\b`) |
There was a problem hiding this comment.
To prevent security bypasses where a client includes the word LIMIT inside a comment or a string literal, we should define a regex to match and strip SQL comments and string literals before checking for the LIMIT clause.
| // limitClauseRegexp matches a LIMIT keyword regardless of surrounding | |
| // whitespace, so multiline or tab-formatted queries are detected too. | |
| var limitClauseRegexp = regexp.MustCompile(`(?i)\bLIMIT\b`) | |
| // sqlTokenRegexp matches SQL string literals, double-quoted identifiers, | |
| // multi-line comments, and single-line comments. | |
| var sqlTokenRegexp = regexp.MustCompile(`'[^'\\]*(?:\\.[^'\\]*)*'|"[^"\\]*(?:\\.[^"\\]*)*"|/\*[^*]*\*+(?:[^/*][^*]*\*+)*/|--.*`) | |
| // limitClauseRegexp matches a LIMIT keyword regardless of surrounding | |
| // whitespace, so multiline or tab-formatted queries are detected too. | |
| var limitClauseRegexp = regexp.MustCompile(`(?i)\bLIMIT\b`) |
| { | ||
| name: "SELECT with multiline existing LIMIT", | ||
| sql: "SELECT *\nFROM users\nLIMIT 50", | ||
| maxRowLimit: 100, | ||
| expectedSQL: "SELECT *\nFROM users\nLIMIT 50", | ||
| shouldAddLimit: false, | ||
| }, | ||
| { | ||
| name: "SELECT with lowercase multiline existing LIMIT", | ||
| sql: "SELECT * FROM users\nlimit 50", | ||
| maxRowLimit: 100, | ||
| expectedSQL: "SELECT * FROM users\nlimit 50", | ||
| shouldAddLimit: false, | ||
| }, | ||
| { | ||
| name: "SELECT with tab-separated existing LIMIT", | ||
| sql: "SELECT a\tLIMIT 5", | ||
| maxRowLimit: 100, | ||
| expectedSQL: "SELECT a\tLIMIT 5", | ||
| shouldAddLimit: false, | ||
| }, |
There was a problem hiding this comment.
Add test cases to verify that LIMIT detection is not bypassed by comments or string literals, and that trailing single-line comments do not comment out the appended LIMIT clause.
{
name: "SELECT with multiline existing LIMIT",
sql: "SELECT *\nFROM users\nLIMIT 50",
maxRowLimit: 100,
expectedSQL: "SELECT *\nFROM users\nLIMIT 50",
shouldAddLimit: false,
},
{
name: "SELECT with lowercase multiline existing LIMIT",
sql: "SELECT * FROM users\nlimit 50",
maxRowLimit: 100,
expectedSQL: "SELECT * FROM users\nlimit 50",
shouldAddLimit: false,
},
{
name: "SELECT with tab-separated existing LIMIT",
sql: "SELECT a\tLIMIT 5",
maxRowLimit: 100,
expectedSQL: "SELECT a\tLIMIT 5",
shouldAddLimit: false,
},
{
name: "SELECT with LIMIT in comment",
sql: "SELECT * FROM users -- LIMIT is not here",
maxRowLimit: 100,
expectedSQL: "SELECT * FROM users -- LIMIT is not here\nLIMIT 100",
shouldAddLimit: true,
},
{
name: "SELECT with LIMIT in string literal",
sql: "SELECT * FROM users WHERE name = 'LIMIT'",
maxRowLimit: 100,
expectedSQL: "SELECT * FROM users WHERE name = 'LIMIT' LIMIT 100",
shouldAddLimit: true,
},
{
name: "SELECT with trailing single-line comment",
sql: "SELECT * FROM users -- trailing comment",
maxRowLimit: 100,
expectedSQL: "SELECT * FROM users -- trailing comment\nLIMIT 100",
shouldAddLimit: true,
}d31e022 to
a952373
Compare
ApplyQueryLimits checked for an existing LIMIT clause with the space-anchored substring " LIMIT ", so a LIMIT separated by other whitespace (newline or tab) in a formatted multiline query was missed and a second LIMIT was appended, producing invalid SQL like "... LIMIT 50 LIMIT 100". Use a precompiled case-insensitive word-boundary regexp ((?i)\bLIMIT\b), matching the existing regexp.MustCompile style in this file, so the check is whitespace-agnostic. Add multiline, lowercase, and tab cases to TestApplyQueryLimits.
a952373 to
f3cd694
Compare
Description
Source.ApplyQueryLimitsin the CockroachDB source decides whether to append aLIMITto a SELECT by checkingstrings.Contains(strings.ToUpper(sql), " LIMIT "). That substring is anchored on single spaces, so any query that separatesLIMITwith other whitespace, for example a multiline or tab-formatted statement, is treated as having no limit and gets a second one appended:The trailing append branch already trims arbitrary whitespace, so the mismatch is purely in this detection. This change replaces the substring check with a precompiled word-boundary regexp
(?i)\bLIMIT\b, which matchesLIMITregardless of the surrounding whitespace or case. This mirrors the package's existingregexp.MustCompilestyle (seeRedactSQLin the same file). The append/trim logic is unchanged. Three "already has LIMIT" cases (multiline, lowercase multiline, tab) were added toTestApplyQueryLimits.PR Checklist
!if this involve a breaking change🛠️ Fixes #3553 🦕