Skip to content

fix(source/cockroachdb): detect existing LIMIT regardless of whitespace#3554

Open
anxkhn wants to merge 1 commit into
googleapis:mainfrom
anxkhn:loop/mcp-toolbox__002
Open

fix(source/cockroachdb): detect existing LIMIT regardless of whitespace#3554
anxkhn wants to merge 1 commit into
googleapis:mainfrom
anxkhn:loop/mcp-toolbox__002

Conversation

@anxkhn

@anxkhn anxkhn commented Jun 30, 2026

Copy link
Copy Markdown

Description

Source.ApplyQueryLimits in the CockroachDB source decides whether to append a LIMIT to a SELECT by checking strings.Contains(strings.ToUpper(sql), " LIMIT "). That substring is anchored on single spaces, so any query that separates LIMIT with other whitespace, for example a multiline or tab-formatted statement, is treated as having no limit and gets a second one appended:

SELECT *
FROM users
LIMIT 50

-> SELECT *\nFROM users\nLIMIT 50 LIMIT 100   (invalid SQL)

The trailing append branch already trims arbitrary whitespace, so the mismatch is purely in this detection. This change replaces the substring check with a precompiled word-boundary regexp (?i)\bLIMIT\b, which matches LIMIT regardless of the surrounding whitespace or case. This mirrors the package's existing regexp.MustCompile style (see RedactSQL in the same file). The append/trim logic is unchanged. Three "already has LIMIT" cases (multiline, lowercase multiline, tab) were added to TestApplyQueryLimits.

PR Checklist

  • Make sure you reviewed CONTRIBUTING.md
  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)
  • Make sure to add ! if this involve a breaking change

🛠️ Fixes #3553 🦕

@anxkhn anxkhn requested a review from a team as a code owner June 30, 2026 21:03
@google-cla

google-cla Bot commented Jun 30, 2026

Copy link
Copy Markdown

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the CockroachDB source to use a case-insensitive regular expression with word boundaries for detecting existing LIMIT clauses in SQL queries, allowing it to handle multiline and tab-separated queries, and adds corresponding test cases. The review feedback correctly identifies a security risk where the LIMIT keyword check could be bypassed if it appears inside SQL comments or string literals. To address this, the reviewer suggests stripping comments and string literals before performing the check, and provides additional test cases to verify this robust behavior.

Comment on lines +302 to +304
// limitClauseRegexp matches a LIMIT keyword regardless of surrounding
// whitespace, so multiline or tab-formatted queries are detected too.
var limitClauseRegexp = regexp.MustCompile(`(?i)\bLIMIT\b`)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

To prevent security bypasses where a client includes the word LIMIT inside a comment or a string literal, we should define a regex to match and strip SQL comments and string literals before checking for the LIMIT clause.

Suggested change
// limitClauseRegexp matches a LIMIT keyword regardless of surrounding
// whitespace, so multiline or tab-formatted queries are detected too.
var limitClauseRegexp = regexp.MustCompile(`(?i)\bLIMIT\b`)
// sqlTokenRegexp matches SQL string literals, double-quoted identifiers,
// multi-line comments, and single-line comments.
var sqlTokenRegexp = regexp.MustCompile(`'[^'\\]*(?:\\.[^'\\]*)*'|"[^"\\]*(?:\\.[^"\\]*)*"|/\*[^*]*\*+(?:[^/*][^*]*\*+)*/|--.*`)
// limitClauseRegexp matches a LIMIT keyword regardless of surrounding
// whitespace, so multiline or tab-formatted queries are detected too.
var limitClauseRegexp = regexp.MustCompile(`(?i)\bLIMIT\b`)

Comment on lines +221 to +241
{
name: "SELECT with multiline existing LIMIT",
sql: "SELECT *\nFROM users\nLIMIT 50",
maxRowLimit: 100,
expectedSQL: "SELECT *\nFROM users\nLIMIT 50",
shouldAddLimit: false,
},
{
name: "SELECT with lowercase multiline existing LIMIT",
sql: "SELECT * FROM users\nlimit 50",
maxRowLimit: 100,
expectedSQL: "SELECT * FROM users\nlimit 50",
shouldAddLimit: false,
},
{
name: "SELECT with tab-separated existing LIMIT",
sql: "SELECT a\tLIMIT 5",
maxRowLimit: 100,
expectedSQL: "SELECT a\tLIMIT 5",
shouldAddLimit: false,
},

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Add test cases to verify that LIMIT detection is not bypassed by comments or string literals, and that trailing single-line comments do not comment out the appended LIMIT clause.

		{
			name:           "SELECT with multiline existing LIMIT",
			sql:            "SELECT *\nFROM users\nLIMIT 50",
			maxRowLimit:    100,
			expectedSQL:    "SELECT *\nFROM users\nLIMIT 50",
			shouldAddLimit: false,
		},
		{
			name:           "SELECT with lowercase multiline existing LIMIT",
			sql:            "SELECT * FROM users\nlimit 50",
			maxRowLimit:    100,
			expectedSQL:    "SELECT * FROM users\nlimit 50",
			shouldAddLimit: false,
		},
		{
			name:           "SELECT with tab-separated existing LIMIT",
			sql:            "SELECT a\tLIMIT 5",
			maxRowLimit:    100,
			expectedSQL:    "SELECT a\tLIMIT 5",
			shouldAddLimit: false,
		},
		{
			name:           "SELECT with LIMIT in comment",
			sql:            "SELECT * FROM users -- LIMIT is not here",
			maxRowLimit:    100,
			expectedSQL:    "SELECT * FROM users -- LIMIT is not here\nLIMIT 100",
			shouldAddLimit: true,
		},
		{
			name:           "SELECT with LIMIT in string literal",
			sql:            "SELECT * FROM users WHERE name = 'LIMIT'",
			maxRowLimit:    100,
			expectedSQL:    "SELECT * FROM users WHERE name = 'LIMIT' LIMIT 100",
			shouldAddLimit: true,
		},
		{
			name:           "SELECT with trailing single-line comment",
			sql:            "SELECT * FROM users -- trailing comment",
			maxRowLimit:    100,
			expectedSQL:    "SELECT * FROM users -- trailing comment\nLIMIT 100",
			shouldAddLimit: true,
		}

@anxkhn anxkhn force-pushed the loop/mcp-toolbox__002 branch from d31e022 to a952373 Compare July 1, 2026 07:40
ApplyQueryLimits checked for an existing LIMIT clause with the
space-anchored substring " LIMIT ", so a LIMIT separated by other
whitespace (newline or tab) in a formatted multiline query was missed
and a second LIMIT was appended, producing invalid SQL like
"... LIMIT 50 LIMIT 100".

Use a precompiled case-insensitive word-boundary regexp ((?i)\bLIMIT\b),
matching the existing regexp.MustCompile style in this file, so the
check is whitespace-agnostic. Add multiline, lowercase, and tab cases
to TestApplyQueryLimits.
@anxkhn anxkhn force-pushed the loop/mcp-toolbox__002 branch from a952373 to f3cd694 Compare July 1, 2026 18:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CockroachDB ApplyQueryLimits appends a second LIMIT for multiline/tab-formatted queries

2 participants