@adamziel
Collaborator

@adamziel adamziel commented Oct 9, 2025

Proposes a WP_MySQL_Naive_Query_Stream to enable stream-processing large SQL files one query at a time without running out of memory.

Usage:

$stream = new WP_MySQL_Naive_Query_Stream();
$stream->append_sql( 'SELECT id FROM users; SELECT * FROM posts;' );
while ( $stream->next_query() ) {
    $sql_string = $stream->get_query();
    // Process the query.
}

$stream->append_sql( 'CREATE TABLE users (id INT, name VARCHAR(255));' );
while ( $stream->next_query() ) {
    $sql_string = $stream->get_query();
    // Process the query.
}

$stream->mark_input_complete();
$stream->next_query(); // returns false

This class is naive because it doesn't understand what a valid query is.

We treat a query as invalid if we can't get the next token and either the input source is already exhausted or we have over 2MB of buffered SQL. We can't do better until the lexer provides an explicit distinction between syntax errors and incomplete input. I expect this heuristic to be sufficient in many scenarios, but it will of course fail in pathological cases such as SELECT SELECT SELECT ... without any semicolons.
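To make the heuristic concrete, here is a minimal TypeScript sketch of the buffering logic. It is not the PHP class from this PR: the names `NaiveQueryStream`, `scanForSeparator`, and `maxBufferLength` are hypothetical, the scanner is a simplified stand-in for the real lexer (it only understands quoted strings, not comments), and applying the size cap whenever no separator is found is an assumption made for illustration.

```ts
// Hypothetical sketch; a simplified stand-in, not the actual implementation.
type Scan =
	| { kind: 'separator'; end: number } // found a top-level `;`
	| { kind: 'end-of-buffer' } // buffer ended between tokens
	| { kind: 'mid-token' }; // buffer ended inside a quoted string

// Looks for the first `;` that is not inside a quoted string.
function scanForSeparator(sql: string): Scan {
	let quote: string | null = null;
	for (let i = 0; i < sql.length; i++) {
		const ch = sql[i];
		if (quote !== null) {
			if (ch === '\\' && quote !== '`') i++; // skip the escaped character
			else if (ch === quote) quote = null;
		} else if (ch === "'" || ch === '"' || ch === '`') {
			quote = ch;
		} else if (ch === ';') {
			return { kind: 'separator', end: i + 1 };
		}
	}
	return quote !== null ? { kind: 'mid-token' } : { kind: 'end-of-buffer' };
}

type StreamState = 'ok' | 'needs-input' | 'invalid' | 'finished';

class NaiveQueryStream {
	private buffer = '';
	private inputComplete = false;
	private query: string | null = null;
	private state: StreamState = 'needs-input';
	private maxBufferLength: number;

	// The cap counts string length here, an approximation of the 2MB limit.
	constructor(maxBufferLength: number = 2 * 1024 * 1024) {
		this.maxBufferLength = maxBufferLength;
	}

	appendSql(sql: string): void { this.buffer += sql; }
	markInputComplete(): void { this.inputComplete = true; }
	getQuery(): string | null { return this.query; }
	getState(): StreamState { return this.state; }

	nextQuery(): boolean {
		this.query = null;
		const scan = scanForSeparator(this.buffer);
		if (scan.kind === 'separator') {
			this.query = this.buffer.slice(0, scan.end).trim();
			this.buffer = this.buffer.slice(scan.end);
			this.state = 'ok';
			return true;
		}
		// No separator: the input is either incomplete or invalid, and the
		// scanner cannot tell which. Fall back to the heuristic.
		if (this.buffer.length > this.maxBufferLength) {
			this.state = 'invalid';
			return false;
		}
		if (!this.inputComplete) {
			this.state = 'needs-input';
			return false;
		}
		if (scan.kind === 'mid-token') {
			this.state = 'invalid'; // input exhausted inside a string
			return false;
		}
		const rest = this.buffer.trim();
		this.buffer = '';
		if (rest === '') {
			this.state = 'finished';
			return false;
		}
		this.query = rest; // end of input acts as the final separator
		this.state = 'ok';
		return true;
	}
}
```

In this sketch, a `false` return from `nextQuery()` is ambiguous on its own, so the caller checks `getState()` to decide whether to append more SQL, stop, or report an error.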

Related to Automattic/wp-cli-sqlite-command#13

Remaining work

Review this PR, reformat code, add some more comments.

cc @JanJakes @sejas

@adamziel adamziel marked this pull request as draft October 9, 2025 15:21
@JanJakes
Member

JanJakes commented Oct 9, 2025

@adamziel Thanks for sharing the draft and APIs!

> This class is naive because it doesn't understand what a valid query is.
> We can't do better until the lexer provides an explicit distinction between syntax errors and incomplete input.

Interesting! I was only thinking of implementing the full support in the lexer. I don't think it's very hard, but probably not very easy either.

That said, do you think it makes sense to get it in in the "naive" form? With the quick fixes I did today on the WP CLI SQLite side, covered with tests, it supports everything apart from NO_BACKSLASH_ESCAPES and DELIMITER .... We have no other use case yet, so it's likely not super urgent, but eventually necessary for sure.

@adamziel
Collaborator Author

adamziel commented Oct 9, 2025

Yeah if there isn't a use-case, I think it's fine for this to sit here until one emerges. It likely will as a part of the streaming importer work, and we may need to explore a non-naive query stream implementation for that.

@JanJakes
Member

@adamziel If we run into another issue with the current WP SQLite CLI parsing, I'll definitely use this in some form. Also, when moving the CLI commands to the SQLite repo, it could make sense to give this a try.

adamziel added a commit to WordPress/wordpress-playground that referenced this pull request Nov 21, 2025
Adapts
[WP_MySQL_Naive_Query_Stream](WordPress/sqlite-database-integration#264)
to support multiline SQL queries in the `runSql` step. With this PR, the
following call works:

```ts
await runSql(php, {
	sql: new File(
		[
			`SELECT * FROM 
				wp_users
				-- users table
			;
			 SELECT * FROM wp_posts;`
		],
		'no-trailing-newline.sql'
	),
});
```

Whereas before this PR, the `runSql` step assumed every line of a SQL
file is a separate query and would fail on the above call.

## Implementation details

See WordPress/sqlite-database-integration#264.
Tl;dr we tokenize the query and treat `;` and EOF tokens as query
separators. The stream is only "naive" in that every query must be
smaller than 15MB. It might fail for some very large WordPress posts,
but should work most of the time. Once the lexer provides an explicit
distinction between syntax errors and incomplete input, we'll be able to
support arbitrarily large queries.
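
As a rough illustration of treating `;` and EOF as separators, the sketch below splits a complete SQL string while skipping over comments and quoted strings. It is not the code from either PR: `splitQueries` is a hypothetical name, the hand-written scanner stands in for the real lexer, and dropping statements that contain no tokens is a choice made for this example only.

```ts
// Hypothetical sketch; the real implementation tokenizes with the MySQL lexer.
function splitQueries(sql: string): string[] {
	const queries: string[] = [];
	let start = 0;
	let sawToken = false; // has the current statement any non-comment content?

	const flush = (end: number): void => {
		if (sawToken) queries.push(sql.slice(start, end).trim());
		start = end;
		sawToken = false;
	};

	let i = 0;
	while (i < sql.length) {
		const ch = sql[i];
		if (ch === '#' || /^--(\s|$)/.test(sql.slice(i, i + 3))) {
			// Line comment: skip to the end of the line.
			const newline = sql.indexOf('\n', i);
			i = newline === -1 ? sql.length : newline + 1;
		} else if (sql.startsWith('/*', i)) {
			// Block comment: skip to the closing marker.
			const close = sql.indexOf('*/', i + 2);
			i = close === -1 ? sql.length : close + 2;
		} else if (ch === "'" || ch === '"' || ch === '`') {
			// Quoted string or identifier: a `;` inside is not a separator.
			sawToken = true;
			i++;
			while (i < sql.length && sql[i] !== ch) {
				i += sql[i] === '\\' && ch !== '`' ? 2 : 1;
			}
			i++; // step past the closing quote
		} else if (ch === ';') {
			flush(i + 1);
			i++;
		} else {
			if (!/\s/.test(ch)) sawToken = true;
			i++;
		}
	}
	flush(sql.length); // EOF closes the last statement
	return queries;
}

const example = 'SELECT * FROM\n\twp_users\n\t-- users table\n;\nSELECT * FROM wp_posts';
console.log(example.split('\n').length); // 5 fragments when splitting by line
console.log(splitQueries(example).length); // 2 queries when splitting on `;` and EOF
```

Splitting the same input by line yields fragments such as `wp_users` and `;` that are not queries on their own, which is the failure mode described above.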

## Testing Instructions (or ideally a Blueprint)

Tests have been updated to verify multiline query handling, SQL comment
preservation, and queries with subqueries. The streaming parser
correctly handles edge cases like empty lines, semicolon-only lines, and
queries split across chunk boundaries.

cc @JanJakes