
Commit 73693a9

ZetaSQL Team and ahirschberg-corp-oss
authored andcommitted
Export of internal ZetaSQL changes.
-- Change by ZetaSQL Team <no-reply@google.com>: Move analyzer DP basic tests with set operations into their own dedicated file.
-- Change by ZetaSQL Team <no-reply@google.com>: Move analyzer DP tests with set operations related to the privacy user ID column to their own file.
-- Change by ZetaSQL Team <no-reply@google.com>: Update one of the comment symbols in AS-pipe-operator.md
-- Change by ZetaSQL Team <no-reply@google.com>: Remove incorrect note about pipe operators from the 'Pipe syntax' section
-- Change by ZetaSQL Team <no-reply@google.com>: Move DP tests for userid consistency with set operations in their own file.
-- Change by ZetaSQL Team <no-reply@google.com>: Increase timeouts for long-running tests
-- Change by ZetaSQL Team <no-reply@google.com>: Prevent measure propagation through OUTER JOINs in ArrayScans.
-- Change by ZetaSQL Team <no-reply@google.com>: Ensure correct error message when signature does not match for REGEXP_EXTRACT_GROUPS
-- Change by ZetaSQL Team <no-reply@google.com>: Prevent measure propagation through OUTER JOINs in ArrayScans.
-- Change by ZetaSQL Team <no-reply@google.com>: Add documentation for CREATE PROCEDURE statement.
-- Change by ZetaSQL Team <no-reply@google.com>: No public description
-- Change by ZetaSQL Team <no-reply@google.com>: Update reference implementation to support LIMIT ALL/LIMIT <null>
-- Change by ZetaSQL Team <no-reply@google.com>: Add differential privacy report and tablesample analyzer test files to ZetaSQL allowlist.
-- Change by ZetaSQL Team <no-reply@google.com>: Move default language features to top of differential privacy analyzer test file.
-- Change by ZetaSQL Team <no-reply@google.com>: Update ResolvedAST docs
-- Change by ZetaSQL Team <no-reply@google.com>: Move Analyzer tests for differential privacy queries using ARRAY into a dedicated file.
-- Change by ZetaSQL Team <no-reply@google.com>: Split OPTIONS clause-focused Analyzer DIFFERENTIAL_PRIVACY tests into their own file.
-- Change by ZetaSQL Team <no-reply@google.com>: Move Analyzer tests for differential privacy with FROM clause subqueries into dedicated file.
-- Change by Jeff Shute <jshute@google.com>: Add resolved AST markers for ResolvedColumn creation vs references.
-- Change by Christoph Dibak <dibak@google.com>: Add compliance test for nested queries
-- Change by Christoph Dibak <dibak@google.com>: Allow dp queries as public group joins
-- Change by ZetaSQL Team <no-reply@google.com>: Documentation for REGEXP_EXTRACT_GROUPS
-- Change by Brandon Dolphin <bdolphin@google.com>: Set moderate timeout for .../analyzer:analyzer_test.
-- Change by ZetaSQL Team <no-reply@google.com>: Refactor quantified path algebrizer test and minor fix its algebrizer
-- Change by ZetaSQL Team <no-reply@google.com>: Allow custom array size limit in GenerateArrayHelper
-- Change by ZetaSQL Team <no-reply@google.com>: Move Analyzer tests for differential privacy report format into dedicated file.
-- Change by ZetaSQL Team <no-reply@google.com>: Analyzer support for `UPDATE ... SET` statements for the JSON subscript operator.
-- Change by ZetaSQL Team <no-reply@google.com>: Clarify comment in 'AS pipe operator' example
-- Change by ZetaSQL Team <no-reply@google.com>: Update the script generating the AST docs to also generate the parser AST docs.
-- Change by ZetaSQL Team <no-reply@google.com>: mark FEATURE_MULTI_GROUPING_SETS not in development.
-- Change by ZetaSQL Team <no-reply@google.com>: Move analyzer tests for TABLESAMPLE support with DP queries into their own dedicated file.
-- Change by ZetaSQL Team <no-reply@google.com>: No public description
-- Change by ZetaSQL Team <no-reply@google.com>: Remove obsolete SELECT WITH DIFFERENTIAL_PRIVACY test.
-- Change by ZetaSQL Team <no-reply@google.com>: Documentation clarification about `ARRAY_AGG`.
-- Change by ZetaSQL Team <no-reply@google.com>: No public description
-- Change by ZetaSQL Team <no-reply@google.com>: update behaviors when FEATURE_GROUPING_SETS is not enabled while FEATURE_MULTI_GROUPING_SETS is enabled.
-- Change by Jeff Shute <jshute@google.com>: Run pyformat on all of gen_resolved_ast.py.
-- Change by Divyanshu Ranjan <divyanshur@google.com>: Add test where relational argument is passed to BuiltinTableValuedFunction::CreateCall
-- Change by ZetaSQL Team <no-reply@google.com>: Support required language features in TableValuedFunctionOptions.
-- Change by ZetaSQL Team <no-reply@google.com>: Refactor & Simplify the rule for braced_ctor_extension_expr to remove the %prec directive.
-- Change by ZetaSQL Team <no-reply@google.com>: Disallow TVFs from having both graph and scalar-only signatures.
-- Change by ZetaSQL Team <no-reply@google.com>: Support pseudo-columns in value tables passed into TVFs
-- Change by ZetaSQL Team <no-reply@google.com>: Only create column annotation for timestamp precision when the target type is also a timestamp
-- Change by Shannon Bales <nbales@google.com>: No public description
-- Change by Divyanshu Ranjan <divyanshur@google.com>: Change signature of `TableValuedFunction::CreateEvaluator` to remove const requirement on TvfEvaluatorArg.
-- Change by ZetaSQL Team <no-reply@google.com>: RQG support for generating WHERE filter in aggregate function calls.
-- Change by ZetaSQL Team <no-reply@google.com>: Updated the ReferenceDriver to support executing DDLs by default when not used as a reference.

GitOrigin-RevId: 5bd0300a28d6d86e224393b084a373f7e55862a8
Change-Id: Ifae8e9284316d42639bfb42736438a5c4b597ecd
1 parent 92310e4 commit 73693a9

97 files changed

Lines changed: 13960 additions & 10649 deletions


docs/data-definition-language.md

Lines changed: 63 additions & 1 deletion
@@ -886,7 +886,69 @@ Documentation is pending for this feature.
886886

887887
## `CREATE PROCEDURE`
888888

889-
Documentation is pending for this feature.
889+
<pre>
890+
CREATE
891+
[OR REPLACE]
892+
[{ TEMP[ORARY] | PUBLIC | PRIVATE }]
893+
PROCEDURE
894+
[IF NOT EXISTS]
895+
procedure_name
896+
( [ <span class="var">parameter_definition</span> [, ...] ] )
897+
[OPTIONS (key=value, ...)]
898+
BEGIN
899+
<span class="var">sql_statement_list</span>
900+
END;
901+
902+
<span class="var">parameter_definition:</span>
903+
[ <span class="var">mode</span> ] parameter_name type
904+
905+
<span class="var">mode:</span>
906+
{ IN | OUT | INOUT }
907+
</pre>
908+
909+
**Description**
910+
911+
The `CREATE PROCEDURE` statement creates a procedure. A procedure is a reusable
912+
block of SQL statements that can be invoked by name, for example with a `CALL`
913+
statement, and that can take arguments.
914+
915+
**Optional Clauses**
916+
917+
+ `OR REPLACE`: Replaces any procedure with the same name if it exists. Can't
918+
appear with `IF NOT EXISTS`.
919+
+ `TEMP | TEMPORARY`: Creates a temporary procedure. The lifetime of the
920+
procedure is system specific.
921+
+ `PUBLIC`: If the procedure is declared in a module, `PUBLIC` specifies that
922+
it's available outside of the module.
923+
+ `PRIVATE`: If the procedure is declared in a module, `PRIVATE` specifies
924+
that it's only available inside of the module (default).
925+
+ `IF NOT EXISTS`: If any procedure exists with the same name, the `CREATE`
926+
statement has no effect. Can't appear with `OR REPLACE`.
927+
+ `parameter_definition`: Defines a parameter for the procedure.
928+
+ `mode`: The mode of the parameter. Can be `IN`, `OUT`, or `INOUT`.
929+
+ `IN`: The parameter is an input parameter.
930+
+ `OUT`: The parameter is an output parameter.
931+
+ `INOUT`: The parameter is both an input and an output parameter.
932+
+ `parameter_name`: The name of the parameter.
933+
+ `type`: The ZetaSQL data type of the parameter.
934+
+ `DEFAULT default_value`: The default value for the parameter.
935+
+ `OPTIONS`: If you have schema options, you can add them when you create the
936+
procedure. These options are system specific and follow the ZetaSQL [`HINT`
937+
syntax][hints].
938+
+ `BEGIN ... END`: The block of SQL statements that make up the procedure.
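The interaction between `OR REPLACE` and `IF NOT EXISTS` can be sketched with a toy in-memory catalog. This is a hypothetical Python model, not engine code. `create_procedure`, `CatalogError`, and the dict-based catalog are illustrative names. The "already exists" error for a plain `CREATE` on an existing name is an assumption, because the list above doesn't state it.

```python
class CatalogError(Exception):
    """Hypothetical error raised when a CREATE PROCEDURE can't be applied."""


def create_procedure(catalog, name, body, or_replace=False, if_not_exists=False):
    """Toy model of the OR REPLACE / IF NOT EXISTS rules over a dict catalog."""
    # The two modifiers can't appear together.
    if or_replace and if_not_exists:
        raise CatalogError("OR REPLACE can't appear with IF NOT EXISTS")
    if name in catalog:
        if if_not_exists:
            return  # The statement has no effect.
        if not or_replace:
            # Assumption: a plain CREATE on an existing name is an error.
            raise CatalogError(f"procedure {name} already exists")
    # The name is new, or OR REPLACE overwrites the existing definition.
    catalog[name] = body
```

Replaying `create_procedure` calls against the same dict shows that `IF NOT EXISTS` leaves an existing definition untouched, while `OR REPLACE` overwrites it.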
939+
940+
**Example**
941+
942+
```sql
943+
CREATE PROCEDURE my_procedure(IN x INT64, OUT y STRING)
944+
BEGIN
945+
IF x > 0 THEN
946+
SET y = 'positive';
947+
ELSE
948+
SET y = 'non-positive';
949+
END IF;
950+
END;
951+
```
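Here is a rough Python model of how the `IN` and `OUT` parameters in this example behave. The engine-side calling mechanics aren't described above, so `call_procedure`, `out_bindings`, and the caller-variable dict are hypothetical. The sketch only shows that `x` flows into the body and `y` flows back to the caller. An `INOUT` parameter would flow in both directions.

```python
def my_procedure(x):
    """Mirrors the example body: reads IN parameter x and sets OUT parameter y."""
    if x > 0:
        y = "positive"
    else:
        y = "non-positive"
    # OUT parameters are modeled as a dict of values handed back to the caller.
    return {"y": y}


def call_procedure(procedure, in_args, out_bindings, caller_vars):
    """Hypothetical call: passes IN values, then copies OUT values into caller variables.

    out_bindings maps each OUT parameter name to the caller variable that receives it.
    """
    outputs = procedure(**in_args)
    for param_name, var_name in out_bindings.items():
        caller_vars[var_name] = outputs[param_name]
    return caller_vars


# Usage: the caller's variable `result` receives the value of OUT parameter y.
session = {"result": None}
call_procedure(my_procedure, {"x": 5}, {"y": "result"}, session)
print(session["result"])  # positive
```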
890952

891953
## `CREATE ROW POLICY`
892954

docs/functions-and-operators.md

Lines changed: 281 additions & 0 deletions
@@ -41495,6 +41495,14 @@ canonical equivalence.
4149541495
</td>
4149641496
</tr>
4149741497

41498+
<tr>
41499+
<td><a href="#regexp_extract_groups"><code>REGEXP_EXTRACT_GROUPS</code></a>
41500+
</td>
41501+
<td>
41502+
Produces substrings that match multiple capturing groups in a regular expression.
41503+
</td>
41504+
</tr>
41505+
4149841506
<tr>
4149941507
<td><a href="#regexp_extract_all"><code>REGEXP_EXTRACT_ALL</code></a>
4150041508
</td>
@@ -44078,6 +44086,9 @@ If the regular expression contains a capturing group (`(...)`), and there is a
4407844086
match for that capturing group, that match is returned. If there
4407944087
are multiple matches for a capturing group, the first match is returned.
4408044088

44089+
To extract matches for multiple capturing groups in a single call, use
44090+
[`REGEXP_EXTRACT_GROUPS`][regexp-extract-groups].
44091+
4408144092
If `position` is specified, the search starts at this
4408244093
position in `value`, otherwise it starts at the beginning of `value`. The
4408344094
`position` must be a positive integer and can't be 0. If `position` is greater
@@ -44173,6 +44184,276 @@ position, occurrence) AS regexp_value FROM example;
4417344184

4417444185
[string-link-to-re2]: https://github.com/google/re2/wiki/Syntax
4417544186

44187+
[regexp-extract-groups]: #regexp_extract_groups
44188+
44189+
### `REGEXP_EXTRACT_GROUPS`
44190+
44191+
```zetasql
44192+
REGEXP_EXTRACT_GROUPS(value, regexp)
44193+
```
44194+
44195+
**Description**
44196+
44197+
Returns a `STRUCT` where each field contains a substring from `value` that
44198+
matches a capturing group in the [re2 regular expression][string-link-to-re2],
44199+
`regexp`. The function returns the substrings from the first place in `value`
44200+
where the *entire* `regexp` pattern matches.
44201+
44202+
**Details**
44203+
44204+
This function is similar to [`REGEXP_EXTRACT`][regexp-extract], but it returns a
44205+
`STRUCT` with a field for each capturing group in the `regexp`.
44206+
44207+
The `regexp` must contain at least one capturing group. The fields in the
44208+
returned `STRUCT` correspond to these capturing groups:
44209+
44210+
+ If a capturing group is named (for example, `(?<name>...)` or `(?P<name>...)`),
44211+
the corresponding `STRUCT` field has that name. Both syntaxes are
44212+
equivalent.
44213+
+ If a capturing group is unnamed, the corresponding `STRUCT` field is
44214+
anonymous. These fields can be accessed by their 0-based position in the
44215+
`STRUCT`.
44216+
+ The order of fields in the `STRUCT` matches the order of the capturing
44217+
groups in `regexp` from left to right.
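The field-building rules above can be approximated in Python, using the standard `re` module as a stand-in for RE2. This is a sketch, not ZetaSQL's implementation. `extract_groups` is a hypothetical name, and it returns a list of `(field_name, substring)` pairs in place of a `STRUCT`. The sketch uses the `(?P<name>...)` spelling, which Python's `re` accepts. Python's `re` and RE2 differ in places (for example, Python's `\d` and `\w` match non-ASCII characters by default), so treat the results as approximate.

```python
import re


def extract_groups(value, pattern):
    """Approximates REGEXP_EXTRACT_GROUPS with Python's re standing in for RE2.

    Returns (field_name, substring) pairs in capturing-group order, with
    field_name None for unnamed groups, or None if value is None or no match.
    """
    if value is None:
        return None
    compiled = re.compile(pattern)
    if compiled.groups == 0:
        raise ValueError("regexp must contain at least one capturing group")
    # groupindex maps name -> 1-based group number; invert it for lookup.
    names = {index: name for name, index in compiled.groupindex.items()}
    # search() finds the first position where the entire pattern matches.
    match = compiled.search(value)
    if match is None:
        return None
    # Groups are numbered by their opening parenthesis, left to right.
    return [(names.get(i), match.group(i)) for i in range(1, compiled.groups + 1)]


print(extract_groups("id:123", r"(?P<key>[a-z]+):([0-9]+)"))
# [('key', 'id'), (None, '123')]
```

Because groups are numbered by their opening parenthesis, nested groups follow the same left-to-right order. For example, `(\w+)=((\w+)=\w+)` on `'a=b=c'` yields `'a'`, `'b=c'`, `'b'`.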
44218+
44219+
Returns `NULL` if `value` is `NULL` or if the overall `regexp` pattern doesn't
44220+
match at all. If a specific capturing group doesn't match (for example, if it's
44221+
part of an alternation or is optional), the corresponding `STRUCT` field is
44222+
`NULL`.
44223+
44224+
Returns an error if:
44225+
44226+
+ The `regexp` is invalid.
44227+
+ The `regexp` isn't a string literal.
44228+
+ The `regexp` has no capturing groups.
44229+
+ A capturing group name isn't a valid `STRUCT` field name (for example, starts
44230+
with a digit or contains spaces). Valid names consist of letters, numbers,
44231+
and underscores, and must start with a letter or underscore.
44232+
+ The same capturing group name is used more than once. Names are compared case-insensitively.
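The pattern-level checks in this list can be sketched in Python. The sketch again uses the `re` module as a stand-in for RE2, so only `(?P<name>...)` groups are recognized. It can't model the string-literal requirement, which is a compile-time property of the query. `check_group_names` and `_FIELD_NAME` are hypothetical names. Python's `re` already rejects an exactly repeated group name, so the explicit check here covers case-only differences such as `id` and `ID`.

```python
import re

# Field-name rule from the list above: letters, digits, and underscores,
# starting with a letter or underscore (ASCII assumed in this sketch).
_FIELD_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def check_group_names(pattern):
    """Raises for the pattern-level errors listed above; returns None if valid."""
    compiled = re.compile(pattern)  # Raises re.error for an invalid regexp.
    if compiled.groups == 0:
        raise ValueError("regexp has no capturing groups")
    seen = set()
    for name in compiled.groupindex:
        if not _FIELD_NAME.fullmatch(name):
            raise ValueError(f"invalid field name: {name}")
        # Case-insensitive duplicate detection.
        if name.lower() in seen:
            raise ValueError(f"duplicate group name: {name}")
        seen.add(name.lower())
```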
44233+
44234+
**Return type**
44235+
44236+
`STRUCT<...>`
44237+
44238+
Each field of the `STRUCT` is a `STRING` (or `BYTES` if the inputs are
44239+
`BYTES`), unless the field is [auto-casted](#auto_casting) to another type.
44240+
44241+
**Examples**
44242+
44243+
Extract unnamed groups:
44244+
44245+
```zetasql
44246+
SELECT REGEXP_EXTRACT_GROUPS('abc123xyz', r'([a-z]+)([0-9]+)([a-z]+)') AS result
44247+
44248+
/*---------------------------------*
44249+
| result |
44250+
+---------------------------------+
44251+
| {abc, 123, xyz} |
44252+
*---------------------------------*/
44253+
```
44254+
44255+
Extract named groups:
44256+
44257+
```zetasql
44258+
SELECT REGEXP_EXTRACT_GROUPS('2025-09-10', r'(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})') AS result
44259+
44260+
/*----------------------------------------------*
44261+
| result |
44262+
+----------------------------------------------+
44263+
| {2025 year, 09 month, 10 day} |
44264+
*----------------------------------------------*/
44265+
```
44266+
44267+
**Expand STRUCT fields into columns**
44268+
44269+
Because `REGEXP_EXTRACT_GROUPS` returns a `STRUCT`, you can use the `.*` operator
44270+
in the `SELECT` list to expand the fields of the `STRUCT` into separate columns.
44271+
Expanding `STRUCT` fields into columns is particularly useful when all capturing
44272+
groups are named.
44273+
44274+
```zetasql
44275+
SELECT REGEXP_EXTRACT_GROUPS('PROD-WIDGET-1234', r'(?<env>\w+)-(?<product>\w+)-(?<id>\d+)').*
44276+
44277+
/*-------+-----------+------*
44278+
| env | product | id |
44279+
+-------+-----------+------+
44280+
| PROD | WIDGET | 1234 |
44281+
*-------+-----------+------*/
44282+
```
44283+
44284+
Mix of named and unnamed groups:
44285+
44286+
```zetasql
44287+
SELECT REGEXP_EXTRACT_GROUPS('id:123', r'(?<key>[a-z]+):([0-9]+)') AS result
44288+
44289+
/*-----------------------*
44290+
| result |
44291+
+-----------------------+
44292+
| {id key, 123} |
44293+
*-----------------------*/
44294+
```
44295+
44296+
No match returns `NULL`:
44297+
44298+
```zetasql
44299+
SELECT REGEXP_EXTRACT_GROUPS('abc', r'(\d+)') AS result
44300+
44301+
/*--------*
44302+
| result |
44303+
+--------+
44304+
| NULL |
44305+
*--------*/
44306+
```
44307+
44308+
Optional groups and empty matches:
44309+
44310+
```zetasql
44311+
WITH inputs AS (
44312+
SELECT 'id:123:extra' AS t UNION ALL
44313+
SELECT 'id:123:' AS t UNION ALL
44314+
SELECT 'id:123' AS t
44315+
)
44316+
SELECT
44317+
t,
44318+
REGEXP_EXTRACT_GROUPS(t, r'(?<key>\w+):(?<val>\w+)(?::(?<opt>\w*))?') AS result
44319+
FROM inputs;
44320+
44321+
/*-----------------+--------------------------------------*
44322+
| t | result |
44323+
+-----------------+--------------------------------------+
44324+
| id:123:extra | {id key, 123 val, extra opt} |
44325+
| id:123: | {id key, 123 val, opt} |
44326+
| id:123 | {id key, 123 val, NULL opt} |
44327+
*-----------------+--------------------------------------*/
44328+
```
44329+
44330+
Note that in the second row, the optional group `opt` matches an empty string,
44331+
which is different from the third row where the group doesn't match at all and
44332+
results in `NULL`.
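Python's `re` module makes the same distinction, so it can serve as a quick stand-in for experimenting with patterns. This is not RE2, and the pattern is rewritten with `(?P<name>...)` groups. A group that matched nothing returns `''`, and a group that didn't participate in the match returns `None`. These values correspond to the empty string and `NULL` in the table above. `opt_group` is a hypothetical helper.

```python
import re

# The example's pattern, spelled with Python's (?P<name>...) groups.
PATTERN = re.compile(r"(?P<key>\w+):(?P<val>\w+)(?::(?P<opt>\w*))?")


def opt_group(text):
    """Returns the `opt` capture: a string (possibly empty), or None if unmatched."""
    match = PATTERN.search(text)
    return None if match is None else match.group("opt")


for text in ["id:123:extra", "id:123:", "id:123"]:
    print(repr(text), repr(opt_group(text)))
# 'id:123:extra' 'extra'
# 'id:123:' ''
# 'id:123' None
```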
44333+
44334+
Nested groups:
44335+
44336+
```zetasql
44337+
SELECT REGEXP_EXTRACT_GROUPS('a=b=c', r'(\w+)=((\w+)=\w+)') AS result
44338+
44339+
/*-----------------------*
44340+
| result |
44341+
+-----------------------+
44342+
| {a, b=c, b} |
44343+
*-----------------------*/
44344+
```
44345+
44346+
Alternation with different groups:
44347+
44348+
```zetasql
44349+
WITH inputs AS (
44350+
SELECT 'config_id=123' AS t UNION ALL
44351+
SELECT 'option_name=ABC' AS t
44352+
)
44353+
SELECT
44354+
t,
44355+
REGEXP_EXTRACT_GROUPS(t, r'config_id=(?<id>\d+)|option_name=(?<name>\w+)') AS result
44356+
FROM inputs;
44357+
44358+
/*-----------------+--------------------------*
44359+
| t | result |
44360+
+-----------------+--------------------------+
44361+
| config_id=123 | {123 id, NULL name} |
44362+
| option_name=ABC | {NULL id, ABC name} |
44363+
*-----------------+--------------------------*/
44364+
```
44365+
44366+
The `STRUCT` result contains fields for all named capturing groups across all
44367+
alternatives in the regular expression. In each row, only the fields
44368+
corresponding to the alternative that matched are populated. Other fields are
44369+
`NULL`.
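The same behavior can be reproduced with Python's `re` as a stand-in for RE2. `Match.groupdict()` returns every named group in the pattern, and groups from the alternative that didn't match are `None`. `named_fields` is a hypothetical helper.

```python
import re

PATTERN = re.compile(r"config_id=(?P<id>\d+)|option_name=(?P<name>\w+)")


def named_fields(text):
    """Returns all named groups; those in the non-matching alternative are None."""
    match = PATTERN.search(text)
    return None if match is None else match.groupdict()


print(named_fields("config_id=123"))    # {'id': '123', 'name': None}
print(named_fields("option_name=ABC"))  # {'id': None, 'name': 'ABC'}
```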
44370+
44371+
##### Auto-casting
44372+
<a id="auto_casting"></a>
44373+
44374+
You can automatically cast the captured substring to a specific type by
44375+
suffixing the capturing group name with a double underscore (`__`) followed by
44376+
the type name.
44377+
44378+
Any type that can be cast from `STRING` (or `BYTES` for the `BYTES` version
44379+
of the function) is supported. Type names are case-insensitive.
44380+
44381+
The `__TYPE` suffix is removed from the field name in the resulting `STRUCT`. For example, a group named `year__INT64` produces a field named `year`.
44382+
44383+
If the captured substring can't be cast to the specified type, an error is
44384+
returned. This includes casting an empty string to a numeric or boolean type.
44385+
If the captured substring is `NULL` (due to an optional group not matching), the
44386+
cast result is also `NULL`.
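The following is a minimal sketch of the suffix-stripping and casting rules. It uses Python's `re` (with `(?P<name>...)` groups) as a stand-in for RE2. Only `INT64` is modeled. Its accepted syntax (an optional sign, decimal digits, or a `0x` hex prefix) is an assumption that approximates this section's examples, not ZetaSQL's full `CAST` rules. `extract_groups_autocast`, `CASTERS`, and `_to_int64` are hypothetical names, and only named groups are handled.

```python
import re

# Assumed INT64 text syntax for this sketch: optional sign, decimal or 0x hex.
_INT64_TEXT = re.compile(r"[+-]?(?:0[xX][0-9a-fA-F]+|[0-9]+)")


def _to_int64(text):
    if not _INT64_TEXT.fullmatch(text):
        raise ValueError(f"Bad INT64 value: {text!r}")  # Includes the empty string.
    value = int(text, 16 if "x" in text.lower() else 10)
    if not -(2**63) <= value < 2**63:
        raise ValueError(f"INT64 out of range: {text!r}")
    return value


# Hypothetical caster table keyed by upper-case type name; only INT64 is modeled.
CASTERS = {"INT64": _to_int64}


def extract_groups_autocast(value, pattern):
    """Named groups only. A group named like `year__INT64` becomes field `year`."""
    match = re.search(pattern, value)
    if match is None:
        return None
    result = {}
    for group_name, text in match.groupdict().items():
        field, sep, type_name = group_name.rpartition("__")
        if not sep:
            field, type_name = group_name, None  # No suffix: keep the string value.
        if text is None or type_name is None:
            result[field] = text  # An unmatched group stays None (NULL), uncast.
        else:
            # Type names are case-insensitive; an unmodeled type raises KeyError.
            result[field] = CASTERS[type_name.upper()](text)
    return result


print(extract_groups_autocast("val=0x1a", r"val=(?P<val__INT64>0x[0-9a-fA-F]+)"))
# {'val': 26}
```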
44387+
44388+
**Examples of auto-casting**
44389+
44390+
```zetasql
44391+
SELECT REGEXP_EXTRACT_GROUPS('val=0x1a', r'val=(?<val__INT64>0x[0-9a-fA-F]+)') AS result
44392+
44393+
/*-------------*
44394+
| result |
44395+
+-------------+
44396+
| {26 val} |
44397+
*-------------*/
44398+
```
44399+
44400+
Auto-casted values in expressions with Pipe syntax:
44401+
44402+
```zetasql
44403+
FROM UNNEST(['02:30:10', '01:02:03']) AS time_str
44404+
|> EXTEND REGEXP_EXTRACT_GROUPS(time_str, r'(?<h__INT64>\d{2}):(?<m__INT64>\d{2}):(?<s__INT64>\d{2})').*
44405+
|> SELECT time_str, h * 3600 + m * 60 + s AS total_seconds
44406+
44407+
/*----------+---------------*
44408+
| time_str | total_seconds |
44409+
+----------+---------------+
44410+
| 02:30:10 | 9010 |
44411+
| 01:02:03 | 3723 |
44412+
*----------+---------------*/
44413+
```
44414+
44415+
Expand auto-casted fields into columns:
44416+
44417+
```zetasql
44418+
SELECT REGEXP_EXTRACT_GROUPS('2025-09-10', r'(?<year__INT64>\d{4})-(?<month__INT64>\d{2})-(?<day__INT64>\d{2})').*
44419+
44420+
/*--------+---------+-------*
44421+
| year | month | day |
44422+
+--------+---------+-------+
44423+
| 2025 | 9 | 10 |
44424+
*--------+---------+-------*/
44425+
```
44426+
44427+
Cast failure:
44428+
44429+
```zetasql {.bad}
44430+
-- Error: Bad INT64 value
44431+
SELECT REGEXP_EXTRACT_GROUPS('ID: ABC', r'ID: (?<item_id__INT64>\w+)')
44432+
```
44433+
44434+
Cast failure with empty string:
44435+
44436+
```zetasql {.bad}
44437+
-- Error: Bad INT64 value
44438+
SELECT REGEXP_EXTRACT_GROUPS('ID: ', r'ID: (?<item_id__INT64>\d*)')
44439+
```
44440+
44441+
Workaround for empty string cast failure by making the group optional:
44442+
44443+
```zetasql
44444+
SELECT REGEXP_EXTRACT_GROUPS('ID: ', r'ID: (?<item_id__INT64>\d+)?') AS result
44445+
44446+
/*-----------------*
44447+
| result |
44448+
+-----------------+
44449+
| {NULL item_id} |
44450+
*-----------------*/
44451+
```
44452+
44453+
[string-link-to-re2]: https://github.com/google/re2/wiki/Syntax
44454+
44455+
[regexp-extract]: #regexp_extract
44456+
4417644457
### `REGEXP_EXTRACT_ALL`
4417744458

4417844459
```zetasql

docs/graph-gql-functions.md

Lines changed: 7 additions & 4 deletions
@@ -967,10 +967,13 @@ RETURN src.id as src_id, num_transfers, unique_amount_transfers, dst.id AS desti
967967
+---------------------------------------------------------------------------*/
968968
```
969969

970-
In the following query, the `SUM` function takes a group variable called
971-
`e` that represents an array of transfers, and then sums the amount
972-
for each transfer. Note that horizontal aggregation isn't allowed in the
973-
`RETURN` statement: that `ARRAY_AGG` is an aggregate over the result set.
970+
In the following query, the `SUM` function takes a group variable called `e`
971+
that represents an array of transfers, and then sums the amount for each
972+
transfer. Horizontal aggregation isn't allowed in the `RETURN`
973+
statement. `ARRAY_AGG` is a vertical aggregate over the result set, which is
974+
grouped implicitly by the non-aggregated columns
975+
(`source_account_id`, `destination_account_id`), so the query produces one row
976+
for each distinct combination of source and destination account.
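The two levels of aggregation can be sketched in Python over hypothetical path rows. The row shape and the `amounts` key (which stands in for the group variable `e`) are assumptions. The sketch also assumes that `ARRAY_AGG` collects each path's horizontal `SUM`; the full query isn't shown in this excerpt. The point is where each aggregation runs: `sum` runs inside one row, and the collection runs across rows that share the non-aggregated columns.

```python
from collections import defaultdict


def aggregate_paths(paths):
    """Sketch of horizontal vs. vertical aggregation over hypothetical path rows.

    Each row is a dict with source_account_id, destination_account_id, and
    amounts (a list standing in for the group variable e).
    """
    groups = defaultdict(list)
    for row in paths:
        # Horizontal aggregation: SUM over the array held in a single row.
        path_total = sum(row["amounts"])
        # Implicit grouping key: the non-aggregated columns.
        key = (row["source_account_id"], row["destination_account_id"])
        # Vertical aggregation: collect one value per row within each group.
        groups[key].append(path_total)
    return dict(groups)


rows = [
    {"source_account_id": 7, "destination_account_id": 16, "amounts": [300.0, 100.0]},
    {"source_account_id": 7, "destination_account_id": 16, "amounts": [50.0]},
    {"source_account_id": 7, "destination_account_id": 20, "amounts": [10.0, 20.0]},
]
print(aggregate_paths(rows))
# {(7, 16): [400.0, 50.0], (7, 20): [30.0]}
```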
974977

975978
```zetasql
976979
GRAPH FinGraph
