Commit f30c319

ZetaSQL Team authored and KimiWaRokkuWoKikanai committed
Exported ZetaSQL changes.

- Added the support for [SQL pipe syntax](https://research.google/pubs/pub1005959/)
- Improved the `execute_query` with an interactive web UI and more functionality.
- Added new and improved SQL language features.
- Improved documentation.

GitOrigin-RevId: 88446a33c3a4498dab3f5cf2a1fe92c9f56d9723
Change-Id: Ia8ba13c3131dfc37b8ca9c60c9fd92752eac941d
1 parent f6df697 commit f30c319

554 files changed

Lines changed: 95737 additions & 16427 deletions

.bazelrc

Lines changed: 6 additions & 0 deletions
@@ -51,3 +51,9 @@ build:g++ --cxxopt=-Wno-class-memaccess
 build:g++ --cxxopt=-Wno-deprecated-declarations
 # For string_fortified
 build:g++ --cxxopt=-Wno-stringop-truncation
+
+# C++17 is required to build ZetaSQL, hence `--cxxopt=-std=c++17`. On MacOS
+# `--host_cxxopt=-std=c++17` is also needed.
+build --cxxopt=-std=c++17 --host_cxxopt=-std=c++17
+run --cxxopt=-std=c++17 --host_cxxopt=-std=c++17
+test --cxxopt=-std=c++17 --host_cxxopt=-std=c++17

.bazelversion

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-6.2.0
+6.5.0
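
Since the pinned version moves from 6.2.0 to 6.5.0 here, a locally installed Bazel can drift from what the repository expects. The following is a minimal sketch of a pre-build check, not something shipped with ZetaSQL: the helper names `pinned_bazel_version` and `check_bazel_version` are hypothetical, and the installed version is passed in as a plain string.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of ZetaSQL) for comparing the Bazel version
# pinned in .bazelversion with an installed version string.

# Print the pinned version with surrounding whitespace removed.
#   $1: path to the version file (defaults to ./.bazelversion)
pinned_bazel_version() {
  tr -d '[:space:]' < "${1:-.bazelversion}"
}

# Report whether an installed version matches the pinned one.
#   $1: path to the version file
#   $2: installed version string, e.g. "6.5.0"
check_bazel_version() {
  local pinned installed
  pinned="$(pinned_bazel_version "$1")"
  installed="$2"
  if [ "$pinned" = "$installed" ]; then
    echo "ok: Bazel $installed matches .bazelversion"
  else
    echo "mismatch: .bazelversion pins $pinned but $installed is installed"
    return 1
  fi
}

# Possible usage from the repository root (`bazel --version` prints "bazel X.Y.Z"):
#   check_bazel_version .bazelversion "$(bazel --version | awk '{print $2}')"
```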

Dockerfile

Lines changed: 23 additions & 2 deletions
@@ -11,7 +11,7 @@ RUN apt-get update && apt-get -qq install -y default-jre default-jdk
 RUN apt-get update && apt-get -qq install curl tar build-essential wget \
 python python3 zip unzip
 
-ENV BAZEL_VERSION=6.2.0
+ENV BAZEL_VERSION=6.5.0
 
 # Install bazel from source
 RUN mkdir -p bazel && \
@@ -38,10 +38,31 @@ RUN add-apt-repository ppa:ubuntu-toolchain-r/test && \
 --slave /usr/bin/g++ g++ /usr/bin/g++-11 && \
 update-alternatives --set gcc /usr/bin/gcc-11
 
+
+# To support file names with non-ASCII characters
+RUN apt-get -qq install locales && locale-gen en_US.UTF-8
+ENV LANG en_US.UTF-8
+
 COPY . /zetasql
 
+# Create a new user zetasql to avoid running as root.
+RUN useradd -ms /bin/bash zetasql
+RUN chown -R zetasql:zetasql /zetasql
+USER zetasql
+
 ENV BAZEL_ARGS="--config=g++"
 
+# Pre-build the binary for execute_query so that users can try out zetasql
+# directly. Users can modify the target in the docker file or enter the
+# container and build other targets as needed.
 RUN cd zetasql && \
 CC=/usr/bin/gcc CXX=/usr/bin/g++ \
-bazel build ${BAZEL_ARGS} ...
+bazel build ${BAZEL_ARGS} -c opt //zetasql/tools/execute_query:execute_query
+
+# Create a shortcut for execute_query.
+ENV HOME=/home/zetasql
+RUN mkdir -p $HOME/bin
+RUN ln -s /zetasql/bazel-bin/zetasql/tools/execute_query/execute_query $HOME/bin/execute_query
+ENV PATH=$PATH:$HOME/bin
+
+WORKDIR /zetasql
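
With this change the image pre-builds `execute_query` and puts it on `PATH`, so a container can start the tool directly. The sketch below only assembles the `docker run` command line that the README in this commit describes; it is an illustration, not project tooling. The function name `zetasql_web_cmd` and the default image name `my-zetasql-image` are hypothetical. Note that Docker requires lowercase repository names, so a mixed-case tag would need to be lowercased.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of ZetaSQL): print, without executing, the
# `docker run` command that starts the execute_query web UI.
#   $1: image name (assumed default; Docker requires lowercase names)
#   $2: hostname passed to -h (defaults to the host's own name)
zetasql_web_cmd() {
  local image="${1:-my-zetasql-image}"
  local host="${2:-$(hostname)}"
  # --init: signal handling; -it: interactive; -h and -p 8080:8080 together
  # make the web server's URL reachable from the host machine.
  printf 'docker run --init -it -h=%s -p 8080:8080 %s execute_query --web\n' \
    "$host" "$image"
}

# Possible usage: review the command first, then run it.
#   zetasql_web_cmd my-zetasql-image
#   eval "$(zetasql_web_cmd my-zetasql-image)"
```

Printing the command instead of running it keeps the helper safe to call on machines without Docker and makes the flags easy to inspect.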

README.md

Lines changed: 170 additions & 51 deletions
@@ -1,79 +1,198 @@
 ## ZetaSQL - Analyzer Framework for SQL
 
-ZetaSQL defines a language (grammar, types, data model, and semantics) as well
-as a parser and analyzer. It is not itself a database or query engine. Instead
-it is intended to be used by multiple engines wanting to provide consistent
-behavior for all semantic analysis, name resolution, type checking, implicit
-casting, etc. Specific query engines may not implement all features in the
-ZetaSQL language and may give errors if specific features are not supported. For
-example, engine A may not support any updates and engine B may not support
-analytic functions.
-
-[ZetaSQL Language Guide](docs/README.md)
-
-[ZetaSQL ResolvedAST API](docs/resolved_ast.md)
-
-[ZetaSQL BigQuery Analysis Example](https://github.com/GoogleCloudPlatform/professional-services/tree/main/tools/zetasql-helper)
-
-## Status of Project and Roadmap
-
-This codebase is being open sourced in multiple phases:
-
-1. Parser and Analyzer **Complete**
-2. Reference Implementation **In Progress**
-- Base capability **Complete**
-- Function library **In Progress**
-3. Compliance Tests **Complete**
-- includes framework for validating compliance of arbitrary engines
-4. Misc tooling
-- Improved Formatter **Complete**
+ZetaSQL defines a SQL language (grammar, types, data model, semantics, and
+function library) and
+implements parsing and analysis for that language as a reusable component.
+ZetaSQL is not itself a database or query engine. Instead,
+it's intended to be used by multiple engines, to provide consistent
+language and behavior (name resolution, type checking, implicit
+casting, etc.). Specific query engines may implement a subset of features,
+giving errors for unsupported features.
+ZetaSQL's compliance test suite can be used to validate that query engine
+implementations are correct and consistent.
+
+ZetaSQL implements the ZetaSQL language, which is used across several of
+Google's SQL products, both publicly and internally, including BigQuery,
+Spanner, F1, BigTable, Dremel, Procella, and others.
+
+ZetaSQL has been described in these publications:
+
+* (CDMS 2022) [ZetaSQL: A SQL Language as a Component](https://cdmsworkshop.github.io/2022/Slides/Fri_C2.5_DavidWilhite.pptx) (Slides)
+* (SIGMOD 2017) [Spanner: Becoming a SQL System](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/46103.pdf) -- See section 6.
+* (VLDB 2024) [SQL Has Problems. We Can Fix Them: Pipe Syntax in SQL](https://research.google/pubs/pub1005959/) -- Describes ZetaSQL's new pipe query syntax.
+
+Some other documentation:
+
+* [ZetaSQL Language Reference](docs/README.md)
+* [ZetaSQL Resolved AST](docs/resolved_ast.md), documenting the intermediate representation produced by the ZetaSQL analyzer.
+* [ZetaSQL Toolkit](https://github.com/GoogleCloudPlatform/zetasql-toolkit), a project using ZetaSQL to analyze and understand queries against BigQuery, and other ZetaSQL engines.
+
+## Project Overview
+
+The main components and APIs are in these directories under `zetasql/`:
+
+* `zetasql/public`: Most public APIs are here.
+* `zetasql/resolved_ast`: Defines the [Resolved AST](docs/resolved_ast.md), which the analyzer produces.
+* `zetasql/parser`: The grammar and parser implementation. (Semi-public, since the parse trees are not a stable API.)
+* `zetasql/analyzer`: The internal implementation of query analysis.
+* `zetasql/reference_impl`: The reference implementation for executing queries.
+* `zetasql/compliance`: Compliance test framework and compliance tests.
+* `zetasql/public/functions`: Function implementations for engines to use.
+* `zetasql/tools/execute_query`: Interactive query execution for debugging.
+* `zetasql/java/com/google/zetasql`: Java APIs, implemented by calling a local RPC server.
 
 Multiplatform support is planned for the following platforms:
 
 - Linux (Ubuntu 20.04 is our reference platform, but others may work).
 - gcc-9+ is required, recent versions of clang may work.
 - MacOS (Experimental)
-- Windows (version TDB)
 
 We do not provide any guarantees of API stability and *cannot accept
 contributions*.
 
+## Running Queries with `execute_query`
+
+The `execute_query` tool can parse, analyze and run SQL
+queries using the reference implementation.
+
+See [Execute Query](execute_query.md) for more details on using the tool.
+
+You can run it using binaries from
+[Releases](https://github.com/google/zetasql/releases), or build it using the
+instructions below.
+
+There are some runnable example queries in
+[tpch examples](../zetasql/examples/tpch/README.md).
+
+### Getting and Running `execute_query`
+#### Pre-built Binaries
+
+ZetaSQL provides pre-built binaries for `execute_query` for Linux and MacOS on
+the [Releases](https://github.com/google/zetasql/releases) page. You can run
+the downloaded binary like:
+
+```bash
+./execute_query_linux --web
+```
+
+Note the prebuilt binaries require GCC-9+ and tzdata. If you run into dependency
+issues, you can try running `execute_query` with Docker. See the
+[Run with Docker](#run-with-docker) section.
+
+#### Running from a bazel build
+
+You can build `execute_query` with Bazel from source and run it by:
+
+```bash
+bazel run zetasql/tools/execute_query:execute_query -- --web
+```
+
+#### Run with Docker
+
+You can run `execute_query` using Docker. First download the pre-built Docker
+image `zetasql` or build your own from Dockerfile. See the instructions in the
+[Build With Docker](#build-with-docker) section.
+
+Assuming your Docker image name is MyZetaSQLImage, run:
 
-## Flags
-ZetaSQL uses the Abseil [Flags](https://abseil.io/blog/20190509-flags) library
-to handle commandline flags. Unless otherwise documented, all flags are for
-debugging purposes only and may change, stop working or be removed at any time.
+```bash
+sudo docker run --init -it -h=$(hostname) -p 8080:8080 MyZetaSQLImage execute_query --web
+```
 
+Argument descriptions:
+
+* `--init`: Allows `execute_query` to handle signals properly.
+* `-it`: Runs the container in interactive mode.
+* `-h=$(hostname)`: Makes the hostname of the container the same as that of the
+host.
+* `-p 8080:8080`: Sets up port forwarding.
+
+`-h=$(hostname)` and `-p 8080:8080` together make the URL address of the
+web server accessible from the host machine.
+
+Alternatively, you can run this to start a bash shell, and then run
+`execute_query` inside:
+
+```bash
+sudo docker run --init -it -h=$(hostname) -p 8080:8080 MyZetaSQLImage
+
+# Inside the container bash shell
+execute_query --web
+```
 
 ## How to Build
 
-ZetaSQL uses [bazel](https://bazel.build) for building and dependency
-resolution. After installing bazel (check .bazelversion for the specific version
-of bazel we test with, but other versions may work), simply run:
+### Build with Bazel
+
+ZetaSQL uses [Bazel](https://bazel.build) for building and dependency
+resolution. Instructions for installing Bazel can be found in
+https://bazel.build/install. The Bazel version that ZetaSQL uses is specified in
+the `.bazelversion` file.
+
+Besides Bazel, the following dependencies are also needed:
+
+* GCC-9+ or equivalent Clang
+* tzdata
+
+`tzdata` provides the support for time zone information. It is generally
+available on MacOS. If you run Linux and it is not pre-installed, you can
+install it with `apt-get install tzdata`.
+
+Once the dependencies are installed, you can build or run ZetaSQL targets as
+needed, for example:
+
+```bash
+# Build everything.
+bazel build ...
+
+# Build and run the execute_query tool.
+bazel run //zetasql/tools/execute_query:execute_query -- --web
+
+# The built binary can be found under bazel-bin and run directly.
+bazel-bin/zetasql/tools/execute_query/execute_query --web
+
+# Build and run a test.
+bazel test //zetasql/parser:parser_set_test
+```
+
+Some Mac users may experience build issues due to the Python error
+`ModuleNotFoundError: no module named 'google.protobuf'`. To resolve it, run
+`pip install protobuf==<version>` to install python protobuf. The protobuf
+version can be found in the `zetasql_deps_step_2.bzl` file.
+
+### Build with Docker
+
+ZetaSQL also provides a `Dockerfile` which configures all the dependencies so
+that users can build ZetaSQL more easily across different platforms.
+
+To build the Docker image locally (called MyZetaSQLImage here), run:
+
+```bash
+sudo docker build . -t MyZetaSQLImage -f Dockerfile
+```
 
-```bazel build ...```
+Alternatively, ZetaSQL provides pre-built Docker images named `zetasql`. See the
+[Releases](https://github.com/google/zetasql/releases) page. You can load the
+downloaded image by:
 
-If your Mac build fails due the python error
-`ModuleNotFoundError: no module named 'google.protobuf'`, run
-`pip install protobuf==<version>` to install python protobuf first. The
-protobuf version can be found in the zetasql_deps_step_2.bzl file.
+```bash
+sudo docker load -i /path/to/the/downloaded/zetasql_docker.tar
+```
 
-## How to add as a Dependency in bazel
-See the (WORKSPACE) file, as it is a little unusual.
+To run builds or other commands inside the Docker environment, run this command
+to open a bash shell inside the container:
 
-### With docker
-TODO: Add docker build instructions.
+```bash
+# Start a bash shell running inside the Docker container.
+sudo docker run -it MyZetaSQLImage
+```
 
-## Example Usage
-A very basic command line tool is available to run simple queries with the
-reference implementation:
-```bazel run //zetasql/tools/execute_query:execute_query -- "select 1 + 1;"```
+Then you can run the commands from the [Build with Bazel](#build-with-bazel)
+section above.
 
-The reference implementation is not yet completely released and currently
-supports only a subset of functions and types.
 
 ## Differential Privacy
-For questions, documentation and examples of ZetaSQLs implementation of
+For questions, documentation, and examples of ZetaSQL's implementation of
 Differential Privacy, please check out
 (https://github.com/google/differential-privacy).
 
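
The updated README offers two local ways to reach `execute_query`: a copy on `PATH` (as the Dockerfile in this commit sets up) or the Bazel output under `bazel-bin`. A minimal sketch of picking whichever is available follows. The function name `find_execute_query` is hypothetical, and the `bazel-bin` path is taken from the symlink target in the Dockerfile above, relative to the repository root.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of ZetaSQL): print the path of an
# execute_query binary, preferring one on PATH over the Bazel output.
find_execute_query() {
  local built=bazel-bin/zetasql/tools/execute_query/execute_query
  if command -v execute_query >/dev/null 2>&1; then
    command -v execute_query
  elif [ -x "$built" ]; then
    echo "$built"
  else
    echo "execute_query not found; build it with Bazel first" >&2
    return 1
  fi
}

# Possible usage from the repository root:
#   "$(find_execute_query)" --web
# The README before this commit also passed a query as a positional argument
# ("select 1 + 1;"); whether that form still applies is an assumption here.
```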

bazel/boost.BUILD

Lines changed: 53 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,53 @@
1+
# Copyright 2024 Google LLC
2+
#
3+
# Licensed under the Apache License, Version 2.0 (the "License");
4+
# you may not use this file except in compliance with the License.
5+
# You may obtain a copy of the License at
6+
#
7+
# http://www.apache.org/licenses/LICENSE-2.0
8+
#
9+
# Unless required by applicable law or agreed to in writing, software
10+
# distributed under the License is distributed on an "AS IS" BASIS,
11+
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12+
# See the License for the specific language governing permissions and
13+
# limitations under the License.
14+
#
15+
16+
licenses(["notice"]) # Apache v2.0
17+
18+
19+
load("@rules_foreign_cc//foreign_cc:defs.bzl", "boost_build")
20+
21+
filegroup(
22+
name = "all_srcs",
23+
srcs = glob(["**"]),
24+
visibility = ["//visibility:private"],
25+
)
26+
27+
boost_build(
28+
name = "boost",
29+
bootstrap_options = ["--without-icu"],
30+
lib_source = ":all_srcs",
31+
out_static_libs = select({
32+
"//conditions:default": [
33+
"libboost_atomic.a",
34+
"libboost_filesystem.a",
35+
"libboost_program_options.a",
36+
"libboost_regex.a",
37+
"libboost_system.a",
38+
"libboost_thread.a",
39+
],
40+
}),
41+
user_options = [
42+
"-j4",
43+
"--with-filesystem",
44+
"--with-program_options",
45+
"--with-regex",
46+
"--with-system",
47+
"--with-thread",
48+
"variant=release",
49+
"link=static",
50+
"threading=multi",
51+
],
52+
visibility = ["//visibility:public"],
53+
)
