|
1 | 1 | ## ZetaSQL - Analyzer Framework for SQL |
2 | 2 |
|
3 | | -ZetaSQL defines a language (grammar, types, data model, and semantics) as well |
4 | | -as a parser and analyzer. It is not itself a database or query engine. Instead |
5 | | -it is intended to be used by multiple engines wanting to provide consistent |
6 | | -behavior for all semantic analysis, name resolution, type checking, implicit |
7 | | -casting, etc. Specific query engines may not implement all features in the |
8 | | -ZetaSQL language and may give errors if specific features are not supported. For |
9 | | -example, engine A may not support any updates and engine B may not support |
10 | | -analytic functions. |
11 | | - |
12 | | -[ZetaSQL Language Guide](docs/README.md) |
13 | | - |
14 | | -[ZetaSQL ResolvedAST API](docs/resolved_ast.md) |
15 | | - |
16 | | -[ZetaSQL BigQuery Analysis Example](https://github.com/GoogleCloudPlatform/professional-services/tree/main/tools/zetasql-helper) |
17 | | - |
18 | | -## Status of Project and Roadmap |
19 | | - |
20 | | -This codebase is being open sourced in multiple phases: |
21 | | - |
22 | | -1. Parser and Analyzer **Complete** |
23 | | -2. Reference Implementation **In Progress** |
24 | | - - Base capability **Complete** |
25 | | - - Function library **In Progress** |
26 | | -3. Compliance Tests **Complete** |
27 | | - - includes framework for validating compliance of arbitrary engines |
28 | | -4. Misc tooling |
29 | | - - Improved Formatter **Complete** |
| 3 | +ZetaSQL defines a SQL language (grammar, types, data model, semantics, and |
| 4 | +function library) and |
| 5 | +implements parsing and analysis for that language as a reusable component. |
| 6 | +ZetaSQL is not itself a database or query engine. Instead, |
| 7 | +it's intended to be used by multiple engines, to provide consistent |
| 8 | +language and behavior (name resolution, type checking, implicit |
| 9 | +casting, etc.). Specific query engines may implement a subset of features, |
| 10 | +giving errors for unsupported features.
| 11 | +ZetaSQL's compliance test suite can be used to validate that query engine
| 12 | +implementations are correct and consistent.
| 13 | + |
| 14 | +ZetaSQL implements the ZetaSQL language, which is used across several of |
| 15 | +Google's SQL products, both publicly and internally, including BigQuery, |
| 16 | +Spanner, F1, BigTable, Dremel, Procella, and others. |
| 17 | + |
| 18 | +ZetaSQL has been described in these publications:
| 19 | + |
| 20 | +* (CDMS 2022) [ZetaSQL: A SQL Language as a Component](https://cdmsworkshop.github.io/2022/Slides/Fri_C2.5_DavidWilhite.pptx) (Slides) |
| 21 | +* (SIGMOD 2017) [Spanner: Becoming a SQL System](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/46103.pdf) -- See section 6. |
| 22 | +* (VLDB 2024) [SQL Has Problems. We Can Fix Them: Pipe Syntax in SQL](https://research.google/pubs/pub1005959/) -- Describes ZetaSQL's new pipe query syntax. |
| 23 | + |
| 24 | +Some other documentation: |
| 25 | + |
| 26 | +* [ZetaSQL Language Reference](docs/README.md) |
| 27 | +* [ZetaSQL Resolved AST](docs/resolved_ast.md), documenting the intermediate representation produced by the ZetaSQL analyzer. |
| 28 | +* [ZetaSQL Toolkit](https://github.com/GoogleCloudPlatform/zetasql-toolkit), a project using ZetaSQL to analyze and understand queries against BigQuery, and other ZetaSQL engines. |
| 29 | + |
| 30 | +## Project Overview |
| 31 | + |
| 32 | +The main components and APIs are in these directories under `zetasql/`: |
| 33 | + |
| 34 | +* `zetasql/public`: Most public APIs are here. |
| 35 | +* `zetasql/resolved_ast`: Defines the [Resolved AST](docs/resolved_ast.md), which the analyzer produces. |
| 36 | +* `zetasql/parser`: The grammar and parser implementation. (Semi-public, since the parse trees are not a stable API.) |
| 37 | +* `zetasql/analyzer`: The internal implementation of query analysis. |
| 38 | +* `zetasql/reference_impl`: The reference implementation for executing queries. |
| 39 | +* `zetasql/compliance`: Compliance test framework and compliance tests. |
| 40 | +* `zetasql/public/functions`: Function implementations for engines to use. |
| 41 | +* `zetasql/tools/execute_query`: Interactive query execution for debugging. |
| 42 | +* `zetasql/java/com/google/zetasql`: Java APIs, implemented by calling a local RPC server. |
30 | 43 |
|
31 | 44 | Multiplatform support is planned for the following platforms: |
32 | 45 |
|
33 | 46 | - Linux (Ubuntu 20.04 is our reference platform, but others may work). |
34 | 47 | - gcc-9+ is required, recent versions of clang may work. |
35 | 48 | - MacOS (Experimental) |
36 | | - - Windows (version TDB) |
37 | 49 |
|
38 | 50 | We do not provide any guarantees of API stability and *cannot accept |
39 | 51 | contributions*. |
40 | 52 |
|
| 53 | +## Running Queries with `execute_query` |
| 54 | + |
| 55 | +The `execute_query` tool can parse, analyze and run SQL |
| 56 | +queries using the reference implementation. |
| 57 | + |
| 58 | +See [Execute Query](execute_query.md) for more details on using the tool. |
| 59 | + |
| 60 | +You can run it using binaries from |
| 61 | +[Releases](https://github.com/google/zetasql/releases), or build it using the |
| 62 | +instructions below. |
| 63 | + |
| 64 | +There are some runnable example queries in |
| 65 | +[tpch examples](../zetasql/examples/tpch/README.md). |
| 66 | + |
| 67 | +### Getting and Running `execute_query` |
| 68 | +#### Pre-built Binaries |
| 69 | + |
| 70 | +ZetaSQL provides pre-built binaries for `execute_query` for Linux and MacOS on |
| 71 | +the [Releases](https://github.com/google/zetasql/releases) page. You can run |
| 72 | +the downloaded binary like: |
| 73 | + |
| 74 | +```bash |
| 75 | +./execute_query_linux --web |
| 76 | +``` |
| 77 | + |
| 78 | +Note the prebuilt binaries require GCC-9+ and tzdata. If you run into dependency |
| 79 | +issues, you can try running `execute_query` with Docker. See the |
| 80 | +[Run with Docker](#run-with-docker) section. |
| 81 | + |
| 82 | +#### Running from a bazel build |
| 83 | + |
| 84 | +You can build `execute_query` from source with Bazel and run it with:
| 85 | + |
| 86 | +```bash |
| 87 | +bazel run //zetasql/tools/execute_query:execute_query -- --web
| 88 | +``` |
| 89 | + |
| 90 | +#### Run with Docker |
| 91 | + |
| 92 | +You can run `execute_query` using Docker. First download the pre-built Docker |
| 93 | +image `zetasql` or build your own from Dockerfile. See the instructions in the |
| 94 | +[Build With Docker](#build-with-docker) section. |
| 95 | + |
| 96 | +Assuming your Docker image name is MyZetaSQLImage, run: |
41 | 97 |
|
42 | | -## Flags |
43 | | -ZetaSQL uses the Abseil [Flags](https://abseil.io/blog/20190509-flags) library |
44 | | -to handle commandline flags. Unless otherwise documented, all flags are for |
45 | | -debugging purposes only and may change, stop working or be removed at any time. |
| 98 | +```bash |
| 99 | +sudo docker run --init -it -h=$(hostname) -p 8080:8080 MyZetaSQLImage execute_query --web
| 100 | +``` |
46 | 101 |
|
| 102 | +Argument descriptions: |
| 103 | + |
| 104 | +* `--init`: Allows `execute_query` to handle signals properly. |
| 105 | +* `-it`: Runs the container in interactive mode. |
| 106 | +* `-h=$(hostname)`: Makes the hostname of the container the same as that of the |
| 107 | + host. |
| 108 | +* `-p 8080:8080`: Sets up port forwarding. |
| 109 | + |
| 110 | +`-h=$(hostname)` and `-p 8080:8080` together make the URL address of the |
| 111 | +web server accessible from the host machine. |
| 112 | + |
| 113 | +Alternatively, you can run this to start a bash shell, and then run |
| 114 | +`execute_query` inside: |
| 115 | + |
| 116 | +```bash |
| 117 | +sudo docker run --init -it -h=$(hostname) -p 8080:8080 MyZetaSQLImage
| 118 | + |
| 119 | +# Inside the container bash shell |
| 120 | +execute_query --web |
| 121 | +``` |
47 | 122 |
|
48 | 123 | ## How to Build |
49 | 124 |
|
50 | | -ZetaSQL uses [bazel](https://bazel.build) for building and dependency |
51 | | -resolution. After installing bazel (check .bazelversion for the specific version |
52 | | -of bazel we test with, but other versions may work), simply run: |
| 125 | +### Build with Bazel |
| 126 | + |
| 127 | +ZetaSQL uses [Bazel](https://bazel.build) for building and dependency |
| 128 | +resolution. Instructions for installing Bazel can be found at
| 129 | +https://bazel.build/install. The Bazel version that ZetaSQL uses is specified in |
| 130 | +the `.bazelversion` file. |
| 131 | + |
| 132 | +Besides Bazel, the following dependencies are also needed: |
| 133 | + |
| 134 | +* GCC-9+ or equivalent Clang |
| 135 | +* tzdata |
| 136 | + |
| 137 | +`tzdata` provides the support for time zone information. It is generally |
| 138 | +available on MacOS. If you run Linux and it is not pre-installed, you can |
| 139 | +install it with `apt-get install tzdata`. |
| 140 | + |
| 141 | +Once the dependencies are installed, you can build or run ZetaSQL targets as |
| 142 | +needed, for example: |
| 143 | + |
| 144 | +```bash |
| 145 | +# Build everything. |
| 146 | +bazel build ... |
| 147 | + |
| 148 | +# Build and run the execute_query tool. |
| 149 | +bazel run //zetasql/tools/execute_query:execute_query -- --web |
| 150 | + |
| 151 | +# The built binary can be found under bazel-bin and run directly. |
| 152 | +bazel-bin/zetasql/tools/execute_query/execute_query --web
| 153 | + |
| 154 | +# Build and run a test. |
| 155 | +bazel test //zetasql/parser:parser_set_test |
| 156 | +``` |
| 157 | + |
| 158 | +Some Mac users may experience build issues due to the Python error |
| 159 | +`ModuleNotFoundError: no module named 'google.protobuf'`. To resolve it, run |
| 160 | +`pip install protobuf==<version>` to install python protobuf. The protobuf |
| 161 | +version can be found in the `zetasql_deps_step_2.bzl` file. |
| 162 | + |
| 163 | +### Build with Docker |
| 164 | + |
| 165 | +ZetaSQL also provides a `Dockerfile` which configures all the dependencies so |
| 166 | +that users can build ZetaSQL more easily across different platforms. |
| 167 | + |
| 168 | +To build the Docker image locally (called MyZetaSQLImage here), run: |
| 169 | + |
| 170 | +```bash |
| 171 | +sudo docker build . -t MyZetaSQLImage -f Dockerfile |
| 172 | +``` |
53 | 173 |
|
54 | | -```bazel build ...``` |
| 174 | +Alternatively, ZetaSQL provides pre-built Docker images named `zetasql`. See the |
| 175 | +[Releases](https://github.com/google/zetasql/releases) page. You can load the |
| 176 | +downloaded image with:
55 | 177 |
|
56 | | -If your Mac build fails due the python error |
57 | | - `ModuleNotFoundError: no module named 'google.protobuf'`, run |
58 | | - `pip install protobuf==<version>` to install python protobuf first. The |
59 | | - protobuf version can be found in the zetasql_deps_step_2.bzl file. |
| 178 | +```bash |
| 179 | +sudo docker load -i /path/to/the/downloaded/zetasql_docker.tar |
| 180 | +``` |
60 | 181 |
|
61 | | -## How to add as a Dependency in bazel |
62 | | -See the (WORKSPACE) file, as it is a little unusual. |
| 182 | +To run builds or other commands inside the Docker environment, run this command |
| 183 | +to open a bash shell inside the container: |
63 | 184 |
|
64 | | -### With docker |
65 | | - TODO: Add docker build instructions. |
| 185 | +```bash |
| 186 | +# Start a bash shell running inside the Docker container. |
| 187 | +sudo docker run -it MyZetaSQLImage |
| 188 | +``` |
66 | 189 |
|
67 | | -## Example Usage |
68 | | -A very basic command line tool is available to run simple queries with the |
69 | | -reference implementation: |
70 | | -```bazel run //zetasql/tools/execute_query:execute_query -- "select 1 + 1;"``` |
| 190 | +Then you can run the commands from the [Build with Bazel](#build-with-bazel) |
| 191 | +section above. |
71 | 192 |
|
72 | | -The reference implementation is not yet completely released and currently |
73 | | -supports only a subset of functions and types. |
74 | 193 |
|
75 | 194 | ## Differential Privacy |
76 | | -For questions, documentation and examples of ZetaSQLs implementation of |
| 195 | +For questions, documentation, and examples of ZetaSQL's implementation of |
77 | 196 | Differential Privacy, please check out |
78 | 197 | (https://github.com/google/differential-privacy). |
79 | 198 |