Actualize benchmarks docs #19219

Open
wants to merge 17 commits into
base: main
2 changes: 1 addition & 1 deletion ydb/docs/en/core/changelog-cli.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@ Released on June 4, 2025. To update to version **2.22.0**, select the [Downloads
### Features

* Added scheme object names completion in interactive mode.
* Enhanced the capabilities of the `{{ ydb-cli }} workload query` command: added `{{ ydb-cli }} workload query init`, `{{ ydb-cli }} workload query import`, and `{{ ydb-cli }} workload query clean` commands, and modified the `{{ ydb-cli }} workload query run` command. Using these commands, you can initialize tables, populate them with data, perform load testing, and clean up the data afterwards.
* Enhanced the capabilities of the `{{ ydb-cli }} workload query` [command](./reference/ydb-cli/workload-query.md): added `{{ ydb-cli }} workload query init`, `{{ ydb-cli }} workload query import`, and `{{ ydb-cli }} workload query clean` commands, and modified the `{{ ydb-cli }} workload query run` command. Using these commands, you can initialize tables, populate them with data, perform load testing, and clean up the data afterwards.
* Added the `--threads` option to the `{{ ydb-cli }} workload clickbench run`, `{{ ydb-cli }} workload tpch run`, and `{{ ydb-cli }} workload tpcds run` [commands](./reference/ydb-cli/workload-click-bench.md). This option allows you to specify the number of threads sending the queries.
* **_(Requires server v25.1+)_** **_(Experimental)_** Added the `{{ ydb-cli }} admin cluster config version` [command](./reference/ydb-cli/commands/configuration/cluster/index.md#list) to show the configuration version (V1/V2) on nodes.

11 changes: 11 additions & 0 deletions ydb/docs/en/core/recipes/ydb-cli/benchmarks.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,12 +8,15 @@
| [TPC-DS](https://tpc.org/tpcds/) | [tpcds](../../reference/ydb-cli/workload-tpcds.md) |
| [ClickBench](https://benchmark.clickhouse.com/) | [clickbench](../../reference/ydb-cli/workload-click-bench.md) |

A similar user-defined benchmark, `query`, is also available; see the [reference](../../reference/ydb-cli/workload-query.md).

They all function similarly. For a detailed description of each, refer to the relevant reference via the links above. All commands for working with benchmarks are organized into corresponding groups, and the database path is specified in the same way for all commands:

```bash
{{ ydb-cli }} workload clickbench --path path/in/database ...
{{ ydb-cli }} workload tpch --path path/in/database ...
{{ ydb-cli }} workload tpcds --path path/in/database ...
{{ ydb-cli }} workload query --path path/in/database ...
```

Load testing can be divided into 3 stages:
Expand All @@ -34,6 +37,7 @@ Initialization is performed by the `init` command:
{{ ydb-cli }} workload clickbench --path clickbench/hits init --store=column
{{ ydb-cli }} workload tpch --path tpch/s1 init --store=column
{{ ydb-cli }} workload tpcds --path tpcds/s1 init --store=column
{{ ydb-cli }} workload query --path user/suite1 init --suite-path /home/user/user_suite
```

At this stage, you can configure the tables to be created:
Expand All @@ -49,6 +53,7 @@ For more details, see the description of the commands for each benchmark:
* [clickbench init](../../reference/ydb-cli/workload-click-bench.md#init)
* [tpch init](../../reference/ydb-cli/workload-tpch.md#init)
* [tpcds init](../../reference/ydb-cli/workload-tpcds.md#init)
* [query init](../../reference/ydb-cli/workload-query.md#init)

### Data filling

Expand All @@ -59,13 +64,15 @@ For a detailed description, see the relevant reference sections:
* [clickbench import](../../reference/ydb-cli/workload-click-bench.md#load)
* [tpch import](../../reference/ydb-cli/workload-tpch.md#load)
* [tpcds import](../../reference/ydb-cli/workload-tpcds.md#load)
* [query import](../../reference/ydb-cli/workload-query.md#load)

Examples:

```bash
{{ ydb-cli }} workload clickbench --path clickbench/hits import files --input hits.csv.gz
{{ ydb-cli }} workload tpch --path tpch/s1 import generator --scale 1
{{ ydb-cli }} workload tpcds --path tpcds/s1 import generator --scale 1
{{ ydb-cli }} workload query --path user/suite1 import --suite-path /home/user/user_suite
```

## Testing {#testing}
Expand All @@ -78,6 +85,7 @@ Examples:
{{ ydb-cli }} workload clickbench --path clickbench/hits run --include 1-5,8
{{ ydb-cli }} workload tpch --path tpch/s1 run --exclude 3,4 --iterations 3
{{ ydb-cli }} workload tpcds --path tpcds/s1 run --plan ~/query_plan --include 2 --iterations 5
{{ ydb-cli }} workload query --path user/suite1 run --plan ~/query_plan --include first_query_set.1.sql,second_query_set.2.sql --iterations 5
```

The command allows you to select queries for execution, generate various types of reports, collect execution statistics, and more.
Expand All @@ -87,6 +95,7 @@ For a detailed description, see the relevant reference sections:
* [clickbench run](../../reference/ydb-cli/workload-click-bench.md#run)
* [tpch run](../../reference/ydb-cli/workload-tpch.md#run)
* [tpcds run](../../reference/ydb-cli/workload-tpcds.md#run)
* [query run](../../reference/ydb-cli/workload-query.md#run)

## Cleanup {#cleanup}

Expand All @@ -96,10 +105,12 @@ After all necessary testing has been completed, the benchmark's data can be remo
{{ ydb-cli }} workload clickbench --path clickbench/hits clean
{{ ydb-cli }} workload tpch --path tpch/s1 clean
{{ ydb-cli }} workload tpcds --path tpcds/s1 clean
{{ ydb-cli }} workload query --path user/suite1 clean
```

For a detailed description, see the corresponding sections:

* [clickbench clean](../../reference/ydb-cli/workload-click-bench.md#cleanup)
* [tpch clean](../../reference/ydb-cli/workload-tpch.md#cleanup)
* [tpcds clean](../../reference/ydb-cli/workload-tpcds.md#cleanup)
* [query clean](../../reference/ydb-cli/workload-query.md#cleanup)
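
The stages described on this page can be chained into a single script. The following is a minimal sketch rather than a recommended workflow: it uses only the subcommands and options shown in the examples above, and it assumes that connection settings are already configured. The `YDB_CLI` variable and the `tpch_lifecycle` function are hypothetical names introduced for illustration; `YDB_CLI` stands in for `{{ ydb-cli }}`.

```bash
#!/usr/bin/env bash
# Assumption: YDB_CLI names the CLI binary (a stand-in for {{ ydb-cli }}).
YDB_CLI="${YDB_CLI:-ydb}"

# tpch_lifecycle <path-in-database>
# Hypothetical helper: runs init, import, run, and clean for TPC-H in order,
# stopping at the first stage that fails.
tpch_lifecycle() {
  local path="$1"
  "$YDB_CLI" workload tpch --path "$path" init --store=column &&
  "$YDB_CLI" workload tpch --path "$path" import generator --scale 1 &&
  "$YDB_CLI" workload tpch --path "$path" run --iterations 3 &&
  "$YDB_CLI" workload tpch --path "$path" clean
}
```

Calling `tpch_lifecycle tpch/s1` would run the four stages against the `tpch/s1` path. Chaining with `&&` means the tables are left in place if a stage fails, which may help when inspecting what went wrong.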
Original file line number Diff line number Diff line change
Expand Up @@ -5,3 +5,4 @@
| `--upload-threads <value>` or `-t <value>` | The number of execution threads for data preparation. | The number of available cores on the client. |
| `--bulk-size <value>` | The size of the chunk for sending data, in rows. | 10000 |
| `--max-in-flight <value>` | The maximum number of data chunks that can be processed simultaneously. | 128 |
| `--file-output-path <path>` or `-f <path>` | If this option is set, data is not loaded into the database but saved to the `<path>` directory. | |
Original file line number Diff line number Diff line change
Expand Up @@ -2,15 +2,19 @@

| Name | Description | Default value |
|------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------|
| `--dry-run` | Do not execute the queries; only print them. | |
| `--check-canonical` or `-c` | Use special deterministic internal queries and compare the results against canonical ones. | |
| `--output <value>` | The name of the file where the query execution results will be saved. | `results.out` |
| `--iterations <value>` | The number of times each load query will be executed. | `1` |
| `--json <name>` | The name of the file where query execution statistics will be saved in `json` format. | Not saved by default |
| `--ministat <name>` | The name of the file where query execution statistics will be saved in `ministat` format. | Not saved by default |
| `--csv <name>` | The name of the file where the CSV version of the summary table will be saved. | Not saved by default |
| `--plan <name>` | The name of the file to save the query plan. Files like `<name>.<query number>.explain` and `<name>.<query number>.<iteration number>` will be saved in formats: `ast`, `json`, `svg`. | Not saved by default |
| `--query-prefix <setting>` | Query prefix. Every prefix is a line that will be added to the beginning of each query. For multiple prefix lines use this option several times. | Not specified by default |
| `--retries` | Max retry count for every request. | `0` |
| `--include` | Query numbers or segments to be executed as part of the load. | All queries executed |
| `--exclude` | Query numbers or segments to be excluded from the load. | None excluded by default |
| `--executer` | Query execution engine. Available values: `scan`, `generic`. | `generic` |
| `--retries` | Max retry count for every request. | `0` |
| `--include` | Query names, numbers or segments to be executed as part of the load. | All queries executed |
| `--exclude` | Query names, numbers or segments to be excluded from the load. | None excluded by default |
| `--verbose` or `-v` | Print additional information to the screen during query execution. | |
| `--threads <value>` or `-t <value>` | The number of parallel threads generating the load. | |
| `--global-timeout <value>` | Global timeout for all requests. Use a text format such as `0.5s`, `1m`, or `100us`. | |
| `--request-timeout <value>` | Timeout for each iteration of each request. Use a text format such as `0.5s`, `1m`, or `100us`. | |
| `--threads <count>` or `-t <count>` | The number of parallel threads generating the load. Zero means that queries are executed in the main thread; otherwise, queries are shuffled. | `0` |
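
Several of the options in the table above can be combined in one `run` invocation. The sketch below is illustrative only: every flag comes from the table, but the specific values, the `YDB_CLI` variable (a stand-in for `{{ ydb-cli }}`), and the `run_with_report` function are assumptions made for this example.

```bash
#!/usr/bin/env bash
# Assumption: YDB_CLI names the CLI binary (a stand-in for {{ ydb-cli }}).
YDB_CLI="${YDB_CLI:-ydb}"

# run_with_report <path-in-database> <report-prefix>
# Hypothetical helper: runs a subset of TPC-H queries three times each in
# four threads and saves statistics as <report-prefix>.json and .csv.
run_with_report() {
  local path="$1" prefix="$2"
  "$YDB_CLI" workload tpch --path "$path" run \
    --include 1-5,8 \
    --iterations 3 \
    --threads 4 \
    --request-timeout 1m \
    --json "${prefix}.json" \
    --csv "${prefix}.csv"
}
```

For example, `run_with_report tpch/s1 report` would write `report.json` and `report.csv` to the current directory. Because a non-zero `--threads` value shuffles the queries, the execution order is not expected to match the order given in `--include`.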
Original file line number Diff line number Diff line change
Expand Up @@ -28,3 +28,4 @@ The following types of load tests are supported at the moment:
* [TPC-DS](../../../workload-tpcds.md): [TPC-DS benchmark](https://www.tpc.org/tpcds/).
* [Topic](../../../workload-topic.md): Topic load.
* [Transfer](../../../workload-transfer.md): Transfer load.
* [Query](../../../workload-query.md): User-defined load.
2 changes: 2 additions & 0 deletions ydb/docs/en/core/reference/ydb-cli/toc_i.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -140,6 +140,8 @@ items:
href: workload-tpch.md
- name: TPC-DS load
href: workload-tpcds.md
- name: User-defined load
href: workload-query.md
- name: Managing configuration
href: configs.md
include:
16 changes: 7 additions & 9 deletions ydb/docs/en/core/reference/ydb-cli/workload-click-bench.md
Original file line number Diff line number Diff line change
Expand Up @@ -43,8 +43,9 @@ See the description of the command to init the data load:
| `--external-s3-endpoint <value>` or `-e <value>` | Only relevant for external tables. Link to S3 Bucket with data. | |
| `--string` | Use `String` type for text fields. `Utf8` is used by default. | |
| `--datetime` | Use the `Date`, `Datetime` and `Timestamp` types for time-related fields. |`Date32`, `Datetime64` and `Timestamp64`|
| `--partition-size` | Maximum partition size in megabytes (AUTO_PARTITIONING_PARTITION_SIZE_MB) for row tables. | 2000 |
| `--clear` | If the table at the specified path has already been created, it will be deleted.| |
| `--partition-size` | Maximum partition size in megabytes (AUTO_PARTITIONING_PARTITION_SIZE_MB) for row tables. | 2000 |
| `--clear` | If the table at the specified path has already been created, it will be deleted. | |
| `--dry-run` | Do not execute the DDL queries; only print them. | |
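
The `--dry-run` option makes it possible to review the DDL before any table is created. A minimal sketch, assuming the option can be combined with `--store=column` as shown; the `YDB_CLI` variable (a stand-in for `{{ ydb-cli }}`) and the `preview_init` function are hypothetical names used only here.

```bash
#!/usr/bin/env bash
# Assumption: YDB_CLI names the CLI binary (a stand-in for {{ ydb-cli }}).
YDB_CLI="${YDB_CLI:-ydb}"

# preview_init <path-in-database>
# Hypothetical helper: prints the DDL that init would run for a column
# table, without executing it.
preview_init() {
  local path="$1"
  "$YDB_CLI" workload clickbench --path "$path" init --store=column --dry-run
}
```

Running `preview_init clickbench/hits` first and then repeating the command without `--dry-run` is one way to check the table definition before creating it.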

## Loading data into a table { #load }

Expand All @@ -64,6 +65,7 @@ For source files, you can use CSV and TSV files, as well as directories containi
| `--input <path>` or `-i <path>` | Path to the source data files. Both unpacked and packed CSV and TSV files, as well as directories containing such files, are supported. Data can be downloaded from the official ClickBench website: [csv.gz](https://datasets.clickhouse.com/hits_compatible/hits.csv.gz), [tsv.gz](https://datasets.clickhouse.com/hits_compatible/hits.tsv.gz). To speed up the process, these files can be split into smaller parts, allowing parallel downloads. | |
| `--state <path>` | Path to the download state file. If the download is interrupted, it will resume from the same point when restarted. | |
| `--clear-state` | Relevant if the `--state` parameter is specified. Clears the state file and restarts the download from the beginning. | |
| `--dry-run` | Do not actually perform the import. | |

{% include [load_options](./_includes/workload/load_options.md) %}

Expand All @@ -87,13 +89,9 @@ See the command description to run the load:

### ClickBench-specific options { #run_clickbench_options }

| Name | Description | Default value |
|---------------------------------------------|-------------------------------------------------------------------------------------------------------------|---------------|
| `--ext-queries <queries>` or `-q <queries>` | External queries to execute during the load, separated by semicolons. | |
| `--ext-queries-file <name>` | Name of the file containing external queries to execute during the load, separated by semicolons. | |
| `--ext-query-dir <name>` | Directory containing external queries for the load. Queries should be in files named `q[0-42].sql`. | |
| `--ext-results-dir <name>` | Directory containing external query results for comparison. Results should be in files named `q[0-42].sql`. | |
| `--check-canonical` or `-c` | Use special deterministic internal queries and compare the results against canonical ones. | |
| Name | Description | Default value |
|--------------------|----------------------------------------------------------|---------------|
| `--syntax <value>` | Query syntax to use: `yql` or `pg`. | `yql` |
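
As an illustration of the `--syntax` option, the sketch below runs a few ClickBench queries with the `pg` syntax instead of the default `yql`. The `YDB_CLI` variable (a stand-in for `{{ ydb-cli }}`), the `run_pg_syntax` function, and the chosen query range are assumptions for this example.

```bash
#!/usr/bin/env bash
# Assumption: YDB_CLI names the CLI binary (a stand-in for {{ ydb-cli }}).
YDB_CLI="${YDB_CLI:-ydb}"

# run_pg_syntax <path-in-database>
# Hypothetical helper: runs ClickBench queries 1-5 using the pg syntax.
run_pg_syntax() {
  local path="$1"
  "$YDB_CLI" workload clickbench --path "$path" run --syntax pg --include 1-5
}
```

For example, `run_pg_syntax clickbench/hits` targets the table created in the `init` examples.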

## Cleanup test data { #cleanup }
