[YDB CLI][CSV] Optimization: Use Apache Arrow and Protobuf Arena API as fallback in CSV upload module #19667

Open
wants to merge 35 commits into
base: main

@Vladislav0Art Vladislav0Art commented Jun 14, 2025

Changelog entry

  • Optimized upload speed for CSV uploading in the YDB CLI module.

Changelog category

  • Performance improvement

Description for reviewers

Changed: YDB CLI, CSV uploading module.

Description:
Optimized the upload speed of the CSV uploading module in the YDB CLI by migrating the implementation from TValue to sending data in the Apache Arrow format, with a fallback to a TValue-based solution whose protobuf messages are allocated on a Protobuf Arena.

The fallback follows a "one arena per request" strategy, i.e. the protobuf message of every request is allocated on its own Protobuf arena.

Upload speed improvements:
The initial upload speed is about 134.84 MiB/s (measured on commit 33732bf79ec).

The improvements:

  1. When Apache Arrow is used: 356.31 MiB/s (+164%)
  2. When fallback with Protobuf Arenas is used: 249.12 MiB/s (+85%)
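The percentages above follow from the baseline of 134.84 MiB/s. A minimal sketch of the arithmetic, where the helper name `GainPercent` is hypothetical and not part of the PR:

```cpp
#include <cassert>
#include <cmath>

// Relative gain of `current` over `baseline`, rounded to whole percent.
// Both arguments are throughputs in MiB/s.
inline long GainPercent(double baseline, double current) {
    assert(baseline > 0.0);
    return std::lround((current / baseline - 1.0) * 100.0);
}
```

For example, `GainPercent(134.84, 356.31)` gives 164 and `GainPercent(134.84, 249.12)` gives 85, matching the figures reported for Apache Arrow and for the arena fallback.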

Methodology used for measurements:

  1. The dataset: eCommerce (the same one as in "YDB CLI. Improve ydb import file csv throughput", #11678).
  2. Number of files being uploaded: 1
  3. Results are the median over 10 runs.
  4. Used VM: CPU 36, Intel(R) Xeon(R) Platinum 8124M 3.00GHz, 68 GB RAM.
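Taking the median over the runs could be sketched as follows. The description does not say how the median of an even number of runs was computed; this sketch assumes the usual mean of the two middle values, and the `Median` helper is hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Median of a non-empty sample. For an even count this returns the mean
// of the two middle values (an assumption, see the note above).
inline double Median(std::vector<double> samples) {
    assert(!samples.empty());
    std::sort(samples.begin(), samples.end());
    const std::size_t n = samples.size();
    return n % 2 == 1 ? samples[n / 2]
                      : (samples[n / 2 - 1] + samples[n / 2]) / 2.0;
}
```

The median is less sensitive than the mean to a single slow run, for instance one affected by a cold cache.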

Notes:

  1. This optimization of the TLinesSplitter::ConsumeLine method improves the CSV upload speed further, to 476.49 MiB/s (+253%).
  2. There is a pipeline that measures performance and builds plots for the CSV uploading module (it requires special printing in import.cpp).
  3. A report on the research work is currently being prepared; it will cover the details and the reasoning behind the chosen optimizations.

Relates to: #11678
Responsible: @pnv1, @asmyasnikov

Because `TValue&& rows` is provided as rvalue reference, we can
move its value into the request rather than making a copy of the data.

Speed boost: ~130MiB/s -> ~168-200MiB/s.
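The idea of this commit can be illustrated with standard containers. In this sketch `Rows` and `Request` are hypothetical stand-ins for `TValue` and the bulk upsert request; the real types in the SDK differ:

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins: `Rows` plays the role of TValue,
// `Request` the role of the bulk upsert request.
using Rows = std::vector<std::string>;

struct Request {
    Rows rows;
};

// Accepts the rows by rvalue reference and moves them into the request,
// so the underlying buffer changes owner instead of being copied.
inline Request MakeRequest(Rows&& rows) {
    assert(!rows.empty());  // assumption of this sketch: batches are non-empty
    return Request{std::move(rows)};
}
```

A caller passes `std::move(rows)`; afterwards the request owns the original buffer and no per-row copy is made.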
…sertUnretryable (leaving BulkUpsert unchanged)
Implement additional APIs that work with arena-allocated proto messages.
The implementation creates a new protobuf::Arena per batch request and builds Ydb::Value
on this arena in TCsvParser::BuildListOnArena.

Ydb::Value gets created once and then moved into Ydb::Table::BulkUpsertRequest, which
is allocated on the same arena (allocation in the same arena prevents copying).
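The "one arena per request" pattern can be sketched without protobuf by using `std::pmr::monotonic_buffer_resource` as an analogue of `protobuf::Arena`. This is not the Protobuf Arena API; `BatchRequest` and `BuildOnArena` are hypothetical names used only for illustration:

```cpp
#include <cassert>
#include <memory_resource>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for a request message whose nested data
// lives on a single arena.
struct BatchRequest {
    std::pmr::vector<std::pmr::string> rows;

    explicit BatchRequest(std::pmr::memory_resource* arena) : rows(arena) {}
};

// Builds the request for one batch. The vector and every row string use
// `arena` for their heap storage, so nothing is copied between allocators.
inline BatchRequest BuildOnArena(const std::vector<std::string_view>& lines,
                                 std::pmr::memory_resource* arena) {
    assert(arena != nullptr);
    BatchRequest request(arena);
    request.rows.reserve(lines.size());
    for (std::string_view line : lines) {
        // Uses-allocator construction hands the arena to each string.
        request.rows.emplace_back(line);
    }
    return request;
}
```

A caller would create one `std::pmr::monotonic_buffer_resource` per request, build the request on it, send it, and then destroy the resource, which releases all of the request's memory at once.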

Left undone/requires modifications:
1. TODO: I had to copy-paste the RunDeferred/Run methods (see: RunDeferredOnArena/RunOnArena)
   because arena-allocated messages are returned as pointers, while the API expects
   rvalue/lvalue references to the request message. If the pointer is dereferenced,
   the message is copied once inside TGRpcConnectionsImpl::Run, when
   WithServiceConnection is called and the request is moved into the capture of
   a callback. Ideally, there should be no code duplication (the copy-pasted
   methods differ only in the type of the accepted request: they accept TRequest*).

2. TODO: ensure correctness.

Speed boost: ~130MiB/s -> ~250MiB/s
Notes:
1. Creates custom converter in `csv_parser.cpp` to build `Ydb::Value` on the arena.
2. Creates new implementer of `TValueBuilderBase` to build `TArenaAllocatedValue` in `value.cpp`/`value.h`.
3. Introduces `TValueHolder` interface with two implementations:
   - `StackAllocatedValueHolder` with stack-allocated `Ydb::Value`.
   - `TArenaAllocatedValueHolder` with arena-allocated `Ydb::Value`.

Current drawbacks:
- `TValueHolder` has virtual calls, which are costly; try to employ static polymorphism.
- `FieldToValueOnArena` creates `TCsvToYdbOnArenaConverter`, which is a copy of `TCsvToYdbConverter`.

Verdict: didn't produce any performance improvements (likely due to virtual calls in `TValueHolder`).
…ildListOnArena`

This commit fixes the issue with virtual calls in `TValueHolder`,
integrated in the commit 396d98f.

Average speed: 254-257 MiB/s (33.1-33.6 sec spent)

There seems to be no difference in upload speed.
The only difference is the share of total CPU time
that the `TCsvParser::BuildListOnArena` method takes:
1. this commit: 42.7% of the total CPU time.
2. commit 396d98f: 43.1% of the total CPU time.
3. without both commits: 49.3% of the total CPU time.
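Replacing virtual dispatch with static polymorphism, as this commit does for `TValueHolder`, might look roughly like the sketch below. All type names are hypothetical stand-ins and do not reproduce the actual classes in `value.h`:

```cpp
#include <cassert>

// Hypothetical stand-in for Ydb::Value.
struct Value {
    int payload = 0;
};

// Holder that owns the value directly.
struct InlineHolder {
    Value value;
    Value& Get() { return value; }
};

// Holder that refers to a value owned elsewhere, for example by an arena.
struct PointerHolder {
    Value* value;
    Value& Get() {
        assert(value != nullptr);
        return *value;
    }
};

// The holder type is a template parameter, so the call to Get() is
// resolved at compile time and involves no virtual table lookup.
template <class THolder>
void Fill(THolder& holder, int payload) {
    holder.Get().payload = payload;
}
```

Both holders expose the same `Get()` member without sharing a base class, so the compiler can inline the accessor in each instantiation of `Fill`.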
…hars counting

Benchmarking results (see commit 6201814 for details):

1. Previous solution (at commit 6201814):
Elapsed: 17.7 sec. Total read size: 8.33GiB. Average processing speed: 482MiB/s.

2. Current solution:
Elapsed: 10.6 sec. Total read size: 8.33GiB. Average processing speed: 804MiB/s (up to 817MiB/s).
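The reported speeds are consistent with the elapsed times and the total read size. A small sketch of the check, with the hypothetical helper `ThroughputMiBps`:

```cpp
#include <cassert>

// Average throughput in MiB/s for `gib` GiB processed in `seconds` seconds.
inline double ThroughputMiBps(double gib, double seconds) {
    assert(seconds > 0.0);
    return gib * 1024.0 / seconds;
}
```

`ThroughputMiBps(8.33, 17.7)` and `ThroughputMiBps(8.33, 10.6)` land within 1 MiB/s of the reported 482 MiB/s and 804 MiB/s; the small gap comes from the rounding of the printed size and times.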

Cherry-picked the commit: `773a3a5`
See: 773a3a5
Requires testing via a real instance of YDBD to ensure that the refactoring is correct.

Hi! Thank you for contributing!
The tests on this PR will run after a maintainer adds an ok-to-test label to this PR manually. Thank you for your patience!

github-actions bot commented Jun 14, 2025

🟢 2025-06-15 15:14:54 UTC The validation of the Pull Request description is successful.
