[copy_from]: Flush out implementation of CSV format #30956
Merged
fcdea0f to b7faadd
* plumb the CSV params through to `oneshot_source::CsvFormat` to handle different delimiters and what not
* add support for compressed CSVs using `async_compression`
* various Bazel changes for the C dependencies from the compression algorithms
b7faadd to 7f66bbe
jkosh44
reviewed
Jan 21, 2025
Adapter bits LGTM, I'll leave sign off to storage where the bulk of the changes are.
aljoscha
approved these changes
Jan 27, 2025
looks good!
ParkMyCar
added a commit
that referenced
this pull request
Jan 29, 2025
_Stacked on top of_: #30956

This PR implements a new AwsS3 `OneshotSource` that allows copying in files from S3, e.g.

```
COPY INTO my_table FROM 's3://my-test-bucket' (FORMAT CSV, FILES = ['important.csv']);
```

Along with `FILES = [<files>]` we also support a `PATTERN = <glob>` option which allows copying multiple files all at once.

### Motivation

Fixes https://github.com/MaterializeInc/database-issues/issues/8860
Fixes https://github.com/MaterializeInc/database-issues/issues/8855

### Tips for reviewer

Review only the final commit, the one titled "start, implementation of an S3 oneshot source"

### Checklist

- [x] This PR has adequate test coverage / QA involvement has been duly considered. ([trigger-ci for additional test/nightly runs](https://trigger-ci.dev.materialize.com/))
- [x] This PR has an associated up-to-date [design doc](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/design/README.md), is a design doc ([template](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/design/00000000_template.md)), or is sufficiently small to not require a design. <!-- Reference the design in the description. -->
- [x] If this PR evolves [an existing `$T ⇔ Proto$T` mapping](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/command-and-response-binary-encoding.md) (possibly in a backwards-incompatible way), then it is tagged with a `T-proto` label.
- [x] If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label ([example](https://github.com/MaterializeInc/cloud/pull/5021)). <!-- Ask in #team-cloud on Slack if you need help preparing the cloud PR. -->
- [x] If this PR includes major [user-facing behavior changes](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/guide-changes.md#what-changes-require-a-release-note), I have pinged the relevant PM to schedule a changelog post.
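To illustrate how a `PATTERN = <glob>` option can select several objects at once, here is a minimal sketch of a wildcard filter over object keys. It is not the code from the referenced commit: `glob_match` and `select_keys` are hypothetical names, only `*` and `?` are supported, and `*` is allowed to cross `/` separators, which may differ from the glob semantics the actual implementation uses.

```rust
/// Hypothetical helper: returns true if `name` matches `pattern`.
/// `*` matches any run of characters (including none) and `?` matches
/// exactly one character. This is a simplification, not the PR's matcher.
pub fn glob_match(pattern: &str, name: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let n: Vec<char> = name.chars().collect();
    let (mut pi, mut ni) = (0usize, 0usize);
    // Position to resume from when backtracking to the most recent `*`:
    // (pattern index just after the `*`, name index the `*` has consumed up to).
    let mut star: Option<(usize, usize)> = None;

    while ni < n.len() {
        if pi < p.len() && p[pi] == '*' {
            // Record the star and first try matching it against nothing.
            star = Some((pi + 1, ni));
            pi += 1;
        } else if pi < p.len() && (p[pi] == '?' || p[pi] == n[ni]) {
            pi += 1;
            ni += 1;
        } else if let Some((sp, sn)) = star {
            // Mismatch: let the last `*` swallow one more character and retry.
            pi = sp;
            ni = sn + 1;
            star = Some((sp, sn + 1));
        } else {
            return false;
        }
    }
    // Any trailing `*` in the pattern can match the empty string.
    while pi < p.len() && p[pi] == '*' {
        pi += 1;
    }
    pi == p.len()
}

/// Hypothetical helper: keeps only the keys that match `pattern`.
pub fn select_keys(keys: &[&str], pattern: &str) -> Vec<String> {
    keys.iter()
        .filter(|k| glob_match(pattern, k))
        .map(|k| k.to_string())
        .collect()
}
```

Under these assumptions, calling `select_keys(&["important.csv", "notes.txt"], "*.csv")` keeps only `important.csv`, whereas a `FILES = [...]` list would name each object explicitly.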
This PR makes the CSV implementation for bulk imports "feature complete". It adds support for specifying things like the delimiter and escape character, as well as support for handling compressed CSV files.
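As a rough illustration of what a configurable delimiter and escape character mean for parsing, the sketch below splits a single CSV record using caller-supplied parameters. It is not the PR's implementation: `CsvParams` and `split_record` are hypothetical names (the real type is `oneshot_source::CsvFormat`, whose fields are not shown here), the code is synchronous, and it ignores records that span multiple lines as well as compression.

```rust
/// Hypothetical parameter set for parsing one CSV record.
/// These fields are assumptions for illustration, not the PR's `CsvFormat`.
#[derive(Debug, Clone, Copy)]
pub struct CsvParams {
    pub delimiter: char,
    pub quote: char,
    pub escape: char,
}

impl Default for CsvParams {
    fn default() -> Self {
        // Comma-delimited, with `"` as both quote and escape character,
        // which matches the defaults of PostgreSQL's COPY in CSV format.
        CsvParams { delimiter: ',', quote: '"', escape: '"' }
    }
}

/// Splits one line into fields according to `params`.
/// Returns an error if a quoted field is never closed.
pub fn split_record(line: &str, params: CsvParams) -> Result<Vec<String>, String> {
    let mut fields = Vec::new();
    let mut field = String::new();
    let mut in_quotes = false;
    let mut chars = line.chars().peekable();

    while let Some(c) = chars.next() {
        if in_quotes {
            let escapes_next = c == params.escape
                && matches!(chars.peek(), Some(&n) if n == params.quote || n == params.escape);
            if escapes_next {
                // The escape character protects the following quote or escape,
                // so take that character literally.
                field.push(chars.next().unwrap());
            } else if c == params.quote {
                in_quotes = false;
            } else {
                field.push(c);
            }
        } else if c == params.quote {
            in_quotes = true;
        } else if c == params.delimiter {
            // End of field: move the accumulated text into the output.
            fields.push(std::mem::take(&mut field));
        } else {
            field.push(c);
        }
    }

    if in_quotes {
        return Err("unterminated quoted field".to_string());
    }
    fields.push(field);
    Ok(fields)
}
```

With the defaults, `split_record("a,\"b,c\",d", CsvParams::default())` yields the three fields `a`, `b,c`, and `d`. Passing `CsvParams { delimiter: '|', quote: '\'', escape: '\\' }` instead parses pipe-delimited input whose quoted fields use a backslash escape.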
### Motivation

Fixes https://github.com/MaterializeInc/database-issues/issues/8902

### Tips for reviewer

While the changes exist in `storage-*` crates, they are more general async-Rust changes and nothing necessarily specific to storage itself.

### Checklist

- If this PR evolves an existing `$T ⇔ Proto$T` mapping (possibly in a backwards-incompatible way), then it is tagged with a `T-proto` label.