generated from duckdb/extension-template
-
Notifications
You must be signed in to change notification settings - Fork 131
Ordered compaction and inlining with full catalog integration #642
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
Alex-Monahan
wants to merge
56
commits into
duckdb:main
Choose a base branch
from
Alex-Monahan:ordered-compaction-catalog
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
Show all changes
56 commits
Select commit
Hold shift + click to select a range
eb59a29
Working hardcoded compaction sort!
Alex-Monahan f2b5947
parse order by string and manually bind.
Alex-Monahan 856e479
approx_order_by param in merge_adjacent_files
Alex-Monahan 3b52466
config option for approx_order_by
Alex-Monahan 3b24a22
Refactor into a method on DuckLakeCompactor
Alex-Monahan d670375
Move compactor to .hpp, GetApproxOrderBy fn, WIP inlined order by
Alex-Monahan b214ccd
working tests for ducklake_flush_inlined_data!
Alex-Monahan 295e7f5
Use const references. More commented out attempt to dynamically bind
Alex-Monahan e67d5e2
rename parameter to local_order_by
Alex-Monahan 87964df
Rename fns, - comments/prints,+ negative test
Alex-Monahan cf11b9b
Merge remote-tracking branch 'origin' into ordered-compaction
Alex-Monahan d12e237
Accept SET SORTED BY syntax (do nothing yet)
Alex-Monahan 62beb45
SET SORTED inserts to DB! Lots of wiring...
Alex-Monahan 41da46b
Working compact with sort_data instead of option! WIP cleanup.
Alex-Monahan cf6c617
Update tests to use new syntax. Passing!
Alex-Monahan 6bc926b
Inlining tests using new syntax working!
Alex-Monahan d9e4e61
Remove local_order_by option and naming
Alex-Monahan f8b4425
inline flush sorted within txn works. Not compaction (seems ok?). Tes…
Alex-Monahan 1449bdb
Revert rename of duckdb property
Alex-Monahan f55888b
RESET SORTED BY works and is tested!
Alex-Monahan 10d527d
If sorts match, don't insert to catalog
Alex-Monahan f89e12a
Merge branch 'main' into ordered-compaction-catalog
Alex-Monahan e3bf7a4
Remove duplicate InsertSort fn
Alex-Monahan 6b612cb
batch_queries for sort catalog operations
Alex-Monahan a0a2424
Remove comment block
Alex-Monahan 4c13a28
retarget to duckdb main
Alex-Monahan 61cc765
try to point duckdb submodule to correct commit
Alex-Monahan c988bed
extension-ci-tools git commit hash fix
Alex-Monahan b73fc7d
Undo old ducklake_option edits. Edit comments
Alex-Monahan a1a6b26
Add FIXME for expressions in sort. (And re-run CI/CD)
Alex-Monahan b50cf05
Merge branch 'main' into ordered-compaction-catalog
Alex-Monahan f167222
make format fixes
Alex-Monahan c1825f1
rename catalog table to ducklake_sort_info
Alex-Monahan 20e2c11
Merge branch 'main' into ordered-compaction-catalog
Alex-Monahan d864904
Fix merge by adding old sort table back
Alex-Monahan 3286d82
add ducklake_sort_expression table to spec. Update tests.
Alex-Monahan cbb5318
Error on SET SORTED BY if wrong columns or non-columns
Alex-Monahan 3152e2e
col names from latest_table. Test post rename
Alex-Monahan 7ac3077
PR feedback, case ins. equals, split to fns
Alex-Monahan 7434510
Split existing merge adjacent tests
Alex-Monahan 9f4e352
cleanup test comments
Alex-Monahan ebcfc4a
Split existing inline flush tests
Alex-Monahan fb5dd0b
test that SET SORTED BY applies by table_id not by name
Alex-Monahan 918ebf3
test that sort catalog tables cleared out by expire_snapshots
Alex-Monahan 27cfa9a
Don't update schema_version on SET SORTED BY. Track updates without s…
Alex-Monahan 7ae9384
format fixes
Alex-Monahan bd8c991
Comments on tables, table columns, and views do not increase schema_v…
Alex-Monahan 58f23df
format fix
Alex-Monahan 79064e6
add test for snapshots() changes
Alex-Monahan f17d7f2
Fix error message
Alex-Monahan dba148b
Fix error message in test
Alex-Monahan fcca281
Fix another error message
Alex-Monahan 062d00b
Tests: Rollbacks, sort & rename cols in same txn
Alex-Monahan cc4488f
format fixes
Alex-Monahan f24ddc2
Relax assertion: the table_entry_map contains tables and views.
Alex-Monahan c55a083
std::string to string
Alex-Monahan File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,99 @@ | ||
| //===----------------------------------------------------------------------===// | ||
| // DuckDB | ||
| // | ||
| // functions/ducklake_compaction_functions.hpp | ||
| // | ||
| // | ||
| //===----------------------------------------------------------------------===// | ||
|
|
||
| #pragma once | ||
|
|
||
| #include "functions/ducklake_table_functions.hpp" | ||
| #include "storage/ducklake_transaction.hpp" | ||
| #include "storage/ducklake_catalog.hpp" | ||
| #include "storage/ducklake_schema_entry.hpp" | ||
| #include "storage/ducklake_table_entry.hpp" | ||
| #include "storage/ducklake_insert.hpp" | ||
| #include "storage/ducklake_multi_file_reader.hpp" | ||
| #include "duckdb/planner/operator/logical_get.hpp" | ||
| #include "duckdb/planner/operator/logical_copy_to_file.hpp" | ||
| #include "duckdb/planner/operator/logical_extension_operator.hpp" | ||
| #include "duckdb/planner/operator/logical_set_operation.hpp" | ||
| #include "storage/ducklake_compaction.hpp" | ||
| #include "duckdb/common/multi_file/multi_file_function.hpp" | ||
| #include "storage/ducklake_multi_file_list.hpp" | ||
| #include "duckdb/planner/tableref/bound_at_clause.hpp" | ||
| #include "duckdb/planner/operator/logical_empty_result.hpp" | ||
|
|
||
| namespace duckdb { | ||
| //===--------------------------------------------------------------------===// | ||
| // Logical Operator | ||
| //===--------------------------------------------------------------------===// | ||
| class DuckLakeLogicalCompaction : public LogicalExtensionOperator { | ||
| public: | ||
| DuckLakeLogicalCompaction(idx_t table_index, DuckLakeTableEntry &table, | ||
| vector<DuckLakeCompactionFileEntry> source_files_p, string encryption_key_p, | ||
| optional_idx partition_id, vector<string> partition_values_p, optional_idx row_id_start, | ||
| CompactionType type) | ||
| : table_index(table_index), table(table), source_files(std::move(source_files_p)), | ||
| encryption_key(std::move(encryption_key_p)), partition_id(partition_id), | ||
| partition_values(std::move(partition_values_p)), row_id_start(row_id_start), type(type) { | ||
| } | ||
|
|
||
| idx_t table_index; | ||
| DuckLakeTableEntry &table; | ||
| vector<DuckLakeCompactionFileEntry> source_files; | ||
| string encryption_key; | ||
| optional_idx partition_id; | ||
| vector<string> partition_values; | ||
| optional_idx row_id_start; | ||
| CompactionType type; | ||
|
|
||
| public: | ||
| PhysicalOperator &CreatePlan(ClientContext &context, PhysicalPlanGenerator &planner) override { | ||
| auto &child = planner.CreatePlan(*children[0]); | ||
| return planner.Make<DuckLakeCompaction>(types, table, std::move(source_files), std::move(encryption_key), | ||
| partition_id, std::move(partition_values), row_id_start, child, type); | ||
| } | ||
|
|
||
| string GetExtensionName() const override { | ||
| return "ducklake"; | ||
| } | ||
| vector<ColumnBinding> GetColumnBindings() override { | ||
| vector<ColumnBinding> result; | ||
| result.emplace_back(table_index, 0); | ||
| return result; | ||
| } | ||
|
|
||
| void ResolveTypes() override { | ||
| types = {LogicalType::BOOLEAN}; | ||
| } | ||
| }; | ||
|
|
||
| //===--------------------------------------------------------------------===// | ||
| // Compaction Command Generator | ||
| //===--------------------------------------------------------------------===// | ||
| class DuckLakeCompactor { | ||
| public: | ||
| DuckLakeCompactor(ClientContext &context, DuckLakeCatalog &catalog, DuckLakeTransaction &transaction, | ||
| Binder &binder, TableIndex table_id, DuckLakeMergeAdjacentOptions options); | ||
| DuckLakeCompactor(ClientContext &context, DuckLakeCatalog &catalog, DuckLakeTransaction &transaction, | ||
| Binder &binder, TableIndex table_id, double delete_threshold); | ||
| void GenerateCompactions(DuckLakeTableEntry &table, vector<unique_ptr<LogicalOperator>> &compactions); | ||
| unique_ptr<LogicalOperator> GenerateCompactionCommand(vector<DuckLakeCompactionFileEntry> source_files); | ||
| static unique_ptr<LogicalOperator> InsertSort(Binder &binder, unique_ptr<LogicalOperator> &plan, | ||
| DuckLakeTableEntry &table, optional_ptr<DuckLakeSort> sort_data); | ||
|
|
||
| private: | ||
| ClientContext &context; | ||
| DuckLakeCatalog &catalog; | ||
| DuckLakeTransaction &transaction; | ||
| Binder &binder; | ||
| TableIndex table_id; | ||
| double delete_threshold = 0.95; | ||
| DuckLakeMergeAdjacentOptions options; | ||
|
|
||
| CompactionType type; | ||
| }; | ||
|
|
||
| } // namespace duckdb |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if we should error if it is an unsupported dialect?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I will look at the macros as an example. Would you prefer that I filter them out in SQL when I pull from the catalog?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is more of a question, I'm open to suggestions. On one hand, we can be more permissive with anything dialect related, but I fear this might cause confusion, while an error removes any kind of ambiguity of the system's state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking back at this, the reason I implemented it this way was in case we wanted to deliberately have 2 sort orders active at once: one for DuckDB, and one for another engine like Spark for example. So if there were 2 separate sets of sorts, skipping the non-DuckDB ones (but not throwing an error) should allow them to co-exist.
Unfortunately, this case is very difficult to test in this repo! I think as catalog editing gets added on the Spark side, we can do some testing in that repo.
So, do you mind if we leave it as-is?