-
Notifications
You must be signed in to change notification settings - Fork 465
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
mysql: add support for invisible columns #31239
base: main
Are you sure you want to change the base?
Conversation
All contributors have signed the CLA. |
I have read the Contributor License Agreement (CLA) and I hereby sign the CLA. |
f1ec013
to
bb7369e
Compare
.iter() | ||
.map(|col_info| format!("`{}`", col_info.name)) | ||
}) | ||
.join(", "); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I think this might be doing a different thing than what we need. Here you iterate over all outputs and concatenate the name of the columns from each one of them. So if I understand correctly, if the same table is ingested twice in Materialize we will end up with a query like SELECT col1, col2, col1, col2
and transfer twice the amount of data. Am I reading this right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
excellent point - I hadn't considered that case.
Now that I'm looking back at it, this may also run into time-of-check/time-of-use issues. There's no guarantee that the table definition is the same at the time we issue the select. Which means we need to identify the columns in the transaction that we run the query.. I'll make that change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no guarantee that the table definition is the same at the time we issue the select.
That's true, but also fine, because Materialize expects a certain schema that has already been committed in the catalog. So we should explicitly request the columns that we expect to be there, and if they aren't we'll get an error from the MySQL client and report it.
Even if we learned the new schema in the dataflow there isn't anything else we can do at that point. If the user has changed the upstream schema after having created the ingestion in Materialize then the only way out is to drop and recreate the tables in Materialize.
I think what we want to do here is collect the set of columns that each output needs (using something like a BTreeSet
) and then make sure that we decode and project the columns in the right order downstream
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 When we build MySqlTableDesc.columns, the intention appears to be that all columns will be in the struct. For ignored columns, column_type
is set to None
, but the name is maintained. So I expect it's possible to use any entry from outputs
for the column names in the table description. I'll check if we have test coverage for it, if not, I'll add something.
@@ -99,7 +99,9 @@ def workflow_cdc(c: Composition, parser: WorkflowArgumentParser) -> None: | |||
|
|||
matching_files = [] | |||
for filter in args.filter: | |||
matching_files.extend(glob.glob(filter, root_dir="test/mysql-cdc")) | |||
matching_files.extend( | |||
glob.glob(filter, root_dir=MZ_ROOT / "test" / "mysql-cdc") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this fix! I've been bitten by it!
To support invisible columns in MySQL, explicitly request columns in snapshot select statement. And while I'm here, I corrected the glob root path in mzcompose.py to get the tests to run.
Motivation
Adds support for invisible columns in mysql
Tips for reviewer
Checklist
$T ⇔ Proto$T
mapping (possibly in a backwards-incompatible way), then it is tagged with aT-proto
label.