Contributor

@Mrart Mrart commented Aug 21, 2025

@github-actions github-actions bot added docs Improvements or additions to documentation mysql-cdc-connector mysql-pipeline-connector labels Aug 21, 2025
xiayuxiao and others added 2 commits August 22, 2025 08:49
… with conditions

[FLINK-36165][source-connector/mysql] add docs

[FLINK-36165][source-connector/mysql] Implement snapshot filter for MySQL table source

[FLINK-36165][source-connector/mysql] Escape dot

[FLINK-36165 ] fixed supported escape like 'city != 'China:beijing''
@Mrart Mrart force-pushed the FLINK-36165 branch 3 times, most recently from d0738af to c3005b6 Compare August 23, 2025 02:50
fixed checkstyle

fixed test

fixed MySqlTableSourceFactoryTest test error.
Comment on lines +296 to +306
<tr>
<td>scan.snapshot.filters</td>
<td>optional</td>
<td style="word-wrap: break-word;">(none)</td>
<td>String</td>
<td>When reading a table snapshot, the rows of captured tables will be filtered using the specified filter expression (AKA a SQL WHERE clause). <br>
By default, no filter is applied, meaning the entire table will be synchronized. <br>
A colon (:) separates table name and filter expression, while a semicolon (;) separate multiple filters,
e.g. `db1.user_table_[0-9]+:id > 100;db[1-2].[app|web]_order_\\.*:id < 0;`.
</td>
</tr>
Contributor

How about explicitly stating that the filter conditions are combined using AND and that it has nothing to do with the binlog step?
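
For illustration, the option format described in the quoted docs (a colon between the table pattern and the filter expression, a semicolon between entries) could be parsed roughly as follows. This is a minimal sketch, not the PR's `toSelector` implementation: the class `SnapshotFilterOptionSketch` and its `parse` method are hypothetical, and splitting on the first colon is an assumption made so that a colon inside a literal such as `'China:beijing'` stays in the filter expression.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of the PR. */
final class SnapshotFilterOptionSketch {

    /** Parses "tablePattern:filter;tablePattern:filter;" into an ordered map. */
    static Map<String, String> parse(String option) {
        Map<String, String> filters = new LinkedHashMap<>();
        if (option == null || option.trim().isEmpty()) {
            return filters; // no filter configured: the whole table is read
        }
        for (String entry : option.split(";")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate a trailing semicolon
            }
            // Assumption: the FIRST colon ends the table pattern, so later
            // colons (e.g. inside string literals) belong to the filter.
            int colon = trimmed.indexOf(':');
            if (colon <= 0 || colon == trimmed.length() - 1) {
                throw new IllegalArgumentException(
                        "Expected '<table pattern>:<filter>' but got: " + trimmed);
            }
            filters.put(
                    trimmed.substring(0, colon).trim(),
                    trimmed.substring(colon + 1).trim());
        }
        return filters;
    }

    public static void main(String[] args) {
        parse("db1.user_table_[0-9]+:id > 100;db2.orders:city != 'China:beijing';")
                .forEach((table, filter) -> System.out.println(table + " -> " + filter));
    }
}
```

A semicolon inside a string literal would still break this naive split, so a real parser would need quote-aware tokenizing.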

Comment on lines +58 to +67
Map<Selectors, String> snapshotFilters = toSelector(filters);

String filter = null;
for (Selectors selector : snapshotFilters.keySet()) {
    if (selector.isMatch(
            org.apache.flink.cdc.common.event.TableId.tableId(
                    tableId.catalog(), tableId.table()))) {
        filter = snapshotFilters.get(selector);
        break;
    }
Contributor

@SML0127 SML0127 Sep 7, 2025

Can we fail fast on unknown/nonexistent columns in filters?
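
One possible way to fail fast, sketched under assumptions: before planning splits, run a probe query that returns no rows but still forces MySQL to resolve the column names in the filter, and turn any `SQLException` into a configuration error. The class `SnapshotFilterValidationSketch`, its methods, and the `LIMIT 0` probe are hypothetical and are not taken from the PR; only the standard `java.sql` API is used.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

/** Hypothetical validation helper for illustration only; not part of the PR. */
final class SnapshotFilterValidationSketch {

    /** Builds a probe that returns no rows but still makes the server check the filter. */
    static String buildProbeQuery(String quotedTable, String filter) {
        return "SELECT 1 FROM " + quotedTable + " WHERE " + filter + " LIMIT 0";
    }

    /**
     * Runs the probe once at startup. An unknown column or a syntax error in the
     * filter surfaces as a SQLException, which is rethrown as a configuration error.
     */
    static void validate(Connection connection, String quotedTable, String filter) {
        String probe = buildProbeQuery(quotedTable, filter);
        try (Statement statement = connection.createStatement()) {
            statement.execute(probe);
        } catch (SQLException e) {
            throw new IllegalArgumentException(
                    "Invalid snapshot filter for " + quotedTable + ": " + filter, e);
        }
    }
}
```

The trade-off is one extra round trip per captured table at startup in exchange for rejecting a bad filter before any snapshot split is read.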

Comment on lines +64 to +88
public static Long queryRowCnt(
        JdbcConnection jdbc, TableId tableId, String columnName, @Nullable String filter)
        throws SQLException {

    if (filter == null) {
        return queryApproximateRowCnt(jdbc, tableId);
    }

    final String cntQuery =
            String.format(
                    "SELECT COUNT(%s) FROM %s WHERE %s",
                    quote(columnName), quote(tableId), filter);
    return jdbc.queryAndMap(
            cntQuery,
            rs -> {
                if (!rs.next()) {
                    // this should never happen
                    throw new SQLException(
                            String.format(
                                    "No result returned after running query [%s]", cntQuery));
                }
                return rs.getLong(1);
            });
}

Contributor

Because the filter is applied during split planning, the distribution factor may fall outside the configured bounds (0.05 to 1,000), which could affect snapshot performance. I'd appreciate your thoughts on this.
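
To make the concern concrete, here is a small worked sketch. It assumes the distribution factor is computed as `(max - min + 1) / rowCount` over the chunk key and that the filtered rows are scattered across the whole key range; both are assumptions for illustration, and `DistributionFactorSketch` is a hypothetical class rather than the connector's code. The bounds 0.05 and 1,000 are the ones mentioned in the comment above.

```java
/** Hypothetical arithmetic sketch for illustration only; not the connector's code. */
final class DistributionFactorSketch {

    /** Assumed formula: size of the key range divided by the number of rows. */
    static double distributionFactor(long min, long max, long rowCount) {
        if (rowCount <= 0) {
            return Double.MAX_VALUE; // no rows: treat as maximally sparse
        }
        return (max - min + 1) / (double) rowCount;
    }

    static boolean withinBounds(double factor, double lowerBound, double upperBound) {
        return factor >= lowerBound && factor <= upperBound;
    }

    public static void main(String[] args) {
        long min = 1L;
        long max = 10_000_000L;

        // Unfiltered: one row per key value.
        double unfiltered = distributionFactor(min, max, 10_000_000L);
        // Filtered: only 5,000 rows match, but they span the same key range.
        double filtered = distributionFactor(min, max, 5_000L);

        System.out.println(unfiltered + " within bounds: " + withinBounds(unfiltered, 0.05, 1000.0));
        System.out.println(filtered + " within bounds: " + withinBounds(filtered, 0.05, 1000.0));
    }
}
```

Under these assumptions the unfiltered factor is 1.0, while the filtered factor is 2000.0 and exceeds the upper bound. If the key range were also computed over the filtered rows only, the effect could be smaller.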

Contributor

@SML0127 SML0127 left a comment

Thanks for the PR. I've left a few minor comments.

@yuxiqian
Member

yuxiqian commented Sep 8, 2025

It seems the author of #3776 is actively working on their PR, and there's some duplicated work... It would be nice if we could discuss and implement this feature in one place.

@Mrart Mrart closed this Sep 19, 2025