ddl: add tidb_skip_tiflash_replica_wait to bypass TiFlash replica wait on ADD PARTITION#67922
ddl: add tidb_skip_tiflash_replica_wait to bypass TiFlash replica wait on ADD PARTITION#67922premal wants to merge 2 commits intopingcap:masterfrom
Conversation
Introduces a new session variable that allows ADD PARTITION DDL to skip the StateReplicaOnly retry loop waiting for TiFlash replica readiness. When enabled, the DDL worker completes immediately; the background TiFlash ticker handles promoting the new partition to AvailablePartitionIDs once replication actually completes, preventing premature availability. This is useful for Lightning ETL workloads that write only to TiKV and do not require TiFlash readiness before the partition is made public. Fixes: pingcap#67919
|
Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
@premal I've received your pull request and will start the review. I'll conduct a thorough review covering code quality, potential issues, and implementation details. ⏳ This process typically takes 10-30 minutes depending on the complexity of the changes. ℹ️ Learn more details on Pantheon AI. |
|
Hi @premal. Thanks for your PR. I'm waiting for a pingcap member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Hi @premal. Thanks for your PR. PRs from untrusted users cannot be marked as trusted with I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
📝 WalkthroughWalkthroughA new session variable Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.11.4)Command failed Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
- Assert variable is session-only (HasSessionScope=true, HasGlobalScope=false) - Add skip=OFF regression path: ADD PARTITION with default setting still enters StateReplicaOnly and the background ticker eventually promotes the new partition into AvailablePartitionIDs
|
[FORMAT CHECKER NOTIFICATION] Notice: To remove the For example:
📖 For more info, you can check the "Contribute Code" section in the development guide. |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
pkg/sessionctx/vardef/tidb_vars.go (1)
697-700: Move this constant to the session-only sysvar block.
tidb_skip_tiflash_replica_waitis session-scoped, but this is currently placed under the block documented as “both in session and global scope.” Consider moving it to the session-only constants near the top of the file to keep scope grouping accurate.♻️ Suggested move
- // TiDBSkipTiFlashReplicaWait skips the StateReplicaOnly wait for TiFlash replica readiness - // during ADD PARTITION. Use when the caller writes only to TiKV and does not need TiFlash - // ready before the partition is made public. - TiDBSkipTiFlashReplicaWait = "tidb_skip_tiflash_replica_wait"Add the same constant/comment in the
// TiDB system variable names that only in session scope.const block.As per coding guidelines, “Follow existing package-local conventions first and keep style consistent with nearby files.”
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/sessionctx/vardef/tidb_vars.go` around lines 697 - 700, The constant TiDBSkipTiFlashReplicaWait is defined in the wrong grouping; move the TiDBSkipTiFlashReplicaWait = "tidb_skip_tiflash_replica_wait" declaration (and its comment) out of the mixed-scope block and add it to the existing "// TiDB system variable names that only in session scope." const block near the top of the file so it lives with other session-only sysvars and the scope grouping remains accurate.pkg/sessionctx/variable/sysvar.go (1)
930-938: Move this session-only sysvar into the session-scope block.The declaration is
ScopeSession, but it’s currently placed in the later TiDB/global area. Keeping it with the other session-scope sysvars makes the registration easier to find and preserves this file’s documented ordering convention. As per coding guidelines, Follow existing package-local conventions first and keep style consistent with nearby files.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/sessionctx/variable/sysvar.go` around lines 930 - 938, The TiDBSkipTiFlashReplicaWait session-only sysvar declaration (vardef.TiDBSkipTiFlashReplicaWait with SetSession using SessionVars.SkipTiFlashReplicaWait, GetSession using BoolToOnOff, and TiDBOptOn) should be moved from the TiDB/global area into the existing session-scope block where other ScopeSession sysvars are registered; locate the session-only registrations in this file, cut this entire map entry and paste it alongside those entries to preserve file ordering and conventions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/ddl/tests/tiflash/ddl_tiflash_test.go`:
- Around line 1093-1129: The test races the background TiFlash ticker when
asserting p2 is not yet available; pause the TiFlash ticker via the package's
failpoint/fake-ticker control before the ADD PARTITION for p2, run the negative
assertion using tb.Meta().TiFlashReplica.IsPartitionAvailable(newPartID), then
resume the ticker and assert the partition becomes available; similarly after
adding p3 with skip disabled, wait/resume the ticker and assert the specific
partition ID for p3 is present in AvailablePartitionIDs (use the same
tb.Meta().TiFlashReplica and IsPartitionAvailable checks) rather than only
calling CheckTableAvailableWithTableName; ensure the failpoint/ticker is enabled
before the test and disabled/cleaned up after.
---
Nitpick comments:
In `@pkg/sessionctx/vardef/tidb_vars.go`:
- Around line 697-700: The constant TiDBSkipTiFlashReplicaWait is defined in the
wrong grouping; move the TiDBSkipTiFlashReplicaWait =
"tidb_skip_tiflash_replica_wait" declaration (and its comment) out of the
mixed-scope block and add it to the existing "// TiDB system variable names that
only in session scope." const block near the top of the file so it lives with
other session-only sysvars and the scope grouping remains accurate.
In `@pkg/sessionctx/variable/sysvar.go`:
- Around line 930-938: The TiDBSkipTiFlashReplicaWait session-only sysvar
declaration (vardef.TiDBSkipTiFlashReplicaWait with SetSession using
SessionVars.SkipTiFlashReplicaWait, GetSession using BoolToOnOff, and TiDBOptOn)
should be moved from the TiDB/global area into the existing session-scope block
where other ScopeSession sysvars are registered; locate the session-only
registrations in this file, cut this entire map entry and paste it alongside
those entries to preserve file ordering and conventions.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: fd0fabe5-316b-427d-ac9b-49b42f921ea4
📒 Files selected for processing (7)
pkg/ddl/executor.gopkg/ddl/partition.gopkg/ddl/tests/tiflash/ddl_tiflash_test.gopkg/sessionctx/vardef/tidb_vars.gopkg/sessionctx/variable/session.gopkg/sessionctx/variable/sysvar.gopkg/sessionctx/variable/sysvar_test.go
| // Enable skip-wait so the DDL worker bypasses the StateReplicaOnly retry loop. | ||
| tk.MustExec("SET SESSION tidb_skip_tiflash_replica_wait = ON") | ||
|
|
||
| // ADD PARTITION should complete immediately without waiting for TiFlash replication. | ||
| tk.MustExec("ALTER TABLE ddltiflash_skip ADD PARTITION (PARTITION p2 VALUES LESS THAN (30))") | ||
|
|
||
| // Verify the new partition is now public (DDL succeeded). | ||
| tb, err := s.dom.InfoSchema().TableByName(context.Background(), model.NewCIStr("test"), model.NewCIStr("ddltiflash_skip")) | ||
| require.NoError(t, err) | ||
| pi := tb.Meta().GetPartitionInfo() | ||
| require.NotNil(t, pi) | ||
| require.Equal(t, 0, len(pi.AddingDefinitions), "AddingDefinitions should be empty after DDL completes") | ||
|
|
||
| // Find the new partition ID. | ||
| var newPartID int64 | ||
| for _, def := range pi.Definitions { | ||
| if def.Name.L == "p2" { | ||
| newPartID = def.ID | ||
| break | ||
| } | ||
| } | ||
| require.NotZero(t, newPartID, "partition p2 should exist in Definitions") | ||
|
|
||
| // The key correctness assertion: with skipWait=true, the DDL worker must NOT | ||
| // have prematurely added the new partition to AvailablePartitionIDs. | ||
| // The background TiFlash ticker is responsible for that once replication completes. | ||
| require.NotNil(t, tb.Meta().TiFlashReplica) | ||
| require.False(t, tb.Meta().TiFlashReplica.IsPartitionAvailable(newPartID), | ||
| "new partition should NOT be in AvailablePartitionIDs immediately after ADD PARTITION with skipWait=true") | ||
|
|
||
| // Regression: with skip=OFF (default), ADD PARTITION still enters StateReplicaOnly | ||
| // and the background ticker eventually adds the partition to AvailablePartitionIDs. | ||
| tk.MustExec("SET SESSION tidb_skip_tiflash_replica_wait = OFF") | ||
| tk.MustExec("ALTER TABLE ddltiflash_skip ADD PARTITION (PARTITION p3 VALUES LESS THAN (40))") | ||
|
|
||
| time.Sleep(ddl.PollTiFlashInterval * RoundToBeAvailablePartitionTable) | ||
| CheckTableAvailableWithTableName(s.dom, t, 1, []string{}, "test", "ddltiflash_skip") |
There was a problem hiding this comment.
Make the TiFlash availability assertions deterministic.
Line 1120 races the background TiFlash ticker: it can add p2 before the “immediately after ADD PARTITION” assertion runs, causing a false failure. Also, Lines 1128-1129 don’t verify that the default-OFF path added p3 to AvailablePartitionIDs; they only check table-level availability. Pause the ticker around the skip-wait negative assertion, then assert the specific partition ID after re-enabling it.
🧪 Proposed test hardening
// Enable skip-wait so the DDL worker bypasses the StateReplicaOnly retry loop.
tk.MustExec("SET SESSION tidb_skip_tiflash_replica_wait = ON")
+ // Stop the background ticker so the negative assertion below only verifies
+ // the DDL worker behavior, not a concurrent ticker update.
+ tickerPaused := true
+ require.NoError(t, failpoint.Enable("github.com/pingcap/tidb/pkg/ddl/BeforeRefreshTiFlashTickerLoop", `return`))
+ defer func() {
+ if tickerPaused {
+ require.NoError(t, failpoint.Disable("github.com/pingcap/tidb/pkg/ddl/BeforeRefreshTiFlashTickerLoop"))
+ }
+ }()
+
// ADD PARTITION should complete immediately without waiting for TiFlash replication.
tk.MustExec("ALTER TABLE ddltiflash_skip ADD PARTITION (PARTITION p2 VALUES LESS THAN (30))")
// Verify the new partition is now public (DDL succeeded).
@@
- // Find the new partition ID.
- var newPartID int64
- for _, def := range pi.Definitions {
- if def.Name.L == "p2" {
- newPartID = def.ID
- break
- }
- }
- require.NotZero(t, newPartID, "partition p2 should exist in Definitions")
+ findPartitionID := func(pi *model.PartitionInfo, name string) int64 {
+ for _, def := range pi.Definitions {
+ if def.Name.L == name {
+ return def.ID
+ }
+ }
+ return 0
+ }
+ newPartID := findPartitionID(pi, "p2")
+ require.NotZero(t, newPartID, "partition p2 should exist in Definitions")
@@
require.False(t, tb.Meta().TiFlashReplica.IsPartitionAvailable(newPartID),
"new partition should NOT be in AvailablePartitionIDs immediately after ADD PARTITION with skipWait=true")
+ require.NoError(t, failpoint.Disable("github.com/pingcap/tidb/pkg/ddl/BeforeRefreshTiFlashTickerLoop"))
+ tickerPaused = false
+
// Regression: with skip=OFF (default), ADD PARTITION still enters StateReplicaOnly
// and the background ticker eventually adds the partition to AvailablePartitionIDs.
tk.MustExec("SET SESSION tidb_skip_tiflash_replica_wait = OFF")
tk.MustExec("ALTER TABLE ddltiflash_skip ADD PARTITION (PARTITION p3 VALUES LESS THAN (40))")
time.Sleep(ddl.PollTiFlashInterval * RoundToBeAvailablePartitionTable)
+ tb, err = s.dom.InfoSchema().TableByName(context.Background(), model.NewCIStr("test"), model.NewCIStr("ddltiflash_skip"))
+ require.NoError(t, err)
+ pi = tb.Meta().GetPartitionInfo()
+ require.NotNil(t, pi)
+ p3ID := findPartitionID(pi, "p3")
+ require.NotZero(t, p3ID, "partition p3 should exist in Definitions")
+ require.True(t, tb.Meta().TiFlashReplica.IsPartitionAvailable(p3ID),
+ "new partition should be in AvailablePartitionIDs after ADD PARTITION with skipWait=false")
CheckTableAvailableWithTableName(s.dom, t, 1, []string{}, "test", "ddltiflash_skip")As per coding guidelines, keep test changes minimal and deterministic; unit tests in a package that uses failpoints must enable failpoints before tests and disable afterward.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/ddl/tests/tiflash/ddl_tiflash_test.go` around lines 1093 - 1129, The test
races the background TiFlash ticker when asserting p2 is not yet available;
pause the TiFlash ticker via the package's failpoint/fake-ticker control before
the ADD PARTITION for p2, run the negative assertion using
tb.Meta().TiFlashReplica.IsPartitionAvailable(newPartID), then resume the ticker
and assert the partition becomes available; similarly after adding p3 with skip
disabled, wait/resume the ticker and assert the specific partition ID for p3 is
present in AvailablePartitionIDs (use the same tb.Meta().TiFlashReplica and
IsPartitionAvailable checks) rather than only calling
CheckTableAvailableWithTableName; ensure the failpoint/ticker is enabled before
the test and disabled/cleaned up after.
What problem does this PR solve?
Issue Number: close #67919
When a partitioned table has a TiFlash replica, every
ADD PARTITIONDDL entersStateReplicaOnlyand polls until TiFlash replication is confirmed complete (retrying every ~2.5 s per partition). For workloads that write exclusively through TiKV (e.g. TiDB Lightning) this wait is unnecessary: the data is already durable in TiKV and the caller does not need TiFlash availability before the partition is usable.At high partition counts the cost is significant. Benchmark (20 parallel
ADD PARTITIONcalls, TiFlash replica present, data loaded via Lightning):tidb_skip_tiflash_replica_wait = OFF(default)tidb_skip_tiflash_replica_wait = ON~67% reduction in mean per-partition time.
What is changed and how it works?
New session variable:
tidb_skip_tiflash_replica_waitWhen
ON, the DDL worker skips theStateReplicaOnlyreadiness wait inonAddTablePartition. The partition transitions directly toStatePublic.AvailablePartitionIDscorrectness: The append of the new partition ID intoAvailablePartitionIDs(used by the TiFlash background ticker to track which partitions have completed replication) is gated on!skipWait. WhenskipWait=truethe background ticker handles the update once replication actually completes — preventing the partition from being reported as TiFlash-available prematurely.Files changed
pkg/sessionctx/vardef/tidb_vars.go—TiDBSkipTiFlashReplicaWaitconstantpkg/sessionctx/variable/sysvar.go— register the session variable withGetSession/SetSessionhookspkg/sessionctx/variable/session.go—SkipTiFlashReplicaWait boolfield onSessionVarspkg/ddl/executor.go— packSkipTiFlashReplicaWaitinto the DDL job viaAddSystemVarspkg/ddl/partition.go— read the flag inonAddTablePartition; gate both theStateReplicaOnlyretry and theAvailablePartitionIDsappend on!skipWaitSide effects checklist
tidb_skip_tiflash_replica_wait(default OFF, fully backward-compatible)AvailablePartitionIDsnote: whenskipWait=ONthe new partition ID is intentionally not appended by the DDL worker; the background TiFlash ticker adds it after replication completesTests
Test details
pkg/sessionctx/variable/sysvar_test.go—TestTiDBSkipTiFlashReplicaWaitVerifies the session variable default (OFF), setter, and getter round-trip.
pkg/ddl/tests/tiflash/ddl_tiflash_test.go—TestAddPartitionSkipTiFlashReplicaWaitSets
tidb_skip_tiflash_replica_wait=ON, runsADD PARTITION, confirms:StateReplicaOnlyretry loopAvailablePartitionIDsimmediately after DDL (i.e. premature availability is prevented)🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
tidb_skip_tiflash_replica_waitsession variable to allow skipping TiFlash replica replication wait during partition addition operations.Tests