ddl: add tidb_skip_tiflash_replica_wait to bypass TiFlash replica wait on ADD PARTITION #67922

Open
premal wants to merge 2 commits into pingcap:master from
premal:feature/skip-tiflash-wait-v2

Conversation

@premal

@premal premal commented Apr 20, 2026

What problem does this PR solve?

Issue Number: close #67919

When a partitioned table has a TiFlash replica, every ADD PARTITION DDL enters StateReplicaOnly and polls until TiFlash replication is confirmed complete (retrying every ~2.5 s per partition). For workloads that write exclusively through TiKV (e.g. TiDB Lightning) this wait is unnecessary: the data is already durable in TiKV and the caller does not need TiFlash availability before the partition is usable.

At high partition counts the cost is significant. Benchmark (20 parallel ADD PARTITION calls, TiFlash replica present, data loaded via Lightning):

| Setting | Wall time | Mean per-partition |
| --- | --- | --- |
| tidb_skip_tiflash_replica_wait = OFF (default) | ~49 s | ~2.45 s |
| tidb_skip_tiflash_replica_wait = ON | ~22 s | ~1.1 s |

~55% reduction in both wall time and mean per-partition time (49 s to 22 s, 2.45 s to 1.1 s).
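The relative reduction follows directly from the figures in the benchmark table. A small check, using only those numbers (the helper name `reductionPercent` is illustrative and not part of the PR):

```go
package main

import "fmt"

// reductionPercent returns how much smaller `after` is than `before`, in percent.
func reductionPercent(before, after float64) float64 {
	return (before - after) / before * 100
}

func main() {
	// Figures copied from the benchmark table in the PR description.
	fmt.Printf("wall time:          %.1f%%\n", reductionPercent(49, 22))
	fmt.Printf("mean per-partition: %.1f%%\n", reductionPercent(2.45, 1.1))
}
```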

What is changed and how it works?

New session variable: tidb_skip_tiflash_replica_wait

| Property | Value |
| --- | --- |
| Scope | SESSION |
| Type | bool |
| Default | OFF |

When ON, the DDL worker skips the StateReplicaOnly readiness wait in onAddTablePartition. The partition transitions directly to StatePublic.

AvailablePartitionIDs correctness: The append of the new partition ID into AvailablePartitionIDs (used by the TiFlash background ticker to track which partitions have completed replication) is gated on !skipWait. When skipWait=true the background ticker handles the update once replication actually completes — preventing the partition from being reported as TiFlash-available prematurely.
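The gating described above can be sketched as a tiny state machine. This is a simplified model, not the code in pkg/ddl/partition.go: `SchemaState`, `replicaInfo`, and `stepAddPartition` are hypothetical stand-ins, and `replicaReady` stands in for whatever readiness check the DDL worker actually performs.

```go
package main

import "fmt"

// SchemaState is a hypothetical stand-in for the schema states named in the PR.
type SchemaState int

const (
	StateNone SchemaState = iota
	StateReplicaOnly
	StatePublic
)

// replicaInfo is a simplified, hypothetical model of the TiFlash replica metadata.
type replicaInfo struct {
	AvailablePartitionIDs []int64
}

// stepAddPartition models one DDL worker round for a new partition.
// replicaReady stands in for the TiFlash readiness check; the returned bool
// reports whether the worker has to retry the same state later.
func stepAddPartition(state SchemaState, skipWait, replicaReady bool, partID int64, r *replicaInfo) (SchemaState, bool) {
	switch state {
	case StateNone:
		return StateReplicaOnly, false
	case StateReplicaOnly:
		if !skipWait {
			if !replicaReady {
				// Default path: stay in StateReplicaOnly and poll again.
				return StateReplicaOnly, true
			}
			// Replication confirmed, so the worker may record availability.
			r.AvailablePartitionIDs = append(r.AvailablePartitionIDs, partID)
		}
		// With skipWait the ID is left for the background ticker to add.
		return StatePublic, false
	default:
		return state, false
	}
}

func main() {
	for _, skip := range []bool{false, true} {
		r := &replicaInfo{}
		state, rounds := StateNone, 0
		for state != StatePublic && rounds < 10 {
			// Pretend TiFlash never becomes ready during this run.
			state, _ = stepAddPartition(state, skip, false, 101, r)
			rounds++
		}
		fmt.Printf("skipWait=%v public=%v rounds=%d available=%v\n",
			skip, state == StatePublic, rounds, r.AvailablePartitionIDs)
	}
}
```

The point of gating both branches on the same flag is that a partition can become public without ever being listed as TiFlash-available by the DDL worker.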

Files changed

  • pkg/sessionctx/vardef/tidb_vars.go: TiDBSkipTiFlashReplicaWait constant
  • pkg/sessionctx/variable/sysvar.go: register the session variable with GetSession/SetSession hooks
  • pkg/sessionctx/variable/session.go: SkipTiFlashReplicaWait bool field on SessionVars
  • pkg/ddl/executor.go: pack SkipTiFlashReplicaWait into the DDL job via AddSystemVars
  • pkg/ddl/partition.go: read the flag in onAddTablePartition; gate both the StateReplicaOnly retry and the AvailablePartitionIDs append on !skipWait
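The file list implies a hand-off: the session packs the flag into the job, and the DDL worker reads it back later, possibly on another node. A minimal sketch of that hand-off follows, assuming the job carries a string-to-string map of system variables. The `job` type and the helpers `packSkipWait` and `readSkipWait` are hypothetical and only illustrate the idea; they are not TiDB's actual API.

```go
package main

import (
	"fmt"
	"strings"
)

const skipWaitVar = "tidb_skip_tiflash_replica_wait"

// job is a hypothetical stand-in for a DDL job that carries system variables.
type job struct {
	systemVars map[string]string
}

// addSystemVar stores a variable on the job, allocating the map lazily.
func (j *job) addSystemVar(name, value string) {
	if j.systemVars == nil {
		j.systemVars = make(map[string]string)
	}
	j.systemVars[name] = value
}

// getSystemVar reads a variable; reading from a nil map is safe in Go.
func (j *job) getSystemVar(name string) (string, bool) {
	v, ok := j.systemVars[name]
	return v, ok
}

// packSkipWait is the session side: serialize the bool as ON/OFF.
func packSkipWait(j *job, skip bool) {
	value := "OFF"
	if skip {
		value = "ON"
	}
	j.addSystemVar(skipWaitVar, value)
}

// readSkipWait is the worker side: a missing entry means "do not skip",
// so jobs created before the variable existed keep the old behavior.
func readSkipWait(j *job) bool {
	v, ok := j.getSystemVar(skipWaitVar)
	return ok && strings.EqualFold(v, "ON")
}

func main() {
	oldJob := &job{} // job without the variable
	newJob := &job{}
	packSkipWait(newJob, true)
	fmt.Println(readSkipWait(oldJob), readSkipWait(newJob))
}
```

Treating an absent entry as OFF is a design choice in this sketch that matches the "default OFF, backward-compatible" claim in the description.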

Side effects checklist

  • Contains variable changes — new session variable tidb_skip_tiflash_replica_wait (default OFF, fully backward-compatible)
  • Contains syntax changes
  • Contains behavior changes to existing features
  • AvailablePartitionIDs note: when skipWait=ON the new partition ID is intentionally not appended by the DDL worker; the background TiFlash ticker adds it after replication completes

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Test details

pkg/sessionctx/variable/sysvar_test.go: TestTiDBSkipTiFlashReplicaWait
Verifies the session variable default (OFF), setter, and getter round-trip.
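What such a round-trip check exercises can be illustrated with a self-contained model. Everything here is a hypothetical stand-in (`sysVar`, `sessionVars`, `scopeFlag`, `newSkipWaitVar`); the real registration lives in pkg/sessionctx/variable/sysvar.go and may differ in detail.

```go
package main

import (
	"fmt"
	"strings"
)

// scopeFlag is a hypothetical bitmask of the scopes a variable supports.
type scopeFlag uint8

const (
	scopeGlobal scopeFlag = 1 << iota
	scopeSession
)

// sessionVars is a stand-in holding only the field this PR adds.
type sessionVars struct {
	SkipTiFlashReplicaWait bool
}

// sysVar is a simplified model of a registered system variable.
type sysVar struct {
	Scope      scopeFlag
	Name       string
	Default    string
	SetSession func(*sessionVars, string) error
	GetSession func(*sessionVars) (string, error)
}

func (sv *sysVar) hasSessionScope() bool { return sv.Scope&scopeSession != 0 }
func (sv *sysVar) hasGlobalScope() bool  { return sv.Scope&scopeGlobal != 0 }

// optOn treats "ON" (any case) and "1" as true.
func optOn(s string) bool { return strings.EqualFold(s, "ON") || s == "1" }

func onOff(b bool) string {
	if b {
		return "ON"
	}
	return "OFF"
}

// newSkipWaitVar builds the session-only variable with its two hooks.
func newSkipWaitVar() *sysVar {
	return &sysVar{
		Scope:   scopeSession,
		Name:    "tidb_skip_tiflash_replica_wait",
		Default: "OFF",
		SetSession: func(s *sessionVars, val string) error {
			s.SkipTiFlashReplicaWait = optOn(val)
			return nil
		},
		GetSession: func(s *sessionVars) (string, error) {
			return onOff(s.SkipTiFlashReplicaWait), nil
		},
	}
}

func main() {
	sv, vars := newSkipWaitVar(), &sessionVars{}
	before, _ := sv.GetSession(vars)
	_ = sv.SetSession(vars, "ON")
	after, _ := sv.GetSession(vars)
	fmt.Println(sv.hasSessionScope(), sv.hasGlobalScope(), before, after)
}
```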

pkg/ddl/tests/tiflash/ddl_tiflash_test.go: TestAddPartitionSkipTiFlashReplicaWait
Sets tidb_skip_tiflash_replica_wait=ON, runs ADD PARTITION, confirms:

  1. DDL completes without entering the StateReplicaOnly retry loop
  2. The new partition ID is not present in AvailablePartitionIDs immediately after DDL (i.e. premature availability is prevented)

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added the tidb_skip_tiflash_replica_wait session variable, which lets partition addition operations skip the wait for TiFlash replica replication.
  • Tests

    • Added tests to verify the new session variable functionality and behavior during partition operations.

Introduces a new session variable that allows ADD PARTITION DDL to skip
the StateReplicaOnly retry loop waiting for TiFlash replica readiness.
When enabled, the DDL worker completes immediately; the background TiFlash
ticker handles promoting the new partition to AvailablePartitionIDs once
replication actually completes, preventing premature availability.

This is useful for Lightning ETL workloads that write only to TiKV and
do not require TiFlash readiness before the partition is made public.

Fixes: pingcap#67919
@ti-chi-bot

ti-chi-bot bot commented Apr 20, 2026

Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ti-chi-bot ti-chi-bot bot added the do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. label Apr 20, 2026
@pantheon-ai

pantheon-ai bot commented Apr 20, 2026

@premal I've received your pull request and will start the review. I'll conduct a thorough review covering code quality, potential issues, and implementation details.

⏳ This process typically takes 10-30 minutes depending on the complexity of the changes.

ℹ️ Learn more details on Pantheon AI.

@ti-chi-bot ti-chi-bot bot added contribution This PR is from a community contributor. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. needs-ok-to-test Indicates a PR created by contributors and need ORG member send '/ok-to-test' to start testing. labels Apr 20, 2026
@ti-chi-bot

ti-chi-bot bot commented Apr 20, 2026

Hi @premal. Thanks for your PR.

I'm waiting for a pingcap member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

@ti-chi-bot

ti-chi-bot bot commented Apr 20, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign terry1purcell, wjhuang2016 for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tiprow

tiprow bot commented Apr 20, 2026

Hi @premal. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

@coderabbitai

coderabbitai bot commented Apr 20, 2026

📝 Walkthrough


A new session variable, tidb_skip_tiflash_replica_wait, is introduced to conditionally bypass TiFlash replica readiness polling during ADD PARTITION DDL operations. When it is enabled, partition additions complete immediately while the background TiFlash ticker handles the eventual replication synchronization.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| System Variable Definition: pkg/sessionctx/vardef/tidb_vars.go | Added constant TiDBSkipTiFlashReplicaWait and default value DefTiDBSkipTiFlashReplicaWait = false. |
| Session Variable Infrastructure: pkg/sessionctx/variable/session.go, pkg/sessionctx/variable/sysvar.go | Added SkipTiFlashReplicaWait field to SessionVars struct and registered session-scoped sysvar with SetSession/GetSession handlers. |
| DDL Execution Logic: pkg/ddl/executor.go, pkg/ddl/partition.go | Propagated session variable into DDL job; gated checkPartitionReplica polling and replica availability marking in onAddTablePartition on skipWait condition. |
| Test Coverage: pkg/sessionctx/variable/sysvar_test.go, pkg/ddl/tests/tiflash/ddl_tiflash_test.go | Added sysvar unit test verifying scope and behavior; added DDL regression test confirming partition skip-wait behavior and deferred TiFlash availability. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

release-note, size/M, approved, lgtm

Suggested reviewers

  • wjhuang2016
  • Benjamin2037
  • guo-shaoge

Poem

🐰 Hop skip and a DDL,
No waiting for TiFlash's spell!
Partitions fly swift and free,
While the ticker handles replication, you see.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 16.67% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and concisely summarizes the main change: adding a new system variable to skip TiFlash replica waiting during ADD PARTITION operations. |
| Description check | ✅ Passed | The PR description is comprehensive, covering problem statement, solution details, affected files, and test coverage with checkboxes. However, the issue reference format deviates from template expectations (uses 'close #67919' in inline text rather than structured field). |
| Linked Issues check | ✅ Passed | All coding requirements from issue #67919 are addressed: session variable created with SESSION scope, skipWait logic gates both StateReplicaOnly polling and AvailablePartitionIDs append, and backward compatibility preserved with default OFF. |
| Out of Scope Changes check | ✅ Passed | All changes directly support the stated objectives: variable definition, session context integration, DDL job packing, partition handling logic, and targeted tests. No extraneous modifications detected. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.11.4)

Command failed



- Assert variable is session-only (HasSessionScope=true, HasGlobalScope=false)
- Add skip=OFF regression path: ADD PARTITION with default setting still
  enters StateReplicaOnly and the background ticker eventually promotes
  the new partition into AvailablePartitionIDs
@ti-chi-bot

ti-chi-bot bot commented Apr 20, 2026

[FORMAT CHECKER NOTIFICATION]

Notice: To remove the do-not-merge/needs-tests-checked label, please finish the tests, then check the finished items in the description.

For example:

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

📖 For more info, you can check the "Contribute Code" section in the development guide.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
pkg/sessionctx/vardef/tidb_vars.go (1)

697-700: Move this constant to the session-only sysvar block.

tidb_skip_tiflash_replica_wait is session-scoped, but this is currently placed under the block documented as “both in session and global scope.” Consider moving it to the session-only constants near the top of the file to keep scope grouping accurate.

♻️ Suggested move
-	// TiDBSkipTiFlashReplicaWait skips the StateReplicaOnly wait for TiFlash replica readiness
-	// during ADD PARTITION. Use when the caller writes only to TiKV and does not need TiFlash
-	// ready before the partition is made public.
-	TiDBSkipTiFlashReplicaWait = "tidb_skip_tiflash_replica_wait"

Add the same constant/comment in the // TiDB system variable names that only in session scope. const block.

As per coding guidelines, “Follow existing package-local conventions first and keep style consistent with nearby files.”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/sessionctx/vardef/tidb_vars.go` around lines 697 - 700, The constant
TiDBSkipTiFlashReplicaWait is defined in the wrong grouping; move the
TiDBSkipTiFlashReplicaWait = "tidb_skip_tiflash_replica_wait" declaration (and
its comment) out of the mixed-scope block and add it to the existing "// TiDB
system variable names that only in session scope." const block near the top of
the file so it lives with other session-only sysvars and the scope grouping
remains accurate.
pkg/sessionctx/variable/sysvar.go (1)

930-938: Move this session-only sysvar into the session-scope block.

The declaration is ScopeSession, but it’s currently placed in the later TiDB/global area. Keeping it with the other session-scope sysvars makes the registration easier to find and preserves this file’s documented ordering convention. As per coding guidelines, Follow existing package-local conventions first and keep style consistent with nearby files.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/sessionctx/variable/sysvar.go` around lines 930 - 938, The
TiDBSkipTiFlashReplicaWait session-only sysvar declaration
(vardef.TiDBSkipTiFlashReplicaWait with SetSession using
SessionVars.SkipTiFlashReplicaWait, GetSession using BoolToOnOff, and TiDBOptOn)
should be moved from the TiDB/global area into the existing session-scope block
where other ScopeSession sysvars are registered; locate the session-only
registrations in this file, cut this entire map entry and paste it alongside
those entries to preserve file ordering and conventions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/ddl/tests/tiflash/ddl_tiflash_test.go`:
- Around line 1093-1129: The test races the background TiFlash ticker when
asserting p2 is not yet available; pause the TiFlash ticker via the package's
failpoint/fake-ticker control before the ADD PARTITION for p2, run the negative
assertion using tb.Meta().TiFlashReplica.IsPartitionAvailable(newPartID), then
resume the ticker and assert the partition becomes available; similarly after
adding p3 with skip disabled, wait/resume the ticker and assert the specific
partition ID for p3 is present in AvailablePartitionIDs (use the same
tb.Meta().TiFlashReplica and IsPartitionAvailable checks) rather than only
calling CheckTableAvailableWithTableName; ensure the failpoint/ticker is enabled
before the test and disabled/cleaned up after.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: fd0fabe5-316b-427d-ac9b-49b42f921ea4

📥 Commits

Reviewing files that changed from the base of the PR and between 7f3e45f and c26333c.

📒 Files selected for processing (7)
  • pkg/ddl/executor.go
  • pkg/ddl/partition.go
  • pkg/ddl/tests/tiflash/ddl_tiflash_test.go
  • pkg/sessionctx/vardef/tidb_vars.go
  • pkg/sessionctx/variable/session.go
  • pkg/sessionctx/variable/sysvar.go
  • pkg/sessionctx/variable/sysvar_test.go

Comment on lines +1093 to +1129
// Enable skip-wait so the DDL worker bypasses the StateReplicaOnly retry loop.
tk.MustExec("SET SESSION tidb_skip_tiflash_replica_wait = ON")

// ADD PARTITION should complete immediately without waiting for TiFlash replication.
tk.MustExec("ALTER TABLE ddltiflash_skip ADD PARTITION (PARTITION p2 VALUES LESS THAN (30))")

// Verify the new partition is now public (DDL succeeded).
tb, err := s.dom.InfoSchema().TableByName(context.Background(), model.NewCIStr("test"), model.NewCIStr("ddltiflash_skip"))
require.NoError(t, err)
pi := tb.Meta().GetPartitionInfo()
require.NotNil(t, pi)
require.Equal(t, 0, len(pi.AddingDefinitions), "AddingDefinitions should be empty after DDL completes")

// Find the new partition ID.
var newPartID int64
for _, def := range pi.Definitions {
	if def.Name.L == "p2" {
		newPartID = def.ID
		break
	}
}
require.NotZero(t, newPartID, "partition p2 should exist in Definitions")

// The key correctness assertion: with skipWait=true, the DDL worker must NOT
// have prematurely added the new partition to AvailablePartitionIDs.
// The background TiFlash ticker is responsible for that once replication completes.
require.NotNil(t, tb.Meta().TiFlashReplica)
require.False(t, tb.Meta().TiFlashReplica.IsPartitionAvailable(newPartID),
"new partition should NOT be in AvailablePartitionIDs immediately after ADD PARTITION with skipWait=true")

// Regression: with skip=OFF (default), ADD PARTITION still enters StateReplicaOnly
// and the background ticker eventually adds the partition to AvailablePartitionIDs.
tk.MustExec("SET SESSION tidb_skip_tiflash_replica_wait = OFF")
tk.MustExec("ALTER TABLE ddltiflash_skip ADD PARTITION (PARTITION p3 VALUES LESS THAN (40))")

time.Sleep(ddl.PollTiFlashInterval * RoundToBeAvailablePartitionTable)
CheckTableAvailableWithTableName(s.dom, t, 1, []string{}, "test", "ddltiflash_skip")

⚠️ Potential issue | 🟠 Major

Make the TiFlash availability assertions deterministic.

Line 1120 races the background TiFlash ticker: it can add p2 before the “immediately after ADD PARTITION” assertion runs, causing a false failure. Also, Lines 1128-1129 don’t verify that the default-OFF path added p3 to AvailablePartitionIDs; they only check table-level availability. Pause the ticker around the skip-wait negative assertion, then assert the specific partition ID after re-enabling it.

🧪 Proposed test hardening
 	// Enable skip-wait so the DDL worker bypasses the StateReplicaOnly retry loop.
 	tk.MustExec("SET SESSION tidb_skip_tiflash_replica_wait = ON")
 
+	// Stop the background ticker so the negative assertion below only verifies
+	// the DDL worker behavior, not a concurrent ticker update.
+	tickerPaused := true
+	require.NoError(t, failpoint.Enable("github.com/pingcap/tidb/pkg/ddl/BeforeRefreshTiFlashTickerLoop", `return`))
+	defer func() {
+		if tickerPaused {
+			require.NoError(t, failpoint.Disable("github.com/pingcap/tidb/pkg/ddl/BeforeRefreshTiFlashTickerLoop"))
+		}
+	}()
+
 	// ADD PARTITION should complete immediately without waiting for TiFlash replication.
 	tk.MustExec("ALTER TABLE ddltiflash_skip ADD PARTITION (PARTITION p2 VALUES LESS THAN (30))")
 
 	// Verify the new partition is now public (DDL succeeded).
@@
-	// Find the new partition ID.
-	var newPartID int64
-	for _, def := range pi.Definitions {
-		if def.Name.L == "p2" {
-			newPartID = def.ID
-			break
-		}
-	}
-	require.NotZero(t, newPartID, "partition p2 should exist in Definitions")
+	findPartitionID := func(pi *model.PartitionInfo, name string) int64 {
+		for _, def := range pi.Definitions {
+			if def.Name.L == name {
+				return def.ID
+			}
+		}
+		return 0
+	}
+	newPartID := findPartitionID(pi, "p2")
+	require.NotZero(t, newPartID, "partition p2 should exist in Definitions")
@@
 	require.False(t, tb.Meta().TiFlashReplica.IsPartitionAvailable(newPartID),
 		"new partition should NOT be in AvailablePartitionIDs immediately after ADD PARTITION with skipWait=true")
 
+	require.NoError(t, failpoint.Disable("github.com/pingcap/tidb/pkg/ddl/BeforeRefreshTiFlashTickerLoop"))
+	tickerPaused = false
+
 	// Regression: with skip=OFF (default), ADD PARTITION still enters StateReplicaOnly
 	// and the background ticker eventually adds the partition to AvailablePartitionIDs.
 	tk.MustExec("SET SESSION tidb_skip_tiflash_replica_wait = OFF")
 	tk.MustExec("ALTER TABLE ddltiflash_skip ADD PARTITION (PARTITION p3 VALUES LESS THAN (40))")
 
 	time.Sleep(ddl.PollTiFlashInterval * RoundToBeAvailablePartitionTable)
+	tb, err = s.dom.InfoSchema().TableByName(context.Background(), model.NewCIStr("test"), model.NewCIStr("ddltiflash_skip"))
+	require.NoError(t, err)
+	pi = tb.Meta().GetPartitionInfo()
+	require.NotNil(t, pi)
+	p3ID := findPartitionID(pi, "p3")
+	require.NotZero(t, p3ID, "partition p3 should exist in Definitions")
+	require.True(t, tb.Meta().TiFlashReplica.IsPartitionAvailable(p3ID),
+		"new partition should be in AvailablePartitionIDs after ADD PARTITION with skipWait=false")
 	CheckTableAvailableWithTableName(s.dom, t, 1, []string{}, "test", "ddltiflash_skip")

As per coding guidelines, keep test changes minimal and deterministic; unit tests in a package that uses failpoints must enable failpoints before tests and disable afterward.
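The idea behind the hardening, independent of TiDB's failpoint machinery, is to hold the background promoter still while the negative assertion runs and only then let it proceed. A minimal single-goroutine model, in which `replicaTracker` and its methods are hypothetical and ticks are driven by hand rather than by a timer:

```go
package main

import "fmt"

// replicaTracker is a hypothetical model of the background ticker's bookkeeping.
type replicaTracker struct {
	paused    bool
	pending   []int64
	available map[int64]bool
}

func newReplicaTracker() *replicaTracker {
	return &replicaTracker{available: make(map[int64]bool)}
}

// addPartition registers a partition whose replication is still outstanding.
func (r *replicaTracker) addPartition(id int64) {
	r.pending = append(r.pending, id)
}

// tick is one round of the background loop. While paused it does nothing,
// which plays the role of the failpoint in the proposed test change.
func (r *replicaTracker) tick() {
	if r.paused {
		return
	}
	for _, id := range r.pending {
		r.available[id] = true
	}
	r.pending = nil
}

func (r *replicaTracker) isAvailable(id int64) bool {
	return r.available[id]
}

func main() {
	r := newReplicaTracker()

	// Pause first, so no tick can promote the partition behind our back.
	r.paused = true
	r.addPartition(7)
	r.tick()
	fmt.Println("while paused:", r.isAvailable(7))

	// Resume, then assert the specific partition ID, not just the table.
	r.paused = false
	r.tick()
	fmt.Println("after resume:", r.isAvailable(7))
}
```

With the promoter paused, the "not yet available" check no longer depends on timing; after resuming, the positive check targets the exact partition ID.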


Labels

  • contribution: This PR is from a community contributor.
  • do-not-merge/needs-tests-checked
  • do-not-merge/needs-triage-completed
  • do-not-merge/release-note-label-needed: Indicates that a PR should not merge because it's missing one of the release note labels.
  • needs-ok-to-test: Indicates a PR created by contributors and need ORG member send '/ok-to-test' to start testing.
  • size/L: Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ddl: ADD PARTITION with TiFlash replica causes unnecessary ~2.5s wait per partition

1 participant