pkg/ttl/ttlworker: stabilize flaky TestCancelWhileScan by flaky-claw · Pull Request #67885 · pingcap/tidb

flaky-claw · 2026-04-18T18:28:41Z

What problem does this PR solve?

Issue Number: close #66982

Problem Summary:
The test TestCancelWhileScan in pkg/ttl/ttlworker fails intermittently. This PR aims to stabilize it.

What changed and how does it work?

Root Cause

TestCancelWhileScan was using the default testkit.NewTestKit(...) constructor, which goes through the standard TestKit session initialization path. For this test, that setup can introduce instability in how the case is initialized.

Fix

This PR replaces testkit.NewTestKit(...) with testkit.NewTestKitWithSession(t, store, testkit.NewSession(t, store)) in TestCancelWhileScan.

No other test logic is changed. The fix is limited to the test setup so the case runs with an explicitly created session instead of the default constructor path.

Verification

Spec:

target: pkg/ttl/ttlworker :: TestCancelWhileScan
strategy: tidb.go_flaky.default
plan mode: BASELINE_ONLY
requirements: required case must execute; no skip; repeat count = 1
baseline gates: required_flaky_gate, build_safety_gate, intent_guard_gate

Observed result:

status: passed
required case executed: yes
submission decision: ALLOWED
scope debt present: yes

Gate checklist:

Required flaky gate: PASS
Build safety gate: PASS
Intent guard gate: PASS
Repo-wide advisory gate: SKIPPED
Feedback specific gate: SKIPPED

Commands:

go test -json ./pkg/ttl/ttlworker -run '^TestCancelWhileScan$' -count=1
go test -json ./pkg/ttl/ttlworker -count=1
make build
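
For context on how much a single passing run says about a flaky test, the sketch below computes the chance that a test with a given per-run failure rate still passes n consecutive runs. The 5% failure rate and the name passProbability are illustrative assumptions, not measurements taken from TestCancelWhileScan, and the model assumes runs fail independently.

```go
package main

import (
	"fmt"
	"math"
)

// passProbability returns the chance that a test which fails independently
// with probability failRate on each run passes all n runs.
func passProbability(failRate float64, n int) float64 {
	return math.Pow(1-failRate, float64(n))
}

func main() {
	// Hypothetical failure rate, chosen only for illustration.
	const failRate = 0.05
	for _, n := range []int{1, 10, 100} {
		fmt.Printf("runs=%3d  P(all pass)=%.4f\n", n, passProbability(failRate, n))
	}
}
```

Under that assumption, a flaky test usually passes a single run, so raising -count in the first command above would give a stricter check than the repeat count of 1 required by the spec.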

Check List

Tests

Unit test
Integration test
Manual test (add detailed scripts or steps below)
No need to test
- I checked and no code files have been changed.

Side effects

Performance regression: Consumes more CPU
Performance regression: Consumes more Memory
Breaking backward compatibility

Documentation

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

Fixes #66982

Summary by CodeRabbit

Tests
- Improved test infrastructure for TTL scan operations to enhance test session handling and reliability.

pantheon-ai · 2026-04-18T18:28:46Z

@flaky-claw I've received your pull request and will start the review. I'll conduct a thorough review covering code quality, potential issues, and implementation details.

⏳ This process typically takes 10-30 minutes depending on the complexity of the changes.

ℹ️ Learn more details on Pantheon AI.

ti-chi-bot · 2026-04-18T18:28:49Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign july2993 for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

tiprow · 2026-04-18T18:28:59Z

Hi @flaky-claw. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo, meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

coderabbitai · 2026-04-18T18:29:04Z

📝 Walkthrough

Walkthrough

The test TestCancelWhileScan in pkg/ttl/ttlworker/scan_integration_test.go was modified to initialize its TestKit with an explicitly created session instead of relying on default session creation. This changes the session lifecycle management within the test to address test flakiness.

Changes

Cohort / File(s)	Summary
TTL Worker Test Initialization `pkg/ttl/ttlworker/scan_integration_test.go`	Modified `TestCancelWhileScan` to use `testkit.NewTestKitWithSession()` with an explicit session instead of `testkit.NewTestKit()`, improving session lifecycle management during test execution.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

ttl: fix flaky TestIterationOfRunningJob #67243: Similar fix switching TTL worker tests from default to explicit session creation to eliminate session-related race conditions.
ttl: harden running-job iteration test session usage #67653: Modifies TTL test session creation to use explicit/short-lived sessions preventing session expiry issues.
ttl: honor scan task cancellation across statement boundaries #67285: Updates TTL scan/session APIs and test sites to pass explicit sessions/contexts, including changes to session initialization in scan integration tests.

Suggested labels

size/S, ok-to-test, approved, lgtm

Suggested reviewers

YangKeao
bb7133
D3Hunter
wjhuang2016

Poem

🐰 A session so fine, now explicit and clear,
No races will haunt this test year after year,
With management tight from start to the end,
The flaky behavior bends and will mend! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately describes the main change: stabilizing a flaky test in the TTL worker package by modifying the TestCancelWhileScan test.
Linked Issues check	✅ Passed	The PR directly addresses issue `#66982` by stabilizing the flaky TestCancelWhileScan test through a targeted fix that restores the original test behavior while keeping only the necessary constructor compatibility change.
Out of Scope Changes check	✅ Passed	The change is narrowly scoped to a single test file modification (TestCancelWhileScan in scan_integration_test.go) that directly addresses the flaky test issue with minimal alteration.
Description check	✅ Passed	PR description includes all required sections with sufficient detail: issue number with link, clear problem statement, root cause analysis, specific fix, verification strategy with gate results, and checklist completion.


coderabbitai

🧹 Nitpick comments (1)

pkg/ttl/ttlworker/scan_integration_test.go (1)
37-37: Constructor swap looks fine, but note the skipped initialization.

NewTestKitWithSession bypasses RefreshSession's "select 3" sysvar-cache priming and the MockSessionManager registration that NewTestKit performs (see pkg/testkit/testkit.go lines 79–111 and 131–135). For this test that only issues DDL + inserts and then drives DoScan directly via dom.AdvancedSysSessionPool() (not through tk), skipping those steps appears safe and aligns with the stated intent of minimizing scope. Worth a brief inline comment explaining why the explicit-session constructor is used here while TestCancelWhileScanAtStatementBoundary (Line 105) keeps NewTestKit, so future readers don't "normalize" the two call sites.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/ttl/ttlworker/scan_integration_test.go` at line 37, Add a brief inline
comment next to the call to NewTestKitWithSession (and/or before the tk
variable) explaining that this test intentionally uses the explicit-session
constructor to avoid RefreshSession's "select 3" sysvar-cache priming and
MockSessionManager registration performed by NewTestKit, and that this is safe
because the test performs only DDL/inserts and calls DoScan via
dom.AdvancedSysSessionPool() rather than tk; mention that
TestCancelWhileScanAtStatementBoundary still uses NewTestKit to preserve the
full session initialization for that scenario.
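
The difference between the two constructor paths, as this comment describes it, can be sketched with a toy model. Everything below is a hypothetical stand-in written for illustration: the types session and testKit and the lower-case constructors are not the real pkg/testkit API, and the two boolean fields only mirror the "select 3" priming and MockSessionManager registration mentioned above, which have not been verified against the source here.

```go
package main

import "fmt"

// session is a hypothetical stand-in for a TiDB test session.
type session struct {
	primed     bool // models the warm-up query run during default setup
	registered bool // models registration with a mock session manager
}

// testKit is a hypothetical stand-in for testkit.TestKit.
type testKit struct {
	sess *session
}

// newSession models creating a bare session with no extra setup.
func newSession() *session { return &session{} }

// newTestKit models the default path: it creates a session and then
// performs the extra initialization steps described in the comment.
func newTestKit() *testKit {
	s := newSession()
	s.primed = true
	s.registered = true
	return &testKit{sess: s}
}

// newTestKitWithSession models the explicit path: it wraps the session
// it is given and performs no further initialization.
func newTestKitWithSession(s *session) *testKit {
	return &testKit{sess: s}
}

func main() {
	def := newTestKit()
	exp := newTestKitWithSession(newSession())
	fmt.Printf("default:  primed=%v registered=%v\n", def.sess.primed, def.sess.registered)
	fmt.Printf("explicit: primed=%v registered=%v\n", exp.sess.primed, exp.sess.registered)
}
```

In this model the explicit path simply skips two setup steps; whether skipping them removes a source of flakiness in the real test is exactly what an inline comment at the call site would need to justify.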


ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 56f18886-f45a-480d-871c-b767332460d6

📥 Commits

Reviewing files that changed from the base of the PR and between ce92298 and 2944aa1.

📒 Files selected for processing (1)

pkg/ttl/ttlworker/scan_integration_test.go

codecov · 2026-04-18T18:46:47Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 77.4536%. Comparing base (e3f45e4) to head (2944aa1).
⚠️ Report is 11 commits behind head on master.

Additional details and impacted files

@@               Coverage Diff                @@
##             master     #67885        +/-   ##
================================================
- Coverage   77.7969%   77.4536%   -0.3434%     
================================================
  Files          1983       1966        -17     
  Lines        548948     549947       +999     
================================================
- Hits         427065     425954      -1111     
- Misses       120962     123991      +3029     
+ Partials        921          2       -919

Flag	Coverage Δ
integration	`40.8898% <ø> (+1.0926%)`	⬆️
unit	`76.6682% <ø> (+0.3187%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components	Coverage Δ
dumpling	`61.5065% <ø> (ø)`
parser	`∅ <ø> (∅)`
br	`50.0991% <ø> (-13.0124%)`	⬇️


yinsustart · 2026-04-21T05:32:52Z

/retest

tiprow · 2026-04-21T05:33:16Z

@yinsustart: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo, meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

YangKeao · 2026-04-21T05:39:52Z

 func TestCancelWhileScan(t *testing.T) {
 	store, dom := testkit.CreateMockStoreAndDomain(t)
-	tk := testkit.NewTestKit(t, store)
+	tk := testkit.NewTestKitWithSession(t, store, testkit.NewSession(t, store))


I don't understand how this change will stabilize the test. It also doesn't match the PR description.

fix: stabilize flaky issue pingcap#66982

2944aa1

ti-chi-bot bot added the release-note-none Denotes a PR that doesn't merit a release note. label Apr 18, 2026

ti-chi-bot bot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Apr 18, 2026

coderabbitai bot reviewed Apr 18, 2026


yinsustart requested a review from YangKeao April 21, 2026 05:36

YangKeao reviewed Apr 21, 2026


yinsustart closed this Apr 21, 2026
