Failure
PR: #5098
CI: https://prow.tidb.net/jenkins/job/pingcap/job/ticdc/job/pull_cdc_mysql_integration_heavy/2058/display/redirect
Job: pull_cdc_mysql_integration_heavy #2058
Failed group: G05
Case: ddl_for_split_tables_with_merge_and_split with mysql sink
Evidence
Jenkins failed in the Test stage for TEST_GROUP = 'G05' with "script returned exit code 1".
The failing command was the split-table helper path. merge_table_with_retry repeatedly failed for table 119:
{ "success": false, "error": "Can't not find maintainer for changefeed: test" }
{ "success": false, "error": "[CDC:ErrTableIsNotFounded]table is not found%!(EXTRA string=tableID, int64=119)" }
merge table 119 failed after 10 retries
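The two responses above mean different things. The first says the changefeed has no maintainer, and the second says the maintainer it reached did not know table 119. The following is a minimal Python sketch of how a helper could tell them apart before deciding whether to keep retrying. The function name and the category labels are hypothetical; only the matched substrings come from the responses quoted above. The real merge_table_with_retry helper is not shown here and may work differently.

```python
import json

# Substrings copied from the merge-table responses quoted above.
# The category names below are illustrative, not TiCDC's own.
MAINTAINER_MISSING = "Can't not find maintainer for changefeed"
TABLE_NOT_FOUND = "ErrTableIsNotFounded"


def classify_merge_response(body: str) -> str:
    """Hypothetical: return "ok", "no_maintainer", "table_not_found", or "other"."""
    resp = json.loads(body)
    if resp.get("success"):
        return "ok"
    error = resp.get("error", "")
    if MAINTAINER_MISSING in error:
        return "no_maintainer"
    if TABLE_NOT_FOUND in error:
        return "table_not_found"
    return "other"


no_maintainer = """{ "success": false, "error": "Can't not find maintainer for changefeed: test" }"""
print(classify_merge_response(no_maintainer))
```

A "no_maintainer" result is only worth retrying while at least one capture is still alive. Otherwise the retries just repeat the same error until the budget runs out.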
The case logs show both TiCDC captures exited before the helper finished:
Error: [CDC:ErrCaptureSuicide]capture suicide
cdc0.log shows the direct cause as etcd session loss:
[WARN] [etcd_watcher.go:70] ["session is disconnected"] [error="[CDC:ErrEtcdSessionDone]the etcd session is done"]
[ERROR] [server.go:152] ["cdc server exits with error"] [error="[CDC:ErrCaptureSuicide]capture suicide"]
cdc1.log exits with the same ErrCaptureSuicide. Around the same time, down_pd.log shows etcd/PD instability: slow linearizable reads, ReadIndex retries, TSO save-timestamp failures, "not leader" errors, and slow fsync.
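The root cause is visible in the CDC logs: ErrEtcdSessionDone is logged first, then ErrCaptureSuicide. The following is a minimal Python sketch, assuming plain-text log lines like the ones quoted from cdc0.log, that summarizes why a capture exited so a test could print this cause instead of a merge-table failure. The function name and the summary strings are hypothetical. It does not try to parse down_pd.log, because the exact PD log messages are not quoted here.

```python
from typing import Iterable, Optional

# Error codes copied from the cdc0.log lines quoted above.
SESSION_DONE = "ErrEtcdSessionDone"
CAPTURE_SUICIDE = "ErrCaptureSuicide"


def capture_exit_cause(log_lines: Iterable[str]) -> Optional[str]:
    """Hypothetical: summarize why a capture exited, or None if it did not suicide."""
    saw_session_done = False
    for line in log_lines:
        if SESSION_DONE in line:
            saw_session_done = True
        if CAPTURE_SUICIDE in line:
            # Report etcd session loss only if it was logged before the suicide.
            if saw_session_done:
                return "capture suicide after etcd session loss"
            return "capture suicide (cause not found in log)"
    return None


cdc0 = [
    '[WARN] [etcd_watcher.go:70] ["session is disconnected"] '
    '[error="[CDC:ErrEtcdSessionDone]the etcd session is done"]',
    '[ERROR] [server.go:152] ["cdc server exits with error"] '
    '[error="[CDC:ErrCaptureSuicide]capture suicide"]',
]
print(capture_exit_cause(cdc0))
```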
I did not find a table-route conflict / RouteAdmin error in the captured CDC logs. The observed sequence is: both captures committed suicide, the changefeed lost its maintainer, and the test helper kept calling merge-table against a cluster with no live capture.
Expected
The test should not fail as a table scheduling failure when the underlying CDC captures have already exited due to PD/etcd session loss. It should either tolerate transient maintainer unavailability with a meaningful wait/retry path, or report the capture-suicide root cause directly.
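One way to meet this expectation is for the retry loop to check capture liveness before each attempt and stop with the capture-suicide cause as soon as no capture is left. The following is a minimal Python sketch of that shape. The real helper is a shell function, and merge_once, captures_alive, CaptureSuicideError and merge_with_root_cause are all hypothetical names used only for illustration. The default of 10 retries mirrors the "failed after 10 retries" message above.

```python
import time
from typing import Callable


class CaptureSuicideError(RuntimeError):
    """Hypothetical: raised when no capture is alive, so retrying cannot help."""


def merge_with_root_cause(
    merge_once: Callable[[], bool],      # hypothetical: one merge-table call, True on success
    captures_alive: Callable[[], bool],  # hypothetical: True if any TiCDC capture still runs
    retries: int = 10,
    delay_s: float = 1.0,
) -> int:
    """Retry merge_once and return the attempt number that succeeded."""
    for attempt in range(1, retries + 1):
        # Fail fast with the real cause instead of retrying against a dead cluster.
        if not captures_alive():
            raise CaptureSuicideError(
                f"no live TiCDC capture before attempt {attempt}; "
                "check cdc*.log for ErrCaptureSuicide / ErrEtcdSessionDone"
            )
        if merge_once():
            return attempt
        if attempt < retries:
            time.sleep(delay_s)
    raise TimeoutError(f"merge table failed after {retries} retries")
```

With this shape, a transient "no maintainer" response while captures are still running is retried. If both captures have already exited, the test reports the capture-suicide root cause at the first check and does not finish as a table-scheduling failure.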
Notes
This looks like a flaky integration-test/environment failure rather than a regression in the table-route conflict detector branch. The failure URL above should be kept as the reproduction evidence.