flaky test: ddl_for_split_tables_with_merge_and_split fails after TiCDC capture suicide

## Failure

PR: https://github.com/pingcap/ticdc/pull/5098
CI: https://prow.tidb.net/jenkins/job/pingcap/job/ticdc/job/pull_cdc_mysql_integration_heavy/2058/display/redirect
Job: `pull_cdc_mysql_integration_heavy` #2058
Failed group: `G05`
Case: `ddl_for_split_tables_with_merge_and_split` with mysql sink

## Evidence

Jenkins failed in the `Test` stage for `TEST_GROUP = 'G05'` with `script returned exit code 1`.

The failure came from the split-table helper: `merge_table_with_retry` repeatedly failed for table `119`:

```text
{ "success": false, "error": "Can't not find maintainer for changefeed: test" }
{ "success": false, "error": "[CDC:ErrTableIsNotFounded]table is not found%!(EXTRA string=tableID, int64=119)" }
merge table 119 failed after 10 retries
```

The case logs show both TiCDC captures exited before the helper finished:

```text
Error: [CDC:ErrCaptureSuicide]capture suicide
```

`cdc0.log` shows the direct cause as etcd session loss:

```text
[WARN] [etcd_watcher.go:70] ["session is disconnected"] [error="[CDC:ErrEtcdSessionDone]the etcd session is done"]
[ERROR] [server.go:152] ["cdc server exits with error"] [error="[CDC:ErrCaptureSuicide]capture suicide"]
```

`cdc1.log` shows the same `ErrCaptureSuicide` exit. Around the same time, `down_pd.log` shows etcd/PD instability: slow linearizable reads, ReadIndex retry, TSO save timestamp failure, `not leader`, and slow fsync.

I did not find a table-route conflict / RouteAdmin error in the captured CDC logs. The observed failure is that both captures exited with `ErrCaptureSuicide`, leaving the CDC cluster without a maintainer, while the test helper kept calling `merge-table` until it exhausted its 10 retries.

## Expected

The test should not surface as a table-scheduling failure when the underlying CDC captures have already exited due to PD/etcd session loss. It should either tolerate transient maintainer unavailability with a meaningful wait/retry path, or report the capture-suicide root cause directly.
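A minimal sketch of that behavior, in Python for illustration only (the real helper lives in the integration-test scripts and may look quite different). The `call_merge` and `read_capture_logs` hooks, the `CaptureExited` exception, and the function names are hypothetical. The only strings taken from this issue are the error substrings quoted in the logs above. The sketch retries on the two errors seen here, but stops immediately and reports the root cause once a capture log contains `ErrCaptureSuicide`.

```python
import time
from typing import Callable, Dict, Iterable, Optional

# Substrings copied from the helper output and CDC logs quoted in this issue.
TRANSIENT_ERRORS = ("find maintainer for changefeed", "ErrTableIsNotFounded")
CAPTURE_EXIT_MARKER = "ErrCaptureSuicide"


class CaptureExited(RuntimeError):
    """Raised when a capture log shows the server has already exited."""


def find_capture_exit(log_lines: Iterable[str]) -> Optional[str]:
    """Return the first log line that reports a capture exit, or None."""
    for line in log_lines:
        if CAPTURE_EXIT_MARKER in line:
            return line.strip()
    return None


def merge_table_with_diagnosis(
    call_merge: Callable[[], Dict[str, object]],
    read_capture_logs: Callable[[], Iterable[str]],
    max_retries: int = 10,
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Retry a merge call; return the number of attempts it took to succeed.

    call_merge and read_capture_logs are hypothetical hooks: the first wraps
    the merge-table request, the second returns the current capture log lines.
    """
    last_error = ""
    for attempt in range(1, max_retries + 1):
        response = call_merge()
        if response.get("success"):
            return attempt
        last_error = str(response.get("error", ""))

        # A dead capture cannot recover by retrying: report the root cause.
        exit_line = find_capture_exit(read_capture_logs())
        if exit_line is not None:
            raise CaptureExited(f"capture exited before merge finished: {exit_line}")

        # Unknown errors are not retried, so they are not masked as flakiness.
        if not any(marker in last_error for marker in TRANSIENT_ERRORS):
            raise RuntimeError(f"merge failed with non-transient error: {last_error}")

        sleep(delay_seconds)
    raise TimeoutError(f"merge failed after {max_retries} retries: {last_error}")
```

With this shape, the run in this issue would have stopped on the first failed attempt with a message naming `ErrCaptureSuicide`, instead of ending with `merge table 119 failed after 10 retries`.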

## Notes

This looks like a flaky integration-test/environment failure rather than a regression in the table-route conflict detector branch. The failure URL above should be kept as the reproduction evidence.

