[Bug] Write to StarRocks failed occasionally. #383

@gohalo

Got the following error messages from the Flink TaskManager.

2024-09-05 06:33:49,271 | ERROR | [StarRocks-Sink-Manager] | Transaction prepare failed, db: ods, table: ods_fin_cust_account_t_keep_acct_detail_ri, label: flink-49e51069-70b2-41ee-8c22-d40d07a6e3d3, [N]responseBody: {[N]    "Status": "TXN_IN_PROCESSING",[N]    "Label": "flink-49e51069-70b2-41ee-8c22-d40d07a6e3d3",[N]    "Message": "Transaction in processing, please retry later"[N]}[N]errorLog: null | com.starrocks.data.load.stream.TransactionStreamLoader.prepare(TransactionStreamLoader.java:220)
2024-09-05 06:33:49,272 | ERROR | [StarRocks-Sink-Manager] | TransactionTableRegion commit failed, db: ods, table: ods_fin_cust_account_t_keep_acct_detail_ri, label: flink-49e51069-70b2-41ee-8c22-d40d07a6e3d3 | com.starrocks.data.load.stream.v2.TransactionTableRegion.commit(TransactionTableRegion.java:257)
com.starrocks.data.load.stream.exception.StreamLoadFailException: Transaction prepare failed, db: ods, table: ods_fin_cust_account_t_keep_acct_detail_ri, label: flink-49e51069-70b2-41ee-8c22-d40d07a6e3d3, [N]responseBody: {[N]    "Status": "TXN_IN_PROCESSING",[N]    "Label": "flink-49e51069-70b2-41ee-8c22-d40d07a6e3d3",[N]    "Message": "Transaction in processing, please retry later"[N]}[N]errorLog: null
	at com.starrocks.data.load.stream.TransactionStreamLoader.prepare(TransactionStreamLoader.java:221) ~[flink-connector-starrocks-1.2.9_flink-1.17.jar:?]
	at com.starrocks.data.load.stream.v2.TransactionTableRegion.commit(TransactionTableRegion.java:247) ~[flink-connector-starrocks-1.2.9_flink-1.17.jar:?]
	at com.starrocks.data.load.stream.v2.StreamLoadManagerV2.lambda$init$0(StreamLoadManagerV2.java:191) ~[flink-connector-starrocks-1.2.9_flink-1.17.jar:?]
	at java.lang.Thread.run(Thread.java:750) [?:1.8.0_402]
2024-09-05 06:33:49,272 | ERROR | [StarRocks-Sink-Manager] | Failed to flush data for db: ods, table: ods_fin_cust_account_t_keep_acct_detail_ri after 0 times retry, the last exception is | com.starrocks.data.load.stream.v2.TransactionTableRegion.fail(TransactionTableRegion.java:285)
com.starrocks.data.load.stream.exception.StreamLoadFailException: Transaction prepare failed, db: ods, table: ods_fin_cust_account_t_keep_acct_detail_ri, label: flink-49e51069-70b2-41ee-8c22-d40d07a6e3d3, [N]responseBody: {[N]    "Status": "TXN_IN_PROCESSING",[N]    "Label": "flink-49e51069-70b2-41ee-8c22-d40d07a6e3d3",[N]    "Message": "Transaction in processing, please retry later"[N]}[N]errorLog: null
	at com.starrocks.data.load.stream.TransactionStreamLoader.prepare(TransactionStreamLoader.java:221) ~[flink-connector-starrocks-1.2.9_flink-1.17.jar:?]
	at com.starrocks.data.load.stream.v2.TransactionTableRegion.commit(TransactionTableRegion.java:247) ~[flink-connector-starrocks-1.2.9_flink-1.17.jar:?]
	at com.starrocks.data.load.stream.v2.StreamLoadManagerV2.lambda$init$0(StreamLoadManagerV2.java:191) ~[flink-connector-starrocks-1.2.9_flink-1.17.jar:?]
	at java.lang.Thread.run(Thread.java:750) [?:1.8.0_402]
2024-09-05 06:33:49,273 | ERROR | [StarRocks-Sink-Manager] | Stream load failed | com.starrocks.data.load.stream.v2.StreamLoadManagerV2.callback(StreamLoadManagerV2.java:340)
com.starrocks.data.load.stream.exception.StreamLoadFailException: Transaction prepare failed, db: ods, table: ods_fin_cust_account_t_keep_acct_detail_ri, label: flink-49e51069-70b2-41ee-8c22-d40d07a6e3d3, [N]responseBody: {[N]    "Status": "TXN_IN_PROCESSING",[N]    "Label": "flink-49e51069-70b2-41ee-8c22-d40d07a6e3d3",[N]    "Message": "Transaction in processing, please retry later"[N]}[N]errorLog: null
	at com.starrocks.data.load.stream.TransactionStreamLoader.prepare(TransactionStreamLoader.java:221) ~[flink-connector-starrocks-1.2.9_flink-1.17.jar:?]
	at com.starrocks.data.load.stream.v2.TransactionTableRegion.commit(TransactionTableRegion.java:247) ~[flink-connector-starrocks-1.2.9_flink-1.17.jar:?]
	at com.starrocks.data.load.stream.v2.StreamLoadManagerV2.lambda$init$0(StreamLoadManagerV2.java:191) ~[flink-connector-starrocks-1.2.9_flink-1.17.jar:?]
	at java.lang.Thread.run(Thread.java:750) [?:1.8.0_402]

And the corresponding log from the CN (compute node).

I0905 06:34:56.900581 1073155 transaction_mgr.cpp:190] new transaction manage request, id=6c4ee28b72465107-027adf3576108193, job_id=-1, txn_id: 67681501, label=flink-49e51069-70b2-41ee-8c22-d40d07a6e3d3, db=ods, tbl=ods_fin_cust_account_t_keep_acct_detail_ri op=begin
I0905 06:34:56.955199 1073155 transaction_stream_load.cpp:236] new transaction load request. id=6c4ee28b72465107-027adf3576108193, job_id=-1, txn_id: 67681501, label=flink-49e51069-70b2-41ee-8c22-d40d07a6e3d3, db=ods, tbl=ods_fin_cust_account_t_keep_acct_detail_ri
I0905 06:34:56.973167 1073155 stream_load_executor.cpp:67] begin to execute job. label=flink-49e51069-70b2-41ee-8c22-d40d07a6e3d3, txn_id: 67681501, query_id=6c4ee28b-7246-5107-027a-df3576108193
I0905 06:34:56.973213 1073155 plan_fragment_executor.cpp:83] Prepare(): query_id=6c4ee28b-7246-5107-027a-df3576108193 fragment_instance_id=6c4ee28b-7246-5107-027a-df3576108194 backend_num=0
I0905 06:34:56.976012 1072502 plan_fragment_executor.cpp:185] Open(): fragment_instance_id=6c4ee28b-7246-5107-027a-df3576108194
I0905 06:34:56.981921 1073152 transaction_mgr.cpp:241] new transaction manage request, id=6c4ee28b72465107-027adf3576108193, job_id=-1, txn_id: 67681501, label=flink-49e51069-70b2-41ee-8c22-d40d07a6e3d3, db=ods, tbl=ods_fin_cust_account_t_keep_acct_detail_ri op=prepare
I0905 06:34:56.988793 1073155 transaction_mgr.cpp:213] new transaction manage request, id=6c4ee28b72465107-027adf3576108193, job_id=-1, txn_id: 67681501, label=flink-49e51069-70b2-41ee-8c22-d40d07a6e3d3, db=ods, tbl=ods_fin_cust_account_t_keep_acct_detail_ri op=rollback
I0905 06:34:56.988806 1073155 transaction_mgr.cpp:368] Rollback transaction id=6c4ee28b72465107-027adf3576108193, job_id=-1, txn_id: 67681501, label=flink-49e51069-70b2-41ee-8c22-d40d07a6e3d3
I0905 06:34:56.989843 1073098 lake_service.cpp:334] Aborting transactions=[67681501] tablets=[]
I0905 06:34:56.989853 1073098 load_channel_mgr.cpp:280] Aborting load channel because transaction was aborted, load_id=6c4ee28b72465107-027adf3576108193 txn_id=67681501

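This is caused by the following code, which expects the Flink connector to retry later when it returns TXN_IN_PROCESSING. In practice the connector makes only one attempt: the TaskManager log above reports the failure "after 0 times retry".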
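To make the expected behavior concrete, here is a minimal sketch of a client-side retry loop around the prepare step. It is not the connector's actual code: `PrepareRetrySketch`, `prepareWithRetry`, the `Supplier<String>` standing in for the HTTP prepare call, and the backoff values are all hypothetical. Only the `TXN_IN_PROCESSING` status string is taken from the response body in the log.

```java
import java.util.function.Supplier;

// Hypothetical sketch, not part of flink-connector-starrocks.
class PrepareRetrySketch {

    // Status value copied from the responseBody shown in the TaskManager log.
    static final String TXN_IN_PROCESSING = "TXN_IN_PROCESSING";

    // Upper bound for the sleep between attempts (an arbitrary assumption).
    static final long MAX_BACKOFF_MS = 5_000L;

    /**
     * Invokes prepareCall once, then retries up to maxRetries more times
     * while the returned status is TXN_IN_PROCESSING, doubling the sleep
     * between attempts. Returns the last status seen.
     */
    static String prepareWithRetry(Supplier<String> prepareCall,
                                   int maxRetries,
                                   long initialBackoffMs) {
        long backoffMs = initialBackoffMs;
        String status = prepareCall.get();
        for (int retry = 0;
             retry < maxRetries && TXN_IN_PROCESSING.equals(status);
             retry++) {
            try {
                Thread.sleep(backoffMs);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up retrying.
                Thread.currentThread().interrupt();
                return status;
            }
            backoffMs = Math.min(backoffMs * 2, MAX_BACKOFF_MS);
            status = prepareCall.get();
        }
        return status;
    }
}
```

With something like this in place, a prepare request that arrives while the CN is still finishing the load would be retried instead of failing the whole flush; the caller would treat the transaction as failed only if the status is still TXN_IN_PROCESSING after the last retry.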

[image: screenshot of the referenced code]
