[dinky-admin] Fix issue where a job is still running but Dinky shows … #4462
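Problem
In the following scenarios, a job is actually still running, but Dinky shows its status as failed or unknown, and the small flame icon next to the job name disappears:
The job is managed by the Flink Operator, and the jobId changes after the Operator redeploys the job.
The job is managed by the Flink Operator, and a failed job is restarted and runs successfully.
In K8s mode, Dinky times out while starting the job, but the job has actually started successfully in K8s.
In K8s mode, the job's pods are temporarily scaled down to 0, causing Dinky to misjudge the job as failed.
When fetching Flink job data, Dinky mistakenly marks the job as unknown.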
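Changes
When updating job information, first check whether the jobId has changed; if it has, update the job instance's jobId.
Check failed jobs every five minutes; if a job is found to be running successfully again, put it back into the monitoring queue.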
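
For reviewers who want a picture of the two changes, here is a minimal sketch of the idea. It is not the code in this PR: `JobRecheckSketch`, `JobInstance`, `ClusterJob`, `syncJobId`, `recheckFailed` and the `lookup` function are all hypothetical names made up for illustration, and the `"RUNNING"` / `"FAILED"` strings are assumed status values.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

/** Illustrative sketch only; none of these names are taken from the Dinky code base. */
class JobRecheckSketch {

    /** Hypothetical, simplified stand-in for a stored job instance. */
    static final class JobInstance {
        final String name;
        String jobId;
        String status;

        JobInstance(String name, String jobId, String status) {
            this.name = name;
            this.jobId = jobId;
            this.status = status;
        }
    }

    /** Hypothetical snapshot of what the cluster currently reports for a job. */
    static final class ClusterJob {
        final String jobId;
        final String status;

        ClusterJob(String jobId, String status) {
            this.jobId = jobId;
            this.status = status;
        }
    }

    /** Change 1: if the cluster reports a different jobId, adopt it. Returns true when updated. */
    static boolean syncJobId(JobInstance instance, ClusterJob live) {
        if (live == null || live.jobId == null || live.jobId.equals(instance.jobId)) {
            return false;
        }
        instance.jobId = live.jobId;
        return true;
    }

    /**
     * Change 2: one pass over the failed jobs. Any job the cluster reports as
     * running again is refreshed, removed from the failed list and put back
     * into the monitoring queue. Returns the number of re-queued jobs.
     */
    static int recheckFailed(List<JobInstance> failed,
                             Function<JobInstance, ClusterJob> lookup,
                             Queue<JobInstance> monitorQueue) {
        int requeued = 0;
        Iterator<JobInstance> it = failed.iterator();
        while (it.hasNext()) {
            JobInstance instance = it.next();
            ClusterJob live = lookup.apply(instance); // assumed to return null when nothing is found
            if (live != null && "RUNNING".equals(live.status)) {
                syncJobId(instance, live);
                instance.status = live.status;
                monitorQueue.add(instance);
                it.remove();
                requeued++;
            }
        }
        return requeued;
    }

    /**
     * Runs the pass every five minutes. The sketch assumes the failed list is
     * only touched from the scheduler thread; real code would need a
     * thread-safe collection or locking.
     */
    static ScheduledExecutorService scheduleRecheck(List<JobInstance> failed,
                                                    Function<JobInstance, ClusterJob> lookup,
                                                    Queue<JobInstance> monitorQueue) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(
                () -> recheckFailed(failed, lookup, monitorQueue), 5, 5, TimeUnit.MINUTES);
        return scheduler;
    }
}
```

In this sketch the jobId sync also runs inside the recheck pass, so a job that the Operator redeployed under a new jobId returns to the monitoring queue with the current id rather than the stale one.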