[fix](scheduler) Fix coredump due to different queue size #55709
run buildall
TPC-H: Total hot run time: 34210 ms
TPC-DS: Total hot run time: 187345 ms
ClickBench: Total hot run time: 29.94 s
BE UT Coverage Report
run buildall
TPC-H: Total hot run time: 34977 ms
TPC-DS: Total hot run time: 185984 ms
ClickBench: Total hot run time: 29.72 s
BE UT Coverage Report
BE Regression && UT Coverage Report
    if (_on_blocking_scheduler) {
        _tracking.blocking_thread_id = thread_id;
    } else {
        _tracking.simple_thread_id = thread_id;
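Don't do this. A pipeline task should not be aware of which scheduler it is running in; the structure of the hybrid task scheduler may change at any time. Instead, consider taking the core id the task has recorded modulo the number of sub queues in the target queue on every push, which avoids the core dump.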
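A minimal sketch of the modulo approach suggested above, for illustration only. It assumes the crash comes from a core id recorded against one queue being used as an index into another queue with fewer sub queues. `Task`, `SubQueue`, `MultiQueue`, `recorded_core_id`, and `target_index` are hypothetical stand-ins, not the actual Doris classes or members.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for a pipeline task (not the Doris class).
struct Task {
    // Core id remembered from the last run. It may have been recorded
    // against a queue with a different number of sub queues.
    int recorded_core_id = -1;
};

// Hypothetical stand-in for one per-core sub queue.
class SubQueue {
public:
    void push(std::shared_ptr<Task> task) {
        std::lock_guard<std::mutex> guard(_mutex);
        _tasks.push_back(std::move(task));
    }
    std::size_t size() const {
        std::lock_guard<std::mutex> guard(_mutex);
        return _tasks.size();
    }

private:
    mutable std::mutex _mutex;
    std::deque<std::shared_ptr<Task>> _tasks;
};

// Hypothetical stand-in for a multi-core task queue.
class MultiQueue {
public:
    explicit MultiQueue(std::size_t num_sub_queues) : _subs(num_sub_queues) {
        assert(num_sub_queues > 0);
    }

    // Maps any recorded core id onto a valid index for THIS queue.
    // Tasks with no recorded core id (negative) go to sub queue 0 here;
    // that fallback is an assumption of this sketch.
    std::size_t target_index(int core_id) const {
        if (core_id < 0) {
            return 0;
        }
        return static_cast<std::size_t>(core_id) % _subs.size();
    }

    // Pushes the task and returns the sub queue index that was used.
    std::size_t push_back(std::shared_ptr<Task> task) {
        const std::size_t idx = target_index(task->recorded_core_id);
        _subs[idx].push(std::move(task));
        return idx;
    }

    std::size_t sub_queue_size(std::size_t idx) const { return _subs.at(idx).size(); }
    std::size_t num_sub_queues() const { return _subs.size(); }

private:
    std::vector<SubQueue> _subs;
};
```

With this shape the task only carries a plain core id and never needs to know which scheduler it was last on: a task that recorded core id 7 on an 8-way queue lands in sub queue 1 when pushed to a 3-way queue, rather than indexing out of range.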
What problem does this PR solve?
Starting BE in local mode triggers the following core dump:
*** Query id: d82def8526424e69-b00c82d310ccaa1d ***
*** is nereids: 1 ***
*** tablet id: 0 ***
*** Aborted at 1757020283 (unix time) try "date -d @1757020283" if you are using GNU date ***
*** Current BE git commitID: 0cbb0bf ***
*** SIGSEGV unknown detail explain (@0x0) received by PID 8317 (TID 10279 OR 0x7fa9b984b640) from PID 0; stack trace: ***
0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /home/zcp/repo_center/doris_master/doris/be/src/common/signal_handler.h:420
1# PosixSignals::chained_handler(int, siginfo*, void*) [clone .part.0] in /usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so
2# JVM_handle_linux_signal in /usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so
3# 0x00007FAD203E7520 in /lib/x86_64-linux-gnu/libc.so.6
4# doris::pipeline::PriorityTaskQueue::push(std::shared_ptr) at /home/zcp/repo_center/doris_master/doris/be/src/pipeline/task_queue.cpp:129
5# doris::pipeline::MultiCoreTaskQueue::push_back(std::shared_ptr, int) at /home/zcp/repo_center/doris_master/doris/be/src/pipeline/task_queue.cpp:204
6# doris::pipeline::MultiCoreTaskQueue::push_back(std::shared_ptr) at /home/zcp/repo_center/doris_master/doris/be/src/pipeline/task_queue.cpp:198
7# doris::pipeline::TaskScheduler::submit(std::shared_ptr) at /home/zcp/repo_center/doris_master/doris/be/src/pipeline/task_scheduler.cpp:74
8# doris::pipeline::HybridTaskScheduler::submit(std::shared_ptr) at /home/zcp/repo_center/doris_master/doris/be/src/pipeline/task_scheduler.cpp:189
9# doris::pipeline::PipelineTask::wake_up(doris::pipeline::Dependency*) at /home/zcp/repo_center/doris_master/doris/be/src/pipeline/pipeline_task.cpp:803
10# doris::pipeline::Dependency::set_ready() at /home/zcp/repo_center/doris_master/doris/be/src/pipeline/dependency.cpp:88
11# doris::vectorized::VDataStreamRecvr::SenderQueue::add_block(std::unique_ptr >, int, long, google::protobuf::Closure**, long, unsigned long) at /home/zcp/repo_center/doris_master/doris/be/src/vec/runtime/vdata_stream_recvr.cpp:194
12# doris::vectorized::VDataStreamRecvr::add_block(std::unique_ptr >, int, int, long, google::protobuf::Closure**, long, unsigned long) at /home/zcp/repo_center/doris_master/doris/be/src/vec/runtime/vdata_stream_recvr.cpp:403
13# doris::vectorized::VDataStreamMgr::transmit_block(doris::PTransmitDataParams const*, google::protobuf::Closure**, long) at /home/zcp/repo_center/doris_master/doris/be/src/vec/runtime/vdata_stream_mgr.cpp:167
14# doris::PInternalService::_transmit_block(google::protobuf::RpcController*, doris::PTransmitDataParams const*, doris::PTransmitDataResult*, google::protobuf::Closure*, doris::Status const&, long) in /mnt/hdd01/ci/doris-deploy-master-local/be/lib/doris_be
15# doris::PInternalService::transmit_block(google::protobuf::RpcController*, doris::PTransmitDataParams const*, doris::PTransmitDataResult*, google::protobuf::Closure*) at /home/zcp/repo_center/doris_master/doris/be/src/service/internal_service.cpp:1624
16# brpc::policy::ProcessRpcRequest(brpc::InputMessageBase*) in /mnt/hdd01/ci/doris-deploy-master-local/be/lib/doris_be
17# brpc::ProcessInputMessage(void*) in /mnt/hdd01/ci/doris-deploy-master-local/be/lib/doris_be
18# bthread::TaskGroup::task_runner(long) in /mnt/hdd01/ci/doris-deploy-master-local/be/lib/doris_be
19# bthread_make_fcontext in /mnt/hdd01/ci/doris-deploy-master-local/be/lib/doris_be
172.20.57.180 last coredump sql: 2025-09-05 05:11:54,768 [query] Query d82def8526424e69-b00c82d310ccaa1d 1 times with new query id: fc55dbd3bbf24d9d-b18012e9210c6be0
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)