issue-3: add device timeout notify to rdma partition #3187
base: main
d5bf900 to b7debd1
auto& dCtx = DeviceTimeouted[deviceUUID];
dCtx.FirstErrorTs = ctx.Now();
dCtx.ParentWasNotified = false;
The interconnect partition has a problem: a request timeout can arrive after everything has already recovered. Can that happen here?
If a timeout has arrived, we will no longer receive a response to that request, if that is what you were asking about.
Here:
nbs/cloud/blockstore/libs/rdma/impl/client.cpp
Lines 1529 to 1548 in 9a7b1be
void DropTimedOutRequests()
{
    auto endpoints = Endpoints.Get();
    for (const auto& endpoint: *endpoints) {
        if (!endpoint->CheckState(EEndpointState::Connected)) {
            continue;
        }
        auto requests = endpoint->ActiveRequests.PopTimedOutRequests(
            DurationToCyclesSafe(Config->MaxResponseDelay));
        for (auto& request: requests) {
            endpoint->AbortRequest(
                std::move(request),
                E_TIMEOUT,
                "request timeout");
        }
    }
}
we extract the timed-out request, respond to it, and then destroy it.
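The behavior described in this reply (a timed-out request is removed from the active set, answered with an error, and destroyed, so a late response has nothing to match) can be illustrated with a minimal sketch. It is not the code from `client.cpp`: `TActiveRequestsSketch`, `Add`, `PopTimedOut`, and `Complete` are hypothetical names, and time is modeled as abstract ticks rather than CPU cycles.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified stand-in for an endpoint's active request table.
struct TActiveRequestsSketch {
    // Request id -> tick at which the request was sent.
    std::unordered_map<uint64_t, uint64_t> StartTicks;

    void Add(uint64_t reqId, uint64_t now) {
        StartTicks.emplace(reqId, now);
    }

    // Removes every request older than maxDelay ticks and returns its id.
    // After this call the table no longer knows about those requests.
    std::vector<uint64_t> PopTimedOut(uint64_t now, uint64_t maxDelay) {
        std::vector<uint64_t> timedOut;
        for (auto it = StartTicks.begin(); it != StartTicks.end();) {
            assert(now >= it->second);  // assumed: time never goes backwards
            if (now - it->second > maxDelay) {
                timedOut.push_back(it->first);
                it = StartTicks.erase(it);
            } else {
                ++it;
            }
        }
        return timedOut;
    }

    // Returns true only if the response matched a request that is still active.
    bool Complete(uint64_t reqId) {
        return StartTicks.erase(reqId) > 0;
    }
};
```

In this sketch a response that arrives after `PopTimedOut` has removed the request is simply not matched, which is the property the reply relies on.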
    requestInfo.Value.ActorId,
    std::make_unique<
        TEvNonreplPartitionPrivate::TEvCancelRequest>(
        EReason::Canceled));
Where is TEvCancelRequest handled?
I will move all of the request cancellation logic into a separate PR.
{
    NCloud::Send(
        ctx,
        requestInfo.Value.ActorId,
Where is the ActorId field populated?
I will move all of the request cancellation logic into a separate PR.
@@ -67,6 +111,11 @@ class TNonreplicatedPartitionRdmaActor final

    TRequestInfoPtr Poisoner;

    struct TDeviceTimeoutCtx {
        TInstant FirstErrorTs;
        bool ParentWasNotified = false;
And what is the parent?
PartConfig->GetParentActorId()
i.e. the volume
Renamed it to VolumeWasNotified.
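To make the role of `FirstErrorTs` and `VolumeWasNotified` concrete, here is a minimal sketch of per-device timeout tracking that notifies the volume at most once per error streak. It is an illustration only, not the PR's implementation: `TDeviceTimeoutTrackerSketch`, `OnTimeout`, `OnSuccess`, and the `Threshold` parameter are hypothetical, and the sketch assumes that the notification is sent once timeouts have persisted for at least `Threshold` and that a successful request resets the device state.

```cpp
#include <chrono>
#include <string>
#include <unordered_map>

using TClock = std::chrono::steady_clock;

// Mirrors the fields discussed in the thread, with the renamed flag.
struct TDeviceTimeoutCtxSketch {
    TClock::time_point FirstErrorTs;
    bool VolumeWasNotified = false;
};

// Hypothetical helper: decides when the volume should be told about a device.
class TDeviceTimeoutTrackerSketch {
public:
    explicit TDeviceTimeoutTrackerSketch(std::chrono::milliseconds threshold)
        : Threshold(threshold)
    {}

    // Returns true if the caller should notify the volume right now.
    bool OnTimeout(const std::string& deviceUUID, TClock::time_point now) {
        const bool isNew = Devices.find(deviceUUID) == Devices.end();
        auto& dCtx = Devices[deviceUUID];
        if (isNew) {
            dCtx.FirstErrorTs = now;  // start of the current error streak
        }
        if (dCtx.VolumeWasNotified || now - dCtx.FirstErrorTs < Threshold) {
            return false;
        }
        dCtx.VolumeWasNotified = true;  // notify only once per streak
        return true;
    }

    // Assumed behavior: a successful request ends the error streak.
    void OnSuccess(const std::string& deviceUUID) {
        Devices.erase(deviceUUID);
    }

private:
    std::chrono::milliseconds Threshold;
    std::unordered_map<std::string, TDeviceTimeoutCtxSketch> Devices;
};
```

In an actor, a `true` result from `OnTimeout` would correspond to sending the notification event to `PartConfig->GetParentActorId()`; the sketch leaves that call out so it stays self-contained.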
NCloud::Send(
    ctx,
    PartConfig->GetParentActorId(),
    std::make_unique<TEvVolumePrivate::TEvDeviceTimeoutedRequest>(
Aren't we going to rename Timeouted to TimedOut?
That can be done in a separate PR.
3903229 to 180d548
#3
Adds a notification to the volume when some requests time out or the RDMA endpoint is unavailable.