-
Notifications
You must be signed in to change notification settings - Fork 998
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add current_fork_perc in info all command #2640
Conversation
e415e88
to
a2e794e
Compare
src/server/server_family.cc
Outdated
append("save_buffer_bytes", save_controller_->GetSaveBuffersSize()); | ||
} | ||
if (has_save_controller) { | ||
append("save_buffer_bytes", save_controller_->GetSaveBuffersSize()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you need to lock save_mu_ before accessing save_controller_
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
omg 😱 😱 good catch!
src/server/server_state.h
Outdated
size_t keys_total = 0; | ||
}; | ||
|
||
CurrentSaveStats current_save_stats; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why dont you put this stats in SliceSnapshot class and add a funciton in SaveStagesController that aggregates the stats as we do in GetSaveBuffersSize
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you have a new function GetSaveProgress you will not need the save_mode_ as well. you can always set the counters and reply them in info persistance only if save_controller_ exists
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The thing is that we need save_mode_
because otherwise a new SliceSnapshot
(that is not part of the bgsave/save operation) might overwrite the thread local. We exercise almost the same path in replication, so it could be the case that replication and a bgsave operation happen at the same time. Both of the two command paths will happily increment the thread locals messing with the correctness.
Second is that with what I have here, we save a hop. We don't dispatch a cb to fetch and aggregate the thread locals. We are in non critical path so this is less important but still
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am suggesting to add the stats to SliceSnapshot class, not to use the thread local
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did the change and to be honest I am not very happy with it:
- It's more code and introduces a way more convoluted code path
- I needed 3 different proxy functions to reach the
SliceSnapshot
😢 - code repetition because we need to handle explicitly different formats/requirements
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but it's fine for now, but at some point I would consider a refactor. It's too much work to plug in stat tracking. That 3 proxy call in GetSCurrentSnapshotProgress
killed it for me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree there are lots of proxy functions in this flow and it looks messy
we can check later if we can do some refactoring here to make it simpler
src/server/server_family.cc
Outdated
total = 0; | ||
} | ||
|
||
append("current_fork_perc", perc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we can change the name here, use different one than redis as we dont have a fork
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree I just opted in for consistency instead of adding a different name. I will rename accordingly
src/server/server_family.h
Outdated
@@ -99,6 +99,8 @@ struct Metrics { | |||
// Max length of the all the tx shard-queues. | |||
uint32_t tx_queue_len = 0; | |||
|
|||
ServerState::CurrentSaveStats current_save_stats; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not used anymore
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
already removed :)
src/server/snapshot.cc
Outdated
@@ -360,4 +364,8 @@ size_t SliceSnapshot::GetTotalChannelCapacity() const { | |||
return dest_->GetSize(); | |||
} | |||
|
|||
std::pair<size_t, size_t> SliceSnapshot::GetCurrentSnapshotProgress() const { | |||
return {stats_.loop_serialized, stats_.keys_total}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
side_saved + loop_serialized
is the number of entries serialized
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice catch!
src/server/server_family.cc
Outdated
lock_guard lk{save_mu_}; | ||
if (save_controller_) { | ||
has_save_controller = true; | ||
save_buffer_bytes = save_controller_->GetSaveBuffersSize(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think each of this calls should be inside the relevant if otherwise you invoke the functions even if the user didnt ask for memory/persistence stats
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but now we need to lock twice.... And we also now in the worst path, do an extra two hops.... The added overhead and complexity is piling up...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you also already have below a lock for save_mu_ for
curent_durration_sec = save_controller_->GetCurrentSaveDuration();
I think its better to have them inside the relevant path
You are doing the hops right now regardless of calling the mutex once or twice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and also one more comment here. Let's zoom out, from one side, we are saying here that we care about the potential two extra dispatches to the shards and the cost associated with it but at the same time we don't care that we added two extra shard dispatches (for GetCurrentSnapshotProgress) when in fact we did not need them in the first place..
Anyway i made the change -- just thinking out loud :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is only one hop for GetCurrentSnapshotProgress right? not 2. Or I did not understand what you mean..
@@ -129,6 +131,11 @@ size_t RdbSnapshot::GetSaveBuffersSize() { | |||
return saver_->GetTotalBuffersSize(); | |||
} | |||
|
|||
std::pair<size_t, size_t> RdbSnapshot::GetCurrentSnapshotProgress() const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about returning a struct SnapshotStats, it will be more clear to reader and you wont need to add a comment what is the first and second param
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will do
test failure is fixed by #2647 |
{ | ||
lock_guard lk{save_mu_}; | ||
if (save_controller_) { | ||
auto res = save_controller_->GetCurrentSnapshotProgress(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can move this to line 1967 right? we already have this code
lock_guard lk{save_mu_};
if (save_controller_) {
there
fixes #2523