
chore: Add stats print for slot migrations #4456

Merged

chakaz merged 7 commits into main from chakaz/migration-stats on Jan 15, 2025
Conversation

chakaz
Collaborator

@chakaz chakaz commented Jan 14, 2025

Fixes #4415

@chakaz chakaz requested a review from adiholden January 14, 2025 11:49
@@ -331,7 +350,7 @@ void RestoreStreamer::WriteEntry(string_view key, const PrimeValue& pk, const Pr
ThrottleIfNeeded();
},
ServerState::tlocal()->serialization_max_chunk_size);
serializer.SerializeEntry(key, pk, pv, expire_ms);
stats_.commands += serializer.SerializeEntry(key, pk, pv, expire_ms);
Collaborator

Isn't this actually counting the number of flushes to the sink that we have (i.e. the number of times you call commit), rather than the number of commands generated from each entry?

Collaborator Author


Excluding timeouts and stickiness, they should be the same.

Collaborator


OK, I see now: so this is not actually the number of times we flush to the sink, but the number of times we write to the pending_buffer.

Collaborator Author


yes

@@ -310,6 +327,8 @@ void RestoreStreamer::OnDbChange(DbIndex db_index, const DbSlice::ChangeReq& req
std::lock_guard guard(big_value_mu_);
DCHECK_EQ(db_index, 0) << "Restore migration only allowed in cluster mode in db0";

stats_.buckets_on_db_update++;
Collaborator


I think it would be better to count how many buckets were actually serialized in OnDbChange, i.e. have WriteBucket return true if the bucket was serialized and false if it was skipped.
Otherwise, a very weak test that sends only a few commands after we have almost finished the traversal would skip the write-bucket in this flow, and we would not be able to detect such weak tests.

Collaborator Author


To clarify, do you want this in order to handle CVCUponInsert() potentially touching multiple buckets? If so, we can increase the counter there; if not, I don't think I understood your comment.

Collaborator


I want to know that buckets were actually serialized from this flow, not only that the function was called. If it is called after the bucket has already been serialized, then we don't do anything (we skip the write-bucket).

Collaborator Author


Oh, gotcha. Sure thing!

// We send RESTORE commands for small objects, or objects we don't support breaking.
bool use_restore_serialization = true;
size_t commits = 1;
Collaborator


How about renaming it to flushes?

// We send RESTORE commands for small objects, or objects we don't support breaking.
bool use_restore_serialization = true;
size_t flushes = 1;
Collaborator


After diving into the code again, regarding your earlier comment that the number of flushes is the same as the number of commands: this is not true. We are actually accumulating the commands, so you don't know whether the data was actually written to the socket.

Collaborator Author

@chakaz chakaz Jan 15, 2025


renamed to commands

adiholden
adiholden previously approved these changes Jan 15, 2025
@chakaz
Collaborator Author

chakaz commented Jan 15, 2025

@adiholden please re-approve; I had to tweak the asserted number of keys because target_deviation=0.1 and not 0.05, sorry.

@chakaz chakaz enabled auto-merge (squash) January 15, 2025 10:52
@chakaz chakaz merged commit 5ba608b into main Jan 15, 2025
9 checks passed
@chakaz chakaz deleted the chakaz/migration-stats branch January 15, 2025 11:06
Successfully merging this pull request may close these issues.

Add stats to migration process and check them in test
2 participants