Skip to content

fix: make FreeMemWithEvictionStep atomic #4885

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
May 2, 2025
Merged

fix: make FreeMemWithEvictionStep atomic #4885

merged 3 commits into from
May 2, 2025

Conversation

kostasrim
Copy link
Contributor

FreeMemWithEvictionStep was not atomic yet was used by a flow -- RetireExpireAndEvict() -- which should be atomic. Therefore, we make this function atomic (since it's only used by that flow).

Resolves #4875

@kostasrim kostasrim self-assigned this Apr 3, 2025
};

shard_set->pool()->AwaitBrief(std::move(cb));
}
}

void DbSlice::SendQueuedInvalidationMessagesAsync() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We drained it before because we blocked. Now we don't.

TODO investigate: we should call this also from heartbeat (so on the second iteration it drains it if there were items added).

Copy link
Contributor Author

@kostasrim kostasrim Apr 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After looking at this the while loop was redundant in the flow of heartbeat. Why? Because heartbeat was preempted(because it tried to sent the queued invalidated messages). pending_send_map_ can only change in two places: 1) during evictions from heartbeat 2) through db_slice and drained at the end via OnCbFinish(). The former (1) is guaranteed that it won't evict because it's preempted and until it completes it won't run another heartbeat on that thread while the later (2) drains the pending_send_map_ via OnCbFinish(). Therefore, there is no correctness issue of "missing or not draining all of the items in the pending_send_map_".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for DispatchBrief, I also don't see any issue whatsoever, they messages will reach the connections eventually and it should be enough

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DispatchBrief can also block btw, it's just a much rare event

Copy link
Contributor Author

@kostasrim kostasrim Apr 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@romange the rationale was this doesn't happen in practice because we don't reach this state easily (where we have the task queues internally full). Your comment is very valid, and maybe we should move this outside of the non preemptive critical section. I think it should be a simple change. I will update on this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@romange would it worth to add a function that won't dispatch if the task queue is full (but also won't preempt and return to the caller?)

That way, we can send the pending_items later if we can't dispatch because the intenral queues are full

@kostasrim kostasrim changed the title [wip] fix: make FreeMemWithEvictionStep atomic fix: make FreeMemWithEvictionStep atomic Apr 11, 2025
Comment on lines 1392 to 1396
if (bucket.IsEmpty())
continue;

if (!bucket.IsBusy(slot_id))
continue;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (bucket.IsEmpty() || !bucket.IsBusy(slot_id))
continue;

events_.evicted_keys += evicted_items;
DVLOG(2) << "Eviction time (us): " << (time_finish - time_start) / 1000;
return pair<uint64_t, size_t>{evicted_items, evicted_bytes};
return finalize();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to call finalize if expired_keys_events_recording_ is false and journal is null

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually it's better to check in finalize to avoid iterating over keys_to_journal

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bool record_keys = owner_->journal() != nullptr || expired_keys_events_recording_;

and then:

  1388         if (record_keys)                                                                          
  1389           keys_to_journal.emplace_back(key); 

So keys_to_journal will be empty if either we have key space notifications or if we need to write to the journal.

So no action really needed there because keys_to_journal will be empty.

What we can save is calling SendQueuedInvalidationMessagesAsync if pending_send_map.empty()

// send the deletion to the replicas.
for (string_view key : keys_to_journal) {
if (auto journal = owner_->journal(); journal)
RecordExpiryBlocking(db_ind, key);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add a comment saying that even though you call RecordExpiryBlocking it does not block due to JournalFlushGuard above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find funny that Adi asked me why I used

  // Disable flush journal changes to prevent preemtion
  journal::JournalFlushGuard journal_flush_guard(shard_owner()->journal());

when the caller of this function already does the same. Then you are asking me to add a comment here.

yet the only reason I added the line above on the first line of this function is precisely to show intent (that we disable the flushing in journal explicitly).

I will add a comment 😄

size_t starting_segment_id,
size_t increase_goal_bytes) {
// Disable flush journal changes to prevent preemtion
journal::JournalFlushGuard journal_flush_guard(shard_owner()->journal());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The caller to this function also uses journal::JournalFlushGuard
In this class distructor we do journal_->SetFlushMode(true);
This means that the logic now is broken if there are other calls to journal not inside FreeMemWithEvictionStepAtomic

  1. I would add a dcheck in JournalFlushGuard to see we do not have nested calls to it.
  2. why did you add it here if its also in the caller?

++num_seg_visited, segment_id = GetNextSegmentForEviction(segment_id, db_ind)) {
const auto& bucket = db_table->prime.GetSegment(segment_id)->GetBucket(bucket_id);
if (bucket.IsEmpty() || !bucket.IsBusy(slot_id))
continue;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only difference is I combined those two statements

if ((evicted_items == max_eviction_per_hb) || (evicted_bytes >= increase_goal_bytes))
goto finish;
}
for (int32_t slot_id = num_slots - 1; slot_id >= 0; --slot_id) {
Copy link
Contributor Author

@kostasrim kostasrim Apr 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No functional changes in this block of code. I moved the FiberAtomicGuard above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adiholden I can also move back the Guard locally so the diff is cleaner

@kostasrim kostasrim merged commit 291b262 into main May 2, 2025
10 checks passed
@kostasrim kostasrim deleted the kpr4 branch May 2, 2025 07:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Preempt inside atomic section of heartbeat during evictions
4 participants