raft: support lazy appends #125002

Closed

pav-kv
Collaborator

@pav-kv pav-kv commented Jun 3, 2024

This PR introduces the "lazy appends" API in raft RawNode. In this mode, all MsgApp messages with a non-empty Entries slice in StateReplicate are sent using the RawNode.SendAppend method. This gives the caller (typically, the one handling Ready processing) direct control of when the raft node accesses Storage and sends replication messages. The API will be used by Replication Admission Control, which ultimately helps solve follower overload issues.

Previously, all MsgApp messages would be sent eagerly as soon as the node sees that a follower is behind. Any message Step-ped into raft could cause an immediate read of entries from Storage and a message construction.

The new behaviour is disabled by default and gated behind the Config.EnableLazyAppends flag. In the future, it will be the default.

PR #124948 demonstrates the usage of this API, and its effect can be seen in the testdata traces of the data-driven tests: messages typically contain more entries, since their construction is delayed until Ready processing.
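
To illustrate why delayed construction tends to produce larger messages, here is a minimal toy model rather than the code in this PR. Everything in it (`toyLeader`, `toyMsgApp`, `propose`, `sendAppend`, `ready`, `run`, the single follower with ID 2) is hypothetical, and the real RawNode.SendAppend signature is not assumed. The `lazy` field only mimics the idea behind Config.EnableLazyAppends.

```go
package main

import "fmt"

// toyMsgApp is a hypothetical stand-in for a MsgApp; it is not raftpb.Message.
type toyMsgApp struct {
	To      uint64
	Entries []string
}

// toyLeader is a hypothetical, heavily simplified leader with one follower.
type toyLeader struct {
	lazy bool        // mimics the idea of Config.EnableLazyAppends
	log  []string    // stand-in for Storage
	next int         // position in log of the next entry to send
	msgs []toyMsgApp // outbox, drained by ready()
}

// propose appends an entry. In eager mode it immediately builds a MsgApp,
// which models "any message Step-ped into raft could cause an immediate read".
func (l *toyLeader) propose(data string) {
	l.log = append(l.log, data)
	if !l.lazy {
		l.sendAppend(2)
	}
}

// sendAppend reads all unsent entries and queues one MsgApp for them.
// It reports whether a message was queued.
func (l *toyLeader) sendAppend(to uint64) bool {
	if l.next >= len(l.log) {
		return false
	}
	ents := append([]string(nil), l.log[l.next:]...)
	l.next = len(l.log)
	l.msgs = append(l.msgs, toyMsgApp{To: to, Entries: ents})
	return true
}

// ready drains the outbox, loosely modelling Ready processing.
func (l *toyLeader) ready() []toyMsgApp {
	out := l.msgs
	l.msgs = nil
	return out
}

// run proposes three entries and returns the messages seen at Ready time.
// In lazy mode the caller decides when to send, here right before ready().
func run(lazy bool) []toyMsgApp {
	l := &toyLeader{lazy: lazy}
	for _, d := range []string{"a", "b", "c"} {
		l.propose(d)
	}
	if lazy {
		l.sendAppend(2)
	}
	return l.ready()
}

func main() {
	for _, lazy := range []bool{false, true} {
		msgs := run(lazy)
		fmt.Printf("lazy=%t: %d message(s)\n", lazy, len(msgs))
		for _, m := range msgs {
			fmt.Printf("  MsgApp to %d with %d entries\n", m.To, len(m.Entries))
		}
	}
}
```

In this toy, eager mode queues one single-entry message per proposal, while lazy mode queues one message carrying all three entries, because the read from the log happens once, at the point the caller chooses.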

Epic: CRDB-37515

blathers-crl bot commented Jun 3, 2024

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Member

This change is Reviewable

@pav-kv pav-kv force-pushed the raft-lazy-appends branch from cec977d to 76c7ab6 Compare June 3, 2024 20:04
This commit introduces the "lazy appends" API in raft RawNode. In this
mode, all MsgApp messages with a non-empty Entries slice in
StateReplicate are sent using the RawNode.SendAppend method. This gives
the caller (typically, the one handling Ready processing) direct control
of when the raft node accesses Storage and sends replication messages.
The API will be used by Replication Admission Control, which ultimately
helps solving follower overload issues.

Previously, all MsgApp messages would be sent eagerly as soon as the
node sees that a follower is behind. Any message Step-ped into raft
could cause an immediate read of entries from Storage and a message
construction.

Epic: CRDB-37515
Release note: None
@pav-kv pav-kv force-pushed the raft-lazy-appends branch from 76c7ab6 to 1b1d399 Compare June 3, 2024 20:13
@pav-kv pav-kv marked this pull request as ready for review June 3, 2024 20:54
@pav-kv pav-kv requested review from a team and arulajmani June 3, 2024 20:54
@kvoli kvoli self-requested a review June 3, 2024 21:00
@pav-kv
Collaborator Author

pav-kv commented Jun 3, 2024

TODO: add tests, though the demo in #124948 already provides good coverage.

Collaborator

@sumeerbhola sumeerbhola left a comment

I'm a bit confused.
First one is perhaps a nit. The RawNode.SendAppend method is not defined in this PR. It seems to be in the other PR. Shouldn't it be defined here given the commit comment?

Second, is the following meant to be an intermediate step in the refactor?

// SendAppend sends a log replication (MsgApp) message to a particular node. The
// message will be available via next Ready.

Asking since this does not conform to the RaftInterface in the RACv2 prototype. In there we get a Ready with entries, and based on the sizes of the entries and availability of tokens pull within the same handleRaftReadyRaftMuLocked. Having the entries first is important since we need to figure out the priorities and size. And we also don't want to delay the sending until the next Ready. I guess we could call Ready a second time within handleRaftReadyRaftMuLocked but this again is slightly convoluted (akin to our discussion in https://cockroachlabs.slack.com/archives/C06UFBJ743F/p1717185326307039?thread_ts=1717076603.877549&cid=C06UFBJ743F, though not as convoluted) in that we shouldn't have to tell RawNode what we want, and then call it again with no parameter for it to give us what we want -- the first call should simply return what we want.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani and @kvoli)

Collaborator Author

@pav-kv pav-kv left a comment

RawNode.SendAppend is in this PR, see the rawnode.go file. Not sure why you can't see it, maybe Reviewable is confusing things.

This is not the final version of the method, see TODOs in it. The main goal is to add the plumbing first, and then we will add the required parameters as we go, as well as the state tracking outside raft. As of today, there are no extra parameters needed to drive replication using this method (see the other demo PR). Also, this doesn't introduce additional code for handling messages: instead of returning MsgApp, all the messages are in Ready - this can be changed in the future but is not needed now for simplicity.

In the future, the approximate flow is:

  • The upper layer learns Next and entry sizes continuously (not necessarily through Ready). At any point in time (when we hold raftMu), we can access the raftLog / unstable and learn the last index, the Next index, and we know all the entry sizes in unstable (we remember them either when we append them, or we can also scan raftLog/unstable at any time if needed).
  • So, at any point in time we're able to call SendAppend if the tracking state indicates the stream is ready. Typically we will do it in handleRaftReadyRaftMuLocked, right before calling RawNode.Ready.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani and @kvoli)

@pav-kv
Collaborator Author

pav-kv commented Oct 2, 2024

Superseded by #131588.

@pav-kv pav-kv closed this Oct 2, 2024
@pav-kv pav-kv deleted the raft-lazy-appends branch October 2, 2024 23:00