Skip to content

Commit 2f10997

Browse files
authored
Merge pull request #137 from pav-kv/lagging-commit-test
testdata: add a lagging commit test
2 parents d14e61b + 8a26275 commit 2f10997

File tree

1 file changed

+174
-0
lines changed

1 file changed

+174
-0
lines changed

testdata/lagging_commit.txt

+174
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,174 @@
1+
# This test demonstrates the effect of delayed commit on a follower node after a
2+
# network hiccup between the leader and this follower.
3+
4+
# Skip logging the boilerplate. Set up a raft group of 3 nodes, and elect node 1
5+
# as the leader. Nodes 2 and 3 are the followers.
6+
log-level none
7+
----
8+
ok
9+
10+
add-nodes 3 voters=(1,2,3) index=10
11+
----
12+
ok
13+
14+
campaign 1
15+
----
16+
ok
17+
18+
stabilize
19+
----
20+
ok
21+
22+
# Propose a couple of entries.
23+
propose 1 data1
24+
----
25+
ok
26+
27+
propose 1 data2
28+
----
29+
ok
30+
31+
process-ready 1
32+
----
33+
ok
34+
35+
# The interesting part starts below.
36+
log-level debug
37+
----
38+
ok
39+
40+
deliver-msgs 2 3
41+
----
42+
1->2 MsgApp Term:1 Log:1/11 Commit:11 Entries:[1/12 EntryNormal "data1"]
43+
1->2 MsgApp Term:1 Log:1/12 Commit:11 Entries:[1/13 EntryNormal "data2"]
44+
1->3 MsgApp Term:1 Log:1/11 Commit:11 Entries:[1/12 EntryNormal "data1"]
45+
1->3 MsgApp Term:1 Log:1/12 Commit:11 Entries:[1/13 EntryNormal "data2"]
46+
47+
process-ready 3
48+
----
49+
Ready MustSync=true:
50+
Entries:
51+
1/12 EntryNormal "data1"
52+
1/13 EntryNormal "data2"
53+
Messages:
54+
3->1 MsgAppResp Term:1 Log:0/12
55+
3->1 MsgAppResp Term:1 Log:0/13
56+
57+
# Suppose there is a network blip which prevents the leader learning that the
58+
# follower 3 has appended the proposed entries to the log.
59+
deliver-msgs drop=(1)
60+
----
61+
dropped: 3->1 MsgAppResp Term:1 Log:0/12
62+
dropped: 3->1 MsgAppResp Term:1 Log:0/13
63+
64+
# In the meantime, the entries are committed, and the leader sends the commit
65+
# index to all the followers.
66+
stabilize 1 2
67+
----
68+
> 2 handling Ready
69+
Ready MustSync=true:
70+
Entries:
71+
1/12 EntryNormal "data1"
72+
1/13 EntryNormal "data2"
73+
Messages:
74+
2->1 MsgAppResp Term:1 Log:0/12
75+
2->1 MsgAppResp Term:1 Log:0/13
76+
> 1 receiving messages
77+
2->1 MsgAppResp Term:1 Log:0/12
78+
2->1 MsgAppResp Term:1 Log:0/13
79+
> 1 handling Ready
80+
Ready MustSync=false:
81+
HardState Term:1 Vote:1 Commit:13
82+
CommittedEntries:
83+
1/12 EntryNormal "data1"
84+
1/13 EntryNormal "data2"
85+
Messages:
86+
1->2 MsgApp Term:1 Log:1/13 Commit:12
87+
1->3 MsgApp Term:1 Log:1/13 Commit:12
88+
1->2 MsgApp Term:1 Log:1/13 Commit:13
89+
1->3 MsgApp Term:1 Log:1/13 Commit:13
90+
> 2 receiving messages
91+
1->2 MsgApp Term:1 Log:1/13 Commit:12
92+
1->2 MsgApp Term:1 Log:1/13 Commit:13
93+
> 2 handling Ready
94+
Ready MustSync=false:
95+
HardState Term:1 Vote:1 Commit:13
96+
CommittedEntries:
97+
1/12 EntryNormal "data1"
98+
1/13 EntryNormal "data2"
99+
Messages:
100+
2->1 MsgAppResp Term:1 Log:0/13
101+
2->1 MsgAppResp Term:1 Log:0/13
102+
> 1 receiving messages
103+
2->1 MsgAppResp Term:1 Log:0/13
104+
2->1 MsgAppResp Term:1 Log:0/13
105+
106+
# The network blip prevents the follower 3 from learning that the previously
107+
# appended entries are now committed.
108+
deliver-msgs drop=(3)
109+
----
110+
dropped: 1->3 MsgApp Term:1 Log:1/13 Commit:12
111+
dropped: 1->3 MsgApp Term:1 Log:1/13 Commit:13
112+
113+
# The network blip ends here.
114+
115+
status 1
116+
----
117+
1: StateReplicate match=13 next=14
118+
2: StateReplicate match=13 next=14
119+
3: StateReplicate match=11 next=14 inflight=2
120+
121+
# The leader still observes that the entries are in-flight to the follower 3,
122+
# since it hasn't heard from it. Nothing triggers updating the follower's
123+
# commit index, so we have to wait up to the full heartbeat interval before
124+
# the leader sends the commit index.
125+
tick-heartbeat 1
126+
----
127+
ok
128+
129+
# However, the leader does not push the real commit index to the follower 3. It
130+
# cuts the commit index at the Progress.Match mark, because it thinks that it is
131+
# unsafe to send a commit index higher than that.
132+
process-ready 1
133+
----
134+
Ready MustSync=false:
135+
Messages:
136+
1->2 MsgHeartbeat Term:1 Log:0/0 Commit:13
137+
1->3 MsgHeartbeat Term:1 Log:0/0 Commit:11
138+
139+
# Since the heartbeat message does not bump the follower's commit index, it will
140+
# take another roundtrip with the leader to update it. As such, the total time
141+
# it takes for the follower to learn the commit index is:
142+
#
143+
# delay = HeartbeatInterval + 3/2 * RTT
144+
#
145+
# This is suboptimal. It could have taken HeartbeatInterval + 1/2 * RTT, if the
146+
# leader sent the up-to-date commit index in the heartbeat message.
147+
#
148+
# See https://github.com/etcd-io/raft/issues/138 which aims to fix this.
149+
stabilize 1 3
150+
----
151+
> 3 receiving messages
152+
1->3 MsgHeartbeat Term:1 Log:0/0 Commit:11
153+
> 3 handling Ready
154+
Ready MustSync=false:
155+
Messages:
156+
3->1 MsgHeartbeatResp Term:1 Log:0/0
157+
> 1 receiving messages
158+
3->1 MsgHeartbeatResp Term:1 Log:0/0
159+
> 1 handling Ready
160+
Ready MustSync=false:
161+
Messages:
162+
1->3 MsgApp Term:1 Log:1/13 Commit:13
163+
> 3 receiving messages
164+
1->3 MsgApp Term:1 Log:1/13 Commit:13
165+
> 3 handling Ready
166+
Ready MustSync=false:
167+
HardState Term:1 Vote:1 Commit:13
168+
CommittedEntries:
169+
1/12 EntryNormal "data1"
170+
1/13 EntryNormal "data2"
171+
Messages:
172+
3->1 MsgAppResp Term:1 Log:0/13
173+
> 1 receiving messages
174+
3->1 MsgAppResp Term:1 Log:0/13

0 commit comments

Comments
 (0)