Describe the bug
Data inconsistencies are observed in a Mnesia ram_copies table after an isolated node restarts and reconnects to the cluster.
The problem does not occur for disc_copies tables.
To Reproduce
- Start three nodes
start node1
erl -sname node1@localhost -setcookie mysecretcookie -mnesia dir '"/tmp/mnesia/node1"' -s mnesia
start node2
erl -sname node2@localhost -setcookie mysecretcookie -mnesia dir '"/tmp/mnesia/node2"' -s mnesia
start node3
erl -sname node3@localhost -setcookie mysecretcookie -mnesia dir '"/tmp/mnesia/node3"' -s mnesia
- create schema and tables
Nodes = [node1@localhost, node2@localhost, node3@localhost],
io:format("3. Stopping Mnesia...~n"),
[rpc:call(N, mnesia, stop, []) || N <- Nodes],
timer:sleep(1000),
io:format("4. Creating schema...~n"),
io:format(" → ~p~n", [rpc:call(node1@localhost, mnesia, create_schema, [Nodes])]),
timer:sleep(1000),
io:format("5. Starting Mnesia...~n"),
[rpc:call(N, mnesia, start, []) || N <- Nodes],
timer:sleep(2000),
io:format("6. Creating tables...~n"),
Disc = [{disc_copies, Nodes}, {attributes, [key, val]}, {type, set}],
Ram = [{ram_copies, Nodes}, {attributes, [key, val]}, {type, set}],
io:format(" disc: ~p~n", [rpc:call(node1@localhost, mnesia, create_table, [test_abc_disc, Disc])]),
io:format(" ram : ~p~n", [rpc:call(node1@localhost, mnesia, create_table, [test_abc_ram, Ram])]),
io:format("7. Writing test data...~n"),
Write = fun() ->
mnesia:write({test_abc_disc, 1, "disc - persistent"}),
mnesia:write({test_abc_ram, 1, "ram - memory only"})
end,
io:format(" Write: ~p~n", [rpc:call(node1@localhost, mnesia, transaction, [Write])])
-
stop Node 3.
kill -9 $pid
-
Send SIGSTOP to the other two nodes to pause them.
kill -STOP $pidOfNode1
kill -STOP $pidOfNode2
Alternatively, block new TCP connections to the distribution port of both node 1 and node 2.
-
start Node 3.
-
inspect table info on Node3
For test_abc_disc, where_to_read is nowhere.
For test_abc_ram, where_to_read is node3@localhost.
test_abc_disc is not readable or writable.
test_abc_ram is empty and is readable with a dirty read.
mnesia:wait_for_tables/2 times out for test_abc_disc but returns ok for test_abc_ram.
If TCP connections were blocked in step 4, unblock them.
-
Send SIGCONT to the other two nodes so that they can accept the join from Node 3.
kill -CONT $pidOfNode1
kill -CONT $pidOfNode2
-
On node3, reconnect to the cluster.
mnesia_controller:connect_nodes(mnesia:system_info(db_nodes)).
- inspect table info on Node3
test_abc_disc data is in sync with the other two nodes.
test_abc_ram data is not in sync with the other two nodes; it is empty.
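The two "inspect table info on Node3" steps can be sketched as the shell snippet below. It is only an illustration of the checks described above, not part of the original reproduction: the `Inspect` and `Compare` funs are hypothetical helper names, the 5000 ms timeout is an arbitrary choice, and the table names and key `1` are taken from the table-creation step. Calls are wrapped in `catch` because reads on a table that is not loaded exit instead of returning.

```erlang
%% Paste into the Erlang shell on node3.
%% Inspect is a hypothetical helper; the 5000 ms timeout is an assumption.
Inspect = fun() ->
    [{T,
      {where_to_read,   (catch mnesia:table_info(T, where_to_read))},
      {wait_for_tables, (catch mnesia:wait_for_tables([T], 5000))},
      {dirty_read,      (catch mnesia:dirty_read(T, 1))}}
     || T <- [test_abc_disc, test_abc_ram]]
end.

%% Run this while node3 is isolated, and again after reconnecting.
io:format("~p~n", [Inspect()]).

%% After reconnecting: read key 1 from every running replica so the
%% node3 copy can be compared with node1 and node2.
%% Compare is a hypothetical helper.
Compare = fun(Tab) ->
    [{N, rpc:call(N, mnesia, dirty_read, [Tab, 1])}
     || N <- mnesia:system_info(running_db_nodes)]
end.

io:format("~p~n", [Compare(test_abc_ram)]).
```

With the behaviour reported here, the second `Inspect()` run would be expected to still show an empty `dirty_read` result for test_abc_ram on node3, while `Compare(test_abc_ram)` would show the record on node1 and node2.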
Expected behavior
In step 7, access to test_abc_ram should not be served; mnesia:wait_for_tables/2 should time out.
In step 9, Node 3 should have an exact copy of table test_abc_ram, matching the other nodes.
The behaviour of disc_copies and ram_copies should be aligned.
Affected versions
OTP-28, but I believe older versions have the same issue too.