mnesia: Data inconsistencies observed in Mnesia ram_copies table after an isolated node restart and reconnection. #11021

@qzhuyan

Describe the bug
Data inconsistencies observed in Mnesia ram_copies table after an isolated node restart and reconnection.

The problem does not occur with disc_copies tables.

To Reproduce

  1. Start three nodes.
    Start node1:
    erl -sname node1@localhost -setcookie mysecretcookie -mnesia dir '"/tmp/mnesia/node1"' -s mnesia

    Start node2:
    erl -sname node2@localhost -setcookie mysecretcookie -mnesia dir '"/tmp/mnesia/node2"' -s mnesia

    Start node3:
    erl -sname node3@localhost -setcookie mysecretcookie -mnesia dir '"/tmp/mnesia/node3"' -s mnesia

  2. Create the schema and tables.
Nodes = [node1@localhost, node2@localhost, node3@localhost],
io:format("3. Stopping Mnesia...~n"),
[rpc:call(N, mnesia, stop, []) || N <- Nodes],
timer:sleep(1000),

io:format("4. Creating schema...~n"),
io:format("~p~n", [rpc:call(node1@localhost, mnesia, create_schema, [Nodes])]),
timer:sleep(1000),

io:format("5. Starting Mnesia...~n"),
[rpc:call(N, mnesia, start, []) || N <- Nodes],
timer:sleep(2000),

io:format("6. Creating tables...~n"),
Disc = [{disc_copies, Nodes}, {attributes, [key, val]}, {type, set}],
Ram  = [{ram_copies,  Nodes}, {attributes, [key, val]}, {type, set}],
io:format("   disc: ~p~n", [rpc:call(node1@localhost, mnesia, create_table, [test_abc_disc, Disc])]),
io:format("   ram : ~p~n", [rpc:call(node1@localhost, mnesia, create_table, [test_abc_ram,  Ram])]),

io:format("7. Writing test data...~n"),
Write = fun() ->
  mnesia:write({test_abc_disc, 1, "disc - persistent"}),
  mnesia:write({test_abc_ram,  1, "ram - memory only"})
end,
io:format("   Write: ~p~n", [rpc:call(node1@localhost, mnesia, transaction, [Write])]).

  3. Stop Node 3 abruptly.
    kill -9 $pidOfNode3

  4. Send SIGSTOP to the other two nodes to pause them.
    kill -STOP $pidOfNode1
    kill -STOP $pidOfNode2

Alternatively, block new TCP connections to the distribution ports of both Node 1 and Node 2.

  5. Start Node 3.

  6. Inspect the table info on Node 3.
    For test_abc_disc, where_to_read is nowhere.
    For test_abc_ram, where_to_read is node3@localhost.
    test_abc_disc is neither readable nor writable.
    test_abc_ram is empty, yet it can be read with a dirty read.
    mnesia:wait_for_tables/2 times out for test_abc_disc but returns ok for test_abc_ram.
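
The inspection in this step can be sketched as follows, run in the Erlang shell on Node 3 while the other two nodes are still paused. `Inspect` is a hypothetical helper name and the 5000 ms timeout is an arbitrary choice, not something from the report. Only documented `mnesia` calls are used.

```erlang
%% Run in the Erlang shell on node3 while node1 and node2 are still paused.
%% Inspect is a hypothetical helper; the 5000 ms timeout is an assumption.
Inspect = fun(Tab) ->
    #{where_to_read  => mnesia:table_info(Tab, where_to_read),
      where_to_write => mnesia:table_info(Tab, where_to_write),
      %% Returns ok | {timeout, Tabs} | {error, Reason}.
      wait           => mnesia:wait_for_tables([Tab], 5000),
      %% A dirty read on a table that is not loaded exits, so catch it.
      dirty_read     => (catch mnesia:dirty_read(Tab, 1))}
end.

%% Collect the same facts for both tables so they can be compared side by side.
[{Tab, Inspect(Tab)} || Tab <- [test_abc_disc, test_abc_ram]].
```

According to the observations listed above, the `test_abc_disc` entry should show `nowhere` and a timeout, while the `test_abc_ram` entry shows `node3@localhost`, `ok`, and an empty read result.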

If TCP connections were blocked in step 4, unblock them now.

  7. Send SIGCONT to the other two nodes so that they can accept Node 3 rejoining.
    kill -CONT $pidOfNode1
    kill -CONT $pidOfNode2

  8. On Node 3, reconnect to the cluster.
    mnesia_controller:connect_nodes(mnesia:system_info(db_nodes)).

  9. Inspect the table info on Node 3.
    test_abc_disc is in sync with the other two nodes.
    test_abc_ram is not in sync with the other two nodes; it is empty.
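
One way to make the comparison in this final step explicit is to read the test record from every node and group the nodes by what they returned. This is a minimal sketch, assuming that each node serves a dirty read from its own local replica (that is, `where_to_read` points at the local node). The variable names `Reads` and `Groups` are illustrative.

```erlang
%% Run in the Erlang shell on any connected node after the reconnect.
%% Nodes is the same list used when the schema was created.
Nodes = [node1@localhost, node2@localhost, node3@localhost].

%% Execute the read on each node so that each node answers from its own view.
Reads = [{N, rpc:call(N, mnesia, dirty_read, [test_abc_ram, 1])} || N <- Nodes].

%% Group nodes by the result they returned.
Groups = lists:foldl(
    fun({N, R}, Acc) ->
        maps:update_with(R, fun(Ns) -> [N | Ns] end, [N], Acc)
    end,
    #{}, Reads).

%% true when every node returned the same result, false when replicas disagree.
maps:size(Groups) =:= 1.
```

Running the same snippet against `test_abc_disc` gives a baseline for the table that, per the report, does resynchronize.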

Expected behavior
In step 6, access to test_abc_ram should not be served, and mnesia:wait_for_tables/2 should time out.
In step 9, Node 3 should hold an exact copy of test_abc_ram, matching the other nodes.
The behaviour of disc_copies and ram_copies should be aligned.

Affected versions
OTP-28, but I believe older releases have the same issue too.

Labels

bug, team:PS
