Obtain Bitcask write locks using system call flock/2
#257
Conversation
…le creation

Previous to this change, the OS pid was used to lock the bitcask database and prevent two bitcask instances performing write operations at the same time. However, if bitcask is not able to clean up the write lock because of a hard shutdown, and another process takes the pid that is written to the lock file, bitcask cannot take the lock even though it is the only running instance. This change uses the flock system call to obtain exclusive write access to the file across OS processes, as long as the other processes also use flock to obtain the lock.
Is this something for which it is even possible to create a Riak test?
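In case it helps reviewers, the flock approach described above looks roughly like this (a minimal sketch of the technique, not the PR's actual NIF code; `acquire_write_lock` is a made-up name):

```c
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Sketch: flock-based write locking. The kernel releases the lock
 * automatically when the owning process exits, so a hard shutdown
 * cannot leave a "stale" lock behind the way a pid file can. */
int acquire_write_lock(const char *lockpath)
{
    int fd = open(lockpath, O_CREAT | O_RDWR, 0600);
    if (fd < 0)
        return -1;

    /* LOCK_NB: fail immediately rather than block if another process
     * (that also uses flock) already holds the lock. */
    if (flock(fd, LOCK_EX | LOCK_NB) != 0) {
        close(fd);              /* errno is EWOULDBLOCK when locked */
        return -1;
    }
    return fd;  /* keep fd open for the lifetime of the writer */
}
```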
Code is clean. The tests all pass for me. Raised a question about copyright, but that's not something I personally care about either way, just be aware it may spawn discussion.
I don't have the expertise to say whether this is the right approach to the problem from a c/systems level, though the tests certainly show that it works.
I'd feel better with a more competent c/systems eye on it before it is merged (@tburghart, @Vagabond, (@mrallen1, I notice you HEART this PR) ANY OTHER?)
+1 from me, for what it is worth
@@ -3,6 +3,7 @@
 // bitcask: Eric Brewer-inspired key/value store
 //
 // Copyright (c) 2010 Basho Technologies, Inc. All Rights Reserved.
+// Copyright (c) 2018 Workday, Inc.
Is it necessary? We had a chat about this recently (I'll copy you in via mail.)
Oh yeah, and if you do merge it, please close the related issues.
Is there a Riak top-level project branch somewhere which is already configured to use your branch @andytill? The code looks +1 but, just to be sure, I would like to run it against some data if possible.
Sorry no, it's all linked in our internal riak_ee.
Might it be worth considering fcntl? https://www.perkin.org.uk/posts/solaris-portability-flock.html is a good summary of the topic.
Thanks @Licenser. You linked to https://github.com/basho/leveldb/blob/44cb7cbc85590280c9a73856470d5880f4015927/util/env_posix.cc#L493 in Slack; I'm just commenting here to bring that link into this discussion.
Very difficult, I think, to reproduce the OS pid on the restarted Riak to provoke the original bug. With the change it is possible to trigger the locked error by starting Riak, performing one write, then starting a second Riak and performing a write with that, which will return an error.

Oh yes, sorry @russelldb, should have added that too.
I’m happy without a riak_test, if there is no test to show the fault pre-fix.
I'd definitely echo the suggestion to look at fcntl; the link Heinz posted even has some compatibility shims, in addition to the eleveldb one.
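For reference, the fcntl-style lock that the linked article and the eleveldb code describe looks roughly like this (a sketch only, not code from this PR; `acquire_write_lock_fcntl` is a made-up name). F_SETLK is specified by POSIX, so it also works on Solaris/SmartOS, though with the well-known caveat that closing any descriptor for the file drops the lock for the whole process - hence the compatibility shims.

```c
#include <fcntl.h>
#include <string.h>

/* Sketch: a POSIX fcntl record lock covering the whole file. */
int acquire_write_lock_fcntl(int fd)
{
    struct flock fl;
    memset(&fl, 0, sizeof(fl));
    fl.l_type = F_WRLCK;     /* exclusive write lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;            /* length 0 means "to end of file" */

    /* Non-blocking: returns -1 (errno EACCES or EAGAIN) if the
     * lock is already held by another process. */
    return fcntl(fd, F_SETLK, &fl);
}
```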
On the CI server (my laptop) the quickcheck tests fail - output attached. It would be brilliant if we could get the tests to pass - then +1 from me. With the current implementation I caused data loss (last year) when opening a bitcask data directory twice, even though the second open was flagged as read_only (via a second beam), so any attention to this area feels useful, imho.
wrt above comment - emfile - I've increased file descriptors and rerun the quickcheck tests.
@bryanhuntesl that failure is emfile, are you sure you have your ulimit high enough?
Increased the file descriptor limit to 10000; tests still failing - output attached.
@bryanhuntesl can you add some info (erl + eqc version, platform)? I ran this 10 times today without failure, on my macbook.
Certainly - Macbook - i7/16GB
Not gonna moan about basho1 (should be basho10). Interesting. All the same as me. Ah well. Good luck someone with that one!
Just to confirm - I am running 16b02-basho10 - seems like it's been truncated in the erl console output.
I am using bitcask on SmartOS. But that aside, I see little reason not to use a POSIX-compliant function when there is one.
Sorry, my mistake, I thought you were doing everything with raw files/Postgres.
So many systems ;) In dalmatinerDB, yes, it's files/postgres/(leveldb, sadly that's tied to _core); snarl, my AAA system, uses bitcask for the timeouts. I'm happy to make a pull request on the pull request to move it to fcntl; it seems to be just one place that needs changes.
Thanks for the clarification @Licenser.
@Licenser thanks for offering to make this Solaris compatible, I'd welcome this change to the PR.
The lock test does seem to fail on OSX (unmodified code):

update 1: Odd, this did work on the second run, not sure as to why.
update 2: The eqc tests segfault :(
update 3: The call stack would be:
Still waiting to get my eqc license approved by work, then I'll try to reproduce.
Previous to this change, the OS pid was used to lock the bitcask database and prevent two bitcask instances performing write operations at the same time. If the pid in the lock file is currently taken by an OS process, then the bitcask DB may not be written to. If bitcask is not able to clean up the write lock because of a hard shutdown, it may not be able to delete the "stale" lock file and obtain a new one (for example, if the pid in it has been reused by another process).
This has been raised in issues #72, #226 and basho/riak#535.
This change uses the flock system call to obtain exclusive write access to the file across OS processes, as long as the other processes use flock to obtain the lock.
The change maintains the previous behaviour where the DB can be opened even if a write lock cannot be obtained, but returns an error on write operations.
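Sketched in C terms, that behaviour might look like this (hypothetical names throughout; the real implementation lives in the Erlang/NIF layer):

```c
#include <stdio.h>

/* Hypothetical handle: lock_fd is -1 when the flock write lock
 * could not be obtained at open time. */
typedef struct {
    int lock_fd;
} db_handle;

/* Opening still succeeds without the lock, but writes are refused. */
int db_put(db_handle *h, const char *key, const char *val)
{
    if (h->lock_fd < 0) {
        fprintf(stderr, "put %s: error, write lock not held\n", key);
        return -1;
    }
    (void)val;  /* ... append key/value to the active data file ... */
    return 0;
}
```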
The O_EXCL flag has been removed from the call to open in the nifs because, if the lock file still exists, it returns the eexist error - but now the proof of the write lock is not the file existing but the call to flock.
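To illustrate the flag change (a hypothetical before/after; the variable and function names are assumed):

```c
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

int open_lock_file(const char *lockpath)
{
    /* Before this change (roughly): proof of the write lock was
     * exclusive file creation, so a leftover lock file from a dead
     * process made open() fail with EEXIST:
     *   fd = open(lockpath, O_CREAT | O_EXCL | O_RDWR, 0600);
     */

    /* After: the lock file may already exist; flock() is now the
     * proof of ownership, so a stale file is harmless. */
    int fd = open(lockpath, O_CREAT | O_RDWR, 0600);
    if (fd >= 0 && flock(fd, LOCK_EX | LOCK_NB) != 0) {
        close(fd);   /* another live process holds the write lock */
        return -1;
    }
    return fd;
}
```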