Skip to content

Commit b5a6fef

Browse files
daverigbychiyoung
authored andcommitted
Fix potenial deadlock around Connmap::relaseLock / connLock
As reported by ThreadSanitizer (see below), we have a lock inversion creating a potential deadlock in Connmap, related to how we shutdown connections: There exists a cycle in lock order graph: M2176 (Connmap::releaseLock in ConnMap::notifyPausedConnection() connmap.cc:235) => M128093 (mock_server::conn_struct::mutex) => M2177 (Connmap::connsLock in TapConnMap::shutdownAllConnections(), connmap.cc:770) => M2176 (Connmap::releaseLock in TapConnMap::shutdownAllConnections(), connmap.cc:777) DEADLOCK! The problem appears to be that in TapConnMap::shutdownAllConnections() we first acquire {connsLock}, then acquire {releaseLock}; all while holding the cookie lock. Fix is to drop releaseLock in shutdownAllConnections() once we've released any references, *then* acquire the connLock to actually clear out the connection map. ThreadSanitizer report follows (irrelevent parts removed): WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) (pid=1087) Cycle in lock order graph: M2176 (0x7d840001b810) => M128093 (0x7d280000efa0) => M2177 (0x7d840001b850) => M2176 Mutex M128093 acquired here while holding mutex M2176 in thread T10: <cut> Mutex M2176 previously acquired by the same thread here: #0 pthread_mutex_lock <null> () ... couchbase#5 ConnMap::notifyPausedConnection() ep-engine/src/connmap.cc:235 () <cut> Mutex M2177 acquired here while holding mutex M128093 in main thread: #0 pthread_mutex_lock <null> () ... couchbase#5 TapConnMap::newProducer() ep-engine/src/connmap.cc:378 () #6 EventuallyPersistentEngine::createTapQueue() ep-engine/src/ep_engine.cc:2663:23 () <cut> Mutex M128093 previously acquired by the same thread here: #0 pthread_mutex_lock <null> () couchbase#1 cb_mutex_enter platform/src/cb_pthreads.c:115:14 () couchbase#2 lock_mock_cookie memcached/programs/engine_testapp/mock_server.c:436:4 () couchbase#3 test_tap_stream() ep-engine/tests/ep_testsuite.cc:6751:5 () couchbase#4 execute_test() memcached/programs/engine_testapp/engine_testapp.cc:1090:19 () couchbase#5 main memcached/programs/engine_testapp/engine_testapp.cc:1439 () Mutex M2176 acquired here while holding mutex M2177 in main thread: #0 pthread_mutex_lock <null> () ... couchbase#5 TapConnMap::shutdownAllConnections() ep-engine/src/connmap.cc:777 () #6 EventuallyPersistentEngine::destroy() ep-engine/src/ep_engine.cc:2190:9 () <cut> Mutex M2177 previously acquired by the same thread here: #0 pthread_mutex_lock <null> () ... couchbase#5 TapConnMap::shutdownAllConnections() ep-engine/src/connmap.cc:770 () #6 EventuallyPersistentEngine::destroy() ep-engine/src/ep_engine.cc:2190:9 () <cut> Change-Id: Ic9a4f028b202277729df025333ce630be056903d Reviewed-on: http://review.couchbase.org/55863 Tested-by: buildbot <[email protected]> Reviewed-by: Chiyoung Seo <[email protected]>
1 parent 052a2cb commit b5a6fef

File tree

2 files changed

+19
-12
lines changed

2 files changed

+19
-12
lines changed

src/connmap.cc

Lines changed: 10 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -767,13 +767,10 @@ void TapConnMap::shutdownAllConnections() {
767767

768768
connNotifier_->stop();
769769

770-
LockHolder lh(connsLock);
771-
// We should pause unless we purged some connections or
772-
// all queues have items.
773-
if (all.empty()) {
774-
return;
775-
}
776-
770+
// Not safe to acquire both connsLock and releaseLock at the same time
771+
// (can trigger deadlock), so first acquire releaseLock to release all
772+
// the connections (without changing the list/map), then drop releaseLock,
773+
// acquire connsLock and actually clear out the list/map.
777774
LockHolder rlh(releaseLock);
778775
std::list<connection_t>::iterator ii;
779776
for (ii = all.begin(); ii != all.end(); ++ii) {
@@ -786,6 +783,7 @@ void TapConnMap::shutdownAllConnections() {
786783
}
787784
rlh.unlock();
788785

786+
LockHolder lh(connsLock);
789787
all.clear();
790788
map_.clear();
791789
}
@@ -1038,11 +1036,10 @@ void DcpConnMap::shutdownAllConnections() {
10381036

10391037
connNotifier_->stop();
10401038

1041-
LockHolder lh(connsLock);
1042-
if (all.empty()) {
1043-
return;
1044-
}
1045-
1039+
// Not safe to acquire both connsLock and releaseLock at the same time
1040+
// (can trigger deadlock), so first acquire releaseLock to release all
1041+
// the connections (without changing the list/map), then drop releaseLock,
1042+
// acquire connsLock and actually clear out the list/map.
10461043
LockHolder rlh(releaseLock);
10471044
std::list<connection_t>::iterator ii;
10481045
for (ii = all.begin(); ii != all.end(); ++ii) {
@@ -1051,6 +1048,7 @@ void DcpConnMap::shutdownAllConnections() {
10511048
}
10521049
rlh.unlock();
10531050

1051+
LockHolder lh(connsLock);
10541052
closeAllStreams_UNLOCKED();
10551053
cancelAllTasks_UNLOCKED();
10561054
all.clear();

src/connmap.h

Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -207,8 +207,17 @@ class ConnMap {
207207

208208
connection_t findByName_UNLOCKED(const std::string &name);
209209

210+
// Synchronises notifying and releasing connections.
211+
// Guards modifications to connection_t objects in {map_} / {all}.
212+
// See also: {connLock}
210213
Mutex releaseLock;
214+
215+
// Synchonises access to the {map_} and {all} members, i.e. adding
216+
// removing connections.
217+
// Actual modification of the underlying
218+
// ConnHandler objects is guarded by {releaseLock}.
211219
Mutex connsLock;
220+
212221
std::map<const void*, connection_t> map_;
213222
std::list<connection_t> all;
214223

0 commit comments

Comments
 (0)