This repository has been archived by the owner on Jul 12, 2021. It is now read-only.

Server doesn't accept new connections after a while #132

Open
bauerj opened this issue Oct 27, 2015 · 17 comments

Comments

@bauerj
Contributor

bauerj commented Oct 27, 2015

This is the last SSL-connection message in the log:

[27/10/2015-09:36:47] SSL     XX.XXX.88.65:49605    1 2.4.4

It's Tue 27 Oct 21:28:38 CET 2015 right now, which means the server hasn't received a new SSL connection for about 12 hours.

The server is still connected to IRC and thus visible in the peer list.

When I try to connect from the client with SSL I get a "Not connected" message.

When I try to connect with TCP, the server accepts the connection but the client doesn't seem to sync.

electrum-server getinfo still works and reports 171 open sessions.

Connecting with openssl s_client fails too:

root@bauerj ~ # openssl s_client -connect electrum.bauerj.eu:50002
connect: Connection refused
connect:errno=111

I can connect to the TCP port but I get no response:

telnet electrum.bauerj.eu 50001
Trying 192.198.88.125...
Connected to electrum.bauerj.eu.
Escape character is '^]'.
{"id": 0, "method": "server.version", "params": []}
{"id": 0, "method": "server.version", "params": []}

The server imports blocks as expected. Restarting makes the server accept connections again.
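The manual telnet check above can be scripted so that a hung TCP port is noticed automatically. The following is a minimal sketch, not part of electrum-server: `probe_stratum` is a hypothetical helper name, and it assumes the server answers a `server.version` request with one newline-terminated JSON line.

```python
import json
import socket


def probe_stratum(host, port, timeout=5.0):
    """Send server.version; return the first response line, or None if no reply."""
    request = json.dumps({"id": 0, "method": "server.version", "params": []}) + "\n"
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(request.encode("ascii"))
            buf = b""
            while b"\n" not in buf:
                chunk = sock.recv(4096)
                if not chunk:  # peer closed the connection without answering
                    return None
                buf += chunk
            return buf.split(b"\n", 1)[0].decode("utf-8", "replace")
    except socket.timeout:
        # Connect or read did not complete in time: the symptom described above.
        return None
```

A refused connection (as seen on the SSL port) is not swallowed: it propagates as `ConnectionRefusedError`, so the two failure modes in this report stay distinguishable.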

@ecdsa
Member

ecdsa commented Oct 27, 2015

this probably means that the ssl thread died.
please try to find a traceback in your log and post it here.

@bauerj
Contributor Author

bauerj commented Oct 27, 2015

The last traceback was 5 days ago (and unrelated).

@bauerj
Contributor Author

bauerj commented Oct 27, 2015

[22/10/2015-15:20:34] irc
Traceback (most recent call last):
  File "build/bdist.linux-x86_64/egg/electrumserver/ircthread.py", line 137, in run
    client.process_forever()
  File "build/bdist.linux-x86_64/egg/irc/client.py", line 278, in process_forever
    self.process_once(timeout)
  File "build/bdist.linux-x86_64/egg/irc/client.py", line 259, in process_once
    self.process_data(i)
  File "build/bdist.linux-x86_64/egg/irc/client.py", line 216, in process_data
    c.process_data()
  File "build/bdist.linux-x86_64/egg/irc/client.py", line 573, in process_data

@bauerj
Contributor Author

bauerj commented Nov 4, 2015

electrum.bauerj.eu shows this behaviour again. Is there anything I can do to debug this?

I'm not restarting it for now.

@EagleTM

EagleTM commented Nov 9, 2015

How recent is the codebase you're running, how much RAM does the server have, and how much of it is free for electrum-server?
It might be related to #126, just with a different symptom.

@ecdsa
Member

ecdsa commented Nov 9, 2015

@EagleTM no, it's different. his ssl thread died.

@EagleTM

EagleTM commented Dec 14, 2015

However, I know of at least one other server operator who runs with 16 GB of RAM and 512 MB of swap and still runs into this issue. I think it's one possible side effect of memory starvation, the others being a std::bad_alloc crash or the OOM killer.

@EagleTM

EagleTM commented Dec 26, 2015

OK, it's independent of the RAM thing after all. It seems that under some circumstances the SSL thread dies, apparently pretty early after startup, while TCP and IRC keep working. No idea how to debug this for the time being. I suspect OpenSSL is acting up (possibly after some security bugfix).

@EagleTM

EagleTM commented Jan 3, 2016

Ignore my last comment. In this particular case it was an expired SSL certificate.

@danny91

danny91 commented Jan 12, 2016

Having the same issue: TCP or SSL stops accepting connections. It happened to SSL last night and to TCP tonight. I checked the log for tracebacks and found no entries about anything stopping.

@luggs-co

I always seem to notice this issue, though personally I haven't looked in a while. Has there been a patch for this yet?

@valesi
Contributor

valesi commented Feb 4, 2016

Today I found my TCP server unresponsive (it appears to have been dead for about 23 hours). Here's what I've found so far. Below, "command" is short for run_electrum_server.py debug "command", and transports[0] is my TcpServer thread.

"transports[0]" shows the TcpServer thread is stopped/dead. Server wasn't listening on 50001 at all. Trying to start the thread with "transports[0].run()" failed: File "build/bdist.linux-x86_64/egg/electrumserver/stratum_tcp.py", line 220, in run | IOError: [Errno 2] No such file or directory. Not sure what that's about.

electrum-server sessions | grep ^TCP still shows old sessions, which netstat reports as CLOSE_WAIT. It seems the dead thread doesn't release its resources.

"transports.remove(transports[0])" and "transports.append(TcpServer(dispatcher, 'btc.smsys.me', 50001, False, None, None))" recreated the thread. "transports[1].start()" got it to accept TCP sessions again.

I'll try to make some time to debug this. I don't quite understand a bit of what's going on in stratum_tcp.py yet.
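The manual recovery above (discard the dead transport, build a new one, start it) could be automated with a small supervisor loop. This is only a sketch of the idea and not anything electrum-server provides: `supervise` and `make_thread` are hypothetical names, and `make_thread` stands in for whatever constructs a fresh transport thread.

```python
import threading


def supervise(make_thread, check_interval=1.0, stop_event=None):
    """Start the thread built by make_thread() and rebuild it whenever it dies.

    Returns the number of restarts once stop_event is set.
    """
    stop_event = stop_event or threading.Event()
    worker = make_thread()
    worker.start()
    restarts = 0
    # Event.wait() returns False on timeout, so this loop polls until stopped.
    while not stop_event.wait(check_interval):
        if not worker.is_alive():
            # A Thread object cannot be started twice, so build a fresh one.
            worker = make_thread()
            worker.start()
            restarts += 1
    return restarts
```

A supervisor like this only masks the underlying crash, and it does nothing about the sessions left in CLOSE_WAIT by the dead thread, so it is a stopgap rather than a fix.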

@shsmith
Contributor

shsmith commented Feb 4, 2016

I have seen the same problem.
There is an unhandled exception that is output to the console but not the log files.
Here is an example:

Exception in thread Thread-9:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "build/bdist.linux-x86_64/egg/electrumserver/stratum_tcp.py", line 287, in run
    stop_session(fd)
  File "build/bdist.linux-x86_64/egg/electrumserver/stratum_tcp.py", line 164, in stop_session
    session.stop()
  File "build/bdist.linux-x86_64/egg/electrumserver/processor.py", line 237, in stop
    self.dispatcher.remove_session(self)
  File "build/bdist.linux-x86_64/egg/electrumserver/processor.py", line 189, in remove_session
    del self.sessions[key]
KeyError: 'x.x.x.x:51335'

Nothing catches this exception so the whole thread dies.
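One way to make this kind of cleanup tolerant of a session that has already been removed is to use `dict.pop` with a default instead of `del`. The class below is a hypothetical stand-in for the dispatcher's session table, written as a sketch; it is not the project's code and not necessarily how the problem was actually fixed.

```python
import threading


class SessionRegistry:
    """Hypothetical stand-in for the dispatcher's session table."""

    def __init__(self):
        self._lock = threading.Lock()
        self._sessions = {}

    def add(self, key, session):
        with self._lock:
            self._sessions[key] = session

    def remove(self, key):
        """Remove a session; return it, or None if it was already gone."""
        with self._lock:
            # pop() with a default never raises KeyError, so a second
            # removal of the same key is harmless.
            return self._sessions.pop(key, None)
```

With this shape, two code paths that both try to stop the same session (for example a timeout and a socket error) cannot take the serving thread down with a `KeyError`.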

@ecdsa
Member

ecdsa commented Feb 4, 2016

@shsmith good catch, I will have a look

@ecdsa
Member

ecdsa commented Feb 4, 2016

that's fixed in 7949a93

@shsmith
Contributor

shsmith commented Feb 9, 2016

Found another unhandled exception that kills a thread:

Exception in thread Thread-4:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "build/bdist.linux-x86_64/egg/electrumserver/blockchain_processor.py", line 79, in do_catch_up
    self.header = self.block2header(self.bitcoind('getblock', (self.storage.last_hash,)))
  File "build/bdist.linux-x86_64/egg/electrumserver/blockchain_processor.py", line 146, in bitcoind
    raise BaseException(r['error'])
BaseException: {u'message': u"Can't read block from disk", u'code': -32603}

Same as before, this one did not appear in the log but was sent to the console output.

@shsmith
Contributor

shsmith commented Jul 6, 2016

Yet another unhandled exception that kills the TCP thread:

Exception in thread Thread-8:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "build/bdist.linux-x86_64/egg/electrumserver/stratum_tcp.py", line 243, in run
    poller.modify(session.raw_connection, mode)
  File "/usr/lib/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
  File "/usr/lib/python2.7/socket.py", line 170, in _dummy
    raise error(EBADF, 'Bad file descriptor')
error: [Errno 9] Bad file descriptor
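This traceback suggests the socket was closed before the poll mask was updated. A defensive wrapper around `poller.modify` could turn that into a signal to drop the session rather than an exception that ends the loop. The helper below is a sketch with a hypothetical name (`safe_modify`), written for Python 3 and a `select.poll()` object; it is not taken from stratum_tcp.py.

```python
def safe_modify(poller, sock, mode):
    """Update the poll mask for sock; return False if the socket is unusable."""
    try:
        poller.modify(sock, mode)
        return True
    except (OSError, ValueError):
        # OSError: the fd is closed or not registered (EBADF / ENOENT).
        # ValueError: in Python 3 a closed socket reports fileno() == -1,
        # which poll rejects as a negative file descriptor.
        return False
```

A caller would treat a `False` result as "this session is gone" and run its normal session cleanup, leaving the thread alive to serve the remaining connections.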
