
Database corruption on service restart #3416

Closed
UglyDuskling opened this issue Jan 23, 2025 · 0 comments
Labels
fix:bug Something isn't working

Comments

@UglyDuskling

I discovered today that my Nocturne node has fallen out of sync and it has not recovered automatically. It appears that a rusk service restart caused the issue. Restarting it again does not fix it.

The error logs are filled with this error:

2025-01-20T18:33:39.272419Z ERROR node::chain: event="fsm::on_event failed" src="wire" err=Cannot accept txs: Inconsistent state verification data VerificationOutput { state_root: f10f20056e4cc9c470d6f63d86ae115ebfe8a96fd19fa964965324df73e29186, event_bloom: 00000000000000000000000000000000000000000000000000000400000000000000400000000000000000000000000000000000000000000000000000000000000000000000040000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000040000000000000000000000000000000000000000000000000000000000000000040000000000000000000000000000 }!!

Stack backtrace:
   0: anyhow::error::<impl anyhow::Error>::msg
   1: anyhow::__private::format_err
   2: rusk::node::vm::<impl node::vm::VMExecution for rusk::node::Rusk>::accept
   3: <node::database::rocksdb::Backend as node::database::DB>::update
   4: node::chain::acceptor::Acceptor<N,DB,VM>::try_accept_block::{{closure}}
   5: node::chain::fsm::insync::InSyncImpl<DB,VM,N>::on_block_event::{{closure}}
   6: node::chain::fsm::SimpleFSM<N,DB,VM>::on_block_event::{{closure}}
   7: <node::chain::ChainSrv<N,DB,VM> as node::LongLivedService<N,DB,VM>>::execute::{{closure}}
   8: node::Node<N,DB,VM>::spawn_all::{{closure}}::{{closure}}
   9: tokio::runtime::task::core::Core<T,S>::poll
  10: tokio::runtime::task::harness::Harness<T,S>::poll
  11: tokio::runtime::scheduler::multi_thread::worker::Context::run_task
  12: tokio::runtime::scheduler::multi_thread::worker::Context::run
  13: tokio::runtime::context::runtime::enter_runtime
  14: tokio::runtime::scheduler::multi_thread::worker::run
  15: <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll
  16: tokio::runtime::task::core::Core<T,S>::poll
  17: tokio::runtime::task::harness::Harness<T,S>::poll
  18: tokio::runtime::blocking::pool::Inner::run
  19: std::sys_common::backtrace::__rust_begin_short_backtrace
  20: core::ops::function::FnOnce::call_once{{vtable.shim}}
  21: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at /rustc/0f44eb32f1123ac93ab404d74c295263ce468343/library/alloc/src/boxed.rs:2007:9
  22: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at /rustc/0f44eb32f1123ac93ab404d74c295263ce468343/library/alloc/src/boxed.rs:2007:9
  23: std::sys::unix::thread::Thread::new::thread_start
             at /rustc/0f44eb32f1123ac93ab404d74c295263ce468343/library/std/src/sys/unix/thread.rs:108:17
  24: <unknown>
  25: <unknown>

And loads of these:

2025-01-20T18:33:39.281331Z  WARN node::chain::acceptor: event="Consensus msg discarded" reason="too far in the future" topic=Ratification info=ConsensusHeader { prev_block_hash: "00c996948374bee1...7880b987095374a6", round: 242127, iteration: 0 } ray_id=""

Some timestamps:
Node started on the 20th at 18:33:37:

Using CONSENSUS_KEYS in /opt/dusk/conf/consensus.keys
2025-01-20T18:33:37.076153Z  INFO rusk: Using state from "/opt/dusk/rusk/state"

First error two seconds later, at 18:33:39:

2025-01-20T18:33:39.272419Z ERROR node::chain: event="fsm::on_event failed" src="wire" err=Cannot accept txs: Inconsistent state verification data VerificationOutput....

The service itself appears to have been gracefully restarted two minutes prior; it was not a machine reset. Running journalctl --since "2025-01-20" | grep rusk I get the following (the journal timestamps are GMT+11, so Jan 21 05:31:47 is equivalent to the 20th at 18:31:47):

Jan 21 05:31:47 computername systemd[1]: Stopping rusk.service - DUSK Rusk...
Jan 21 05:31:47 computername systemd[1]: rusk.service: Deactivated successfully.
Jan 21 05:31:47 computername systemd[1]: Stopped rusk.service - DUSK Rusk.
Jan 21 05:31:47 computername systemd[1]: rusk.service: Consumed 18h 53min 40.012s CPU time, 900.0M memory peak, 0B memory swap peak.
Jan 21 05:32:04 computername systemd[1]: Configuration file /etc/systemd/system/rusk.service is marked executable. Please remove executable permission bits. Proceeding anyway.
Jan 21 05:32:04 computername systemd[1]: Configuration file /etc/systemd/system/rusk.service is marked world-inaccessible. This has no effect as configuration data is accessible via APIs without restrictions. Proceeding anyway.
Jan 21 05:32:04 computername systemd[1]: Configuration file /etc/systemd/system/rusk.service is marked executable. Please remove executable permission bits. Proceeding anyway.
Jan 21 05:32:04 computername systemd[1]: Configuration file /etc/systemd/system/rusk.service is marked world-inaccessible. This has no effect as configuration data is accessible via APIs without restrictions. Proceeding anyway.
Jan 21 05:32:04 computername systemd[1]: Configuration file /etc/systemd/system/rusk.service is marked executable. Please remove executable permission bits. Proceeding anyway.
Jan 21 05:32:04 computername systemd[1]: Configuration file /etc/systemd/system/rusk.service is marked world-inaccessible. This has no effect as configuration data is accessible via APIs without restrictions. Proceeding anyway.
Jan 21 05:33:19 computername systemd[1]: Configuration file /etc/systemd/system/rusk.service is marked executable. Please remove executable permission bits. Proceeding anyway.
Jan 21 05:33:19 computername systemd[1]: Configuration file /etc/systemd/system/rusk.service is marked world-inaccessible. This has no effect as configuration data is accessible via APIs without restrictions. Proceeding anyway.
Jan 21 05:33:19 computername systemd[1]: Starting rusk.service - DUSK Rusk...
Jan 21 05:33:37 computername systemd[1]: Started rusk.service - DUSK Rusk.
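
For completeness, filtering by unit instead of grepping gives much the same picture (unit name taken from the lines above):

journalctl -u rusk.service --since "2025-01-20"    # messages from and about the rusk unit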

I don't expect state or database errors to happen from a graceful stop/start of the rusk service.

As of now I have preserved the logs and kept the state as-is, with the rusk service stopped, in case you need any of it.
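
If it helps, something along these lines is enough to snapshot both (the state path is the one from the startup log above; archive names are just examples):

sudo tar -czf ~/rusk-state-2025-01-20.tar.gz /opt/dusk/rusk/state                   # archive the untouched state directory
journalctl -u rusk.service --since "2025-01-20" > ~/rusk-journal-2025-01-20.log     # export the journal entries around the restart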

UglyDuskling added the fix:bug label on Jan 23, 2025