While debugging an instance of #2416, I saw the following in gc08's `/pool/ext/8a199f12-4f5c-483a-8aca-f97856658a35/crypt/debug/oxz_nexus_65a11c18-7f59-41ac-b9e7-680627f996e7/oxide-nexus:default.log.1711665000`:
```
thread 'tokio-runtime-worker' panicked at nexus/db-queries/src/db/sec_store.rs:65:60:
called `Result::unwrap()` on an `Err` value: InternalError { internal_message: "database error (kind = Unknown): result is ambiguous: error=rpc error: code = Unavailable desc = error reading from server: read tcp [fd00:1122:3344:109::3]:56722->[fd00:1122:3344:105::3]:32221: read: connection reset by peer [exhausted]\n" }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
[ Mar 28 22:01:58 Stopping because all processes in service exited. ]
[ Mar 28 22:01:58 Executing stop method (:kill). ]
```
In this case the issue is pretty clear, but I'm wondering if we've considered setting `RUST_BACKTRACE=1` in our production environment. Backtraces can definitely aid debugging, though maybe that isn't a big deal because the core file can show what's going on. (But see #5359.)
According to https://stackoverflow.com/questions/29421727/how-much-overhead-does-rust-backtrace-1-have, there seems to be some performance cost, so we'd have to measure it carefully.
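One way to start measuring: a panic only pays for its backtrace at the moment it panics, but `RUST_BACKTRACE=1` also enables `std::backtrace::Backtrace::capture` (unless `RUST_LIB_BACKTRACE` overrides it), so libraries such as `anyhow` would capture a backtrace every time an error value is created. The sketch below times `Backtrace::force_capture()`, which ignores both variables, to estimate that per-capture cost. `mean_capture_cost` is a hypothetical helper, and if I understand std correctly, symbol resolution is deferred until a backtrace is printed, so this mostly measures the stack walk. Treat the numbers as a rough first estimate, not a substitute for measuring Nexus itself.

```rust
use std::backtrace::Backtrace;
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical helper: average wall-clock time of one forced backtrace
/// capture, taken over `iterations` runs.
pub fn mean_capture_cost(iterations: u32) -> Duration {
    // Dividing a Duration by zero would panic; fail with a clearer message.
    assert!(iterations > 0, "iterations must be non-zero");
    let start = Instant::now();
    for _ in 0..iterations {
        // force_capture ignores RUST_BACKTRACE / RUST_LIB_BACKTRACE, so the
        // result doesn't depend on how this process was launched.
        // black_box keeps the optimizer from discarding the capture.
        drop(black_box(Backtrace::force_capture()));
    }
    start.elapsed() / iterations
}
```

Running something like `mean_capture_cost(1_000)` in a release build on the target hardware, and comparing it against how often Nexus creates errors, would give a rough sense of what enabling library backtraces costs.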
I wonder if @hawkw has thoughts here.
### Tasks
- [ ] Create a small crate that sets `RUST_BACKTRACE=1` if it isn't already set (and maybe `RUST_LIB_BACKTRACE` as well)
- [ ] Use the crate in nexus
- [ ] Use the crate in sled-agent
- [ ] Use it in wicketd
- [ ] Use it elsewhere (add tasks for other services that would benefit)
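A minimal sketch of what the small crate from the first task might contain, assuming the goal is panic backtraces by default without a per-error capture cost. The `anyhow` documentation describes setting `RUST_BACKTRACE=1` together with `RUST_LIB_BACKTRACE=0` to get backtraces for panics only, so when this sketch supplies the `RUST_BACKTRACE` default it also defaults `RUST_LIB_BACKTRACE` to `0`. That choice and the function names `enable_backtraces_by_default` and `set_if_unset` are assumptions, not a settled design; if an operator has set `RUST_BACKTRACE` explicitly, both variables are left alone.

```rust
use std::env;

/// Hypothetical entry point for the proposed crate: turn on panic backtraces
/// unless the operator has already chosen a value for RUST_BACKTRACE.
///
/// Must run at the very start of `main`, while the process is still
/// single-threaded (before a tokio runtime or any other thread exists).
pub fn enable_backtraces_by_default() {
    // Panics consult RUST_BACKTRACE when deciding whether to print a backtrace.
    if set_if_unset("RUST_BACKTRACE", "1") {
        // Assumption: since we (not the operator) enabled RUST_BACKTRACE, keep
        // `std::backtrace::Backtrace::capture` (used by e.g. `anyhow`) off
        // unless RUST_LIB_BACKTRACE is set, so ordinary errors skip the stack walk.
        set_if_unset("RUST_LIB_BACKTRACE", "0");
    }
}

/// Sets `key` to `value` only if `key` is absent from the environment.
/// Returns true if the variable was set by this call.
fn set_if_unset(key: &str, value: &str) -> bool {
    if env::var_os(key).is_some() {
        return false;
    }
    // SAFETY: the caller guarantees no other threads are running yet.
    // (`set_var` is an `unsafe fn` in the 2024 edition; in earlier editions
    // this block is merely redundant.)
    unsafe {
        env::set_var(key, value);
    }
    true
}
```

Each service (nexus, sled-agent, wicketd) would call `enable_backtraces_by_default()` as the first statement of `main`, before starting the tokio runtime: the standard library caches these settings after it first reads them, and modifying the environment once other threads exist is not thread-safe.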