Metrics And Logging

⚠️ NOTE: this document is a starting point for debugging, not an exhaustive or definitive reference.

The relay exports metrics and logs chain-specific errors. This document identifies common metrics and log messages, along with potential reasons for the observed behavior.

Error Logging

failed to enqeue tx for simulation

  • indicates slow RPCs that are not responding quickly enough

original signature does not match retry signature

  • this could indicate a race condition within the relayer code (please alert developers for investigation)

failed to find transaction within confirm timeout

  • indicates network congestion or poor RPC performance (tx dropped)

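simulate: unrecognized error

  • indicates that a TX failed simulation with an error the relay does not recognize (see solana_txm_tx_error_sim_other)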

failed to enqeue tx

  • indicates a slow RPC that cannot respond quickly enough to keep up with the incoming stream of transactions

error in ReadAnswer: stale answer data, polling is likely experiencing errors

  • indicates RPC issues (the RPC is most likely down)

error in ReadState: stale state data, polling is likely experiencing errors

  • indicates RPC issues (the RPC is most likely down)
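The log messages above can be triaged mechanically. The Python sketch below is a hypothetical helper (`LOG_HINTS` and `triage` are illustrative names, not part of the relay) that maps a log line to the likely cause listed in this section. The patterns are copied verbatim from this document and may not match the relay's log text exactly in every version. More specific patterns are listed first so that `failed to enqeue tx for simulation` is not reported as the shorter `failed to enqeue tx`.

```python
from typing import List, Optional, Tuple

# Hypothetical lookup table: (log substring, likely cause). More specific
# patterns come first because triage() returns the first match.
LOG_HINTS: List[Tuple[str, str]] = [
    ("failed to enqeue tx for simulation", "slow RPCs not responding quickly enough"),
    ("original signature does not match retry signature",
     "possible race condition in the relayer; alert developers"),
    ("failed to find transaction within confirm timeout",
     "network congestion or poor RPC performance (tx dropped)"),
    ("simulate: unrecognized error",
     "unrecognized simulation error; inspect the full log entry"),
    ("failed to enqeue tx", "slow RPC cannot keep up with incoming transactions"),
    ("stale answer data", "RPC issues (most likely down)"),
    ("stale state data", "RPC issues (most likely down)"),
]


def triage(line: str) -> Optional[str]:
    """Return the likely cause for a relay log line, or None if unrecognized."""
    for pattern, hint in LOG_HINTS:
        if pattern in line:
            return hint
    return None


print(triage("error in ReadState: stale state data, polling is likely experiencing errors"))
# -> RPC issues (most likely down)
```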

Metrics

solana_balance

  • provides the SOL balance for each key in the keystore
  • a low SOL balance will cause the CL node to stop transmitting
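A minimal sketch of a low-balance check, assuming the `solana_balance` values have already been scraped into a dict keyed by account (this document does not describe how the metric is labeled). `low_balance_keys` is a hypothetical helper, and `MIN_SOL` is an illustrative threshold, not a relay default; size it to your node's expected fee spend.

```python
from typing import Dict, List

# Hypothetical alert threshold in SOL; not a relay default.
MIN_SOL = 0.5


def low_balance_keys(balances: Dict[str, float], min_sol: float = MIN_SOL) -> List[str]:
    """Return, sorted, the keys whose solana_balance value (in SOL) is below min_sol."""
    return sorted(key for key, sol in balances.items() if sol < min_sol)


# Keys are placeholder names, not real public keys.
print(low_balance_keys({"key-a": 2.0, "key-b": 0.1}))  # -> ['key-b']
```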

solana_cache_last_update_unix

  • tracks the last update to cached data (unix timestamp)
  • updates should occur at the configured rate (default: 1s); slower updates can indicate RPC latency issues
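To turn the configured update rate into an alert, compare the age of `solana_cache_last_update_unix` with the expected interval. This sketch assumes the default 1s rate and a hypothetical tolerance of 5 intervals; `cache_is_stale` and both parameters are illustrative.

```python
import time
from typing import Optional


def cache_is_stale(last_update_unix: float,
                   expected_interval_s: float = 1.0,
                   tolerance: float = 5.0,
                   now: Optional[float] = None) -> bool:
    """True if solana_cache_last_update_unix is older than tolerance x the
    expected update interval (both parameters are illustrative)."""
    if now is None:
        now = time.time()
    return now - last_update_unix > expected_interval_s * tolerance


print(cache_is_stale(time.time() - 30))  # 30s old with a 1s rate -> True
```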

solana_client_latency_ms

  • tracks the duration of each RPC request, broken down by label and URL
  • spikes in latency can indicate RPC issues
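One way to spot latency spikes is to compare each series' latest sample with the median of its earlier samples. In this sketch the `(label, URL)` keys, the example label values, and the `factor` of 3 are assumptions for illustration; `latency_spikes` is a hypothetical helper, and in practice the samples would come from your metrics backend.

```python
from statistics import median
from typing import Dict, List, Tuple

# (request label, RPC URL); label values here are hypothetical.
Key = Tuple[str, str]


def latency_spikes(samples: Dict[Key, List[float]], factor: float = 3.0) -> List[Key]:
    """Flag series whose latest solana_client_latency_ms sample exceeds
    factor x the median of that series' earlier samples."""
    flagged = []
    for key, values in samples.items():
        if len(values) < 2:
            continue  # no history to build a baseline from
        baseline = median(values[:-1])
        if baseline > 0 and values[-1] > factor * baseline:
            flagged.append(key)
    return sorted(flagged)


history = {
    ("getBalance", "https://rpc-a.example"): [20, 22, 21, 150],
    ("getLatestBlockhash", "https://rpc-b.example"): [30, 31, 29, 33],
}
print(latency_spikes(history))  # -> [('getBalance', 'https://rpc-a.example')]
```

A median baseline is used rather than a mean so that a single earlier spike does not inflate the baseline and hide the next one.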

solana_txm_tx_success

  • total number of TXs that were confirmed and successfully executed on chain
  • this value should increase consistently; if it does not, this could indicate RPC latency or funding issues
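Because `solana_txm_tx_success` should keep increasing, a flat value over a window is worth investigating. The sketch below assumes the metric is a counter sampled at regular intervals and treats any drop as a counter reset (for example, after a node restart). This is similar in spirit to Prometheus's counter-reset handling but without its extrapolation. `counter_increase` and `success_stalled` are hypothetical helper names.

```python
from typing import Sequence


def counter_increase(samples: Sequence[float]) -> float:
    """Total increase of a counter across samples, treating any drop as a reset."""
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        # After a reset the counter restarts from zero, so the current value
        # is the increase since the reset.
        total += cur - prev if cur >= prev else cur
    return total


def success_stalled(samples: Sequence[float]) -> bool:
    """True if solana_txm_tx_success did not increase across the window."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    return counter_increase(samples) == 0


print(success_stalled([120, 120, 120, 120]))  # -> True
```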

solana_txm_tx_pending

  • number of TXs currently in flight (not yet confirmed as success or error)
  • this value should stay mostly constant; spikes could indicate lagging performance due to slow RPCs
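Since `solana_txm_tx_pending` should stay roughly constant, a simple z-score against the preceding samples can flag spikes. `pending_spike` is a hypothetical helper. The threshold `z` and the 1.0 floor on the standard deviation (which keeps +/-1 jitter on a flat baseline from triggering) are illustrative choices, not values from the relay.

```python
from statistics import mean, pstdev
from typing import Sequence


def pending_spike(samples: Sequence[float], z: float = 3.0) -> bool:
    """True if the latest solana_txm_tx_pending sample sits more than z
    standard deviations above the mean of the preceding samples."""
    if len(samples) < 3:
        return False  # too little history for a baseline
    history, latest = samples[:-1], samples[-1]
    # Floor sigma at 1.0 so +/-1 jitter on a flat baseline is not a spike.
    sigma = max(pstdev(history), 1.0)
    return (latest - mean(history)) / sigma > z


print(pending_spike([5, 6, 5, 6, 5, 40]))  # -> True
```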

solana_txm_tx_error

  • total number of TXs that have errored for any reason
  • depending on the network configuration, this value should either stay constant or increase

solana_txm_tx_error_revert

  • total number of TXs that were confirmed but errored with a revert
  • depending on the network configuration, this value should either be constant or increase
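Assuming `solana_txm_tx_success` and `solana_txm_tx_error_revert` together cover all confirmed TXs, the increase of each counter over the same window gives the share of confirmed TXs that reverted. Whether a nonzero share is expected depends on the network configuration, as noted above. `revert_ratio` is a hypothetical helper.

```python
def revert_ratio(success_delta: float, revert_delta: float) -> float:
    """Share of confirmed TXs in a window that reverted.

    Arguments are the increases of solana_txm_tx_success and
    solana_txm_tx_error_revert over the same window.
    """
    confirmed = success_delta + revert_delta
    return revert_delta / confirmed if confirmed else 0.0


print(revert_ratio(90, 10))  # -> 0.1
```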

solana_txm_tx_error_reject

  • total number of TXs that were immediately rejected by the RPC
  • the value should be near zero, since the RPC should not immediately reject TXs; a nonzero value could indicate a faulty RPC or …

solana_txm_tx_error_drop

  • total number of TXs that were broadcast to the network but not confirmed within the configured timeout
  • an increasing value can indicate RPC latency issues or network congestion
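To distinguish a sustained rise in `solana_txm_tx_error_drop` from an isolated event, count how many sampled intervals in a window saw the counter increase. This sketch assumes evenly spaced samples; `drops_rising` is a hypothetical helper, the 50% `min_fraction` is an illustrative default, and a counter reset simply appears as a non-rising interval.

```python
from typing import Sequence


def drops_rising(samples: Sequence[float], min_fraction: float = 0.5) -> bool:
    """True if solana_txm_tx_error_drop rose in at least min_fraction of the
    sampled intervals, i.e. a sustained rise rather than a single event."""
    deltas = [cur - prev for prev, cur in zip(samples, samples[1:])]
    if not deltas:
        return False
    rising = sum(1 for d in deltas if d > 0)  # a reset shows up as d < 0
    return rising / len(deltas) >= min_fraction


print(drops_rising([0, 1, 2, 2, 3]))  # -> True (3 of 4 intervals rose)
```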

solana_txm_tx_error_sim_revert

  • total number of TXs that reverted during simulation
  • the value should stay low and should not increase rapidly; if it does, this may indicate a misconfiguration on the CL node or on chain
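"Increasing rapidly" can be made concrete as a per-minute rate between two samples of `solana_txm_tx_error_sim_revert`. The sketch assumes no counter reset between the samples; `per_minute_rate` and `sim_reverts_too_high` are hypothetical helpers, and `MAX_SIM_REVERTS_PER_MIN` is an illustrative threshold, not a relay default.

```python
# Hypothetical alert threshold; not a relay default.
MAX_SIM_REVERTS_PER_MIN = 1.0


def per_minute_rate(first: float, last: float, window_s: float) -> float:
    """Average per-minute increase of a counter between two samples taken
    window_s seconds apart (assumes no counter reset in between)."""
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    return (last - first) * 60.0 / window_s


def sim_reverts_too_high(first: float, last: float, window_s: float,
                         limit: float = MAX_SIM_REVERTS_PER_MIN) -> bool:
    """True if solana_txm_tx_error_sim_revert grew faster than limit per minute."""
    return per_minute_rate(first, last, window_s) > limit


print(sim_reverts_too_high(10, 40, 600))  # 3 per minute -> True
```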

solana_txm_tx_error_sim_other

  • total number of TXs that failed during simulation with an unrecognized error
  • the value should stay low and should not increase rapidly; if it does, look through the logs for the unrecognized error and diagnose further from there
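When `solana_txm_tx_error_sim_other` grows, grouping the `simulate: unrecognized error` log lines by their detail text shows which errors dominate. This sketch assumes the error detail follows the message on the same line; the relay's real log layout (for example, structured fields) may differ, so adjust the parsing accordingly. `unrecognized_sim_errors` is a hypothetical helper, and the sample log lines are made up.

```python
from collections import Counter
from typing import Iterable

MARKER = "simulate: unrecognized error"


def unrecognized_sim_errors(lines: Iterable[str]) -> Counter:
    """Count log lines containing MARKER, keyed by the text after it.

    Assumes (hypothetically) that the error detail follows MARKER on the
    same line; adapt the parsing to the relay's actual log format.
    """
    counts: Counter = Counter()
    for line in lines:
        idx = line.find(MARKER)
        if idx != -1:
            detail = line[idx + len(MARKER):].strip(" :") or "(no detail)"
            counts[detail] += 1
    return counts


logs = ["simulate: unrecognized error: example detail", "unrelated line"]
print(unrecognized_sim_errors(logs))  # -> Counter({'example detail': 1})
```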