Machines running hydra nodes become unresponsive after long periods of intense traffic #1582

Closed
Quantumplation opened this issue Aug 21, 2024 · 1 comment

Quantumplation commented Aug 21, 2024

Context & versions

Discovered during the hydra-doom demo at Rare Evo, using both the :doom and :doom-memory-hack tagged Docker images built by @ch1bo.

Steps to reproduce

(This may not be a minimal reproduction and requires further investigation.)

Spin up 16 hydra nodes on an r5.8xlarge with 1 TB of disk space.

Submit between 35 and 250 transactions per second to each node for several hours (a load-generation sketch is included below).

Reach out to me if you want help reproducing the issue; I have a saved disk snapshot from the event.
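
For illustration, here is a rough sketch of the kind of load generator that would exercise this; it is not the actual tooling we used, and the port layout, the NewTx payload shape (which can differ between hydra-node versions), and `build_signed_tx` are placeholders:

```python
# Rough sketch only: drive 16 local hydra nodes at a fixed transaction rate.
# Assumptions (placeholders, not from the original setup):
#   - each hydra-node serves its client API on ws://127.0.0.1:4001 .. :4016
#   - transactions are submitted as NewTx client inputs over the websocket;
#     the exact payload shape may differ between hydra-node versions
#   - build_signed_tx() stands in for however transactions were built and signed
import asyncio
import json

import websockets  # pip install websockets

NODES = [f"ws://127.0.0.1:{4001 + i}" for i in range(16)]
TX_PER_SECOND = 100  # anywhere in the 35-250 range seen at the event


def build_signed_tx():
    """Placeholder: return a signed transaction in whatever JSON form the node expects."""
    raise NotImplementedError


async def drive_node(url: str) -> None:
    # One long-lived websocket per node, submitting at an approximate fixed rate.
    async with websockets.connect(url, max_size=None) as ws:
        while True:
            await ws.send(json.dumps({"tag": "NewTx", "transaction": build_signed_tx()}))
            await asyncio.sleep(1 / TX_PER_SECOND)


async def main() -> None:
    await asyncio.gather(*(drive_node(url) for url in NODES))


if __name__ == "__main__":
    asyncio.run(main())
```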

Actual behavior

Eventually, the hydra nodes drop their websocket connections, refuse new ones, and the host itself becomes unresponsive to SSH; the machine must be power cycled to regain access.

Initially this was believed to be caused by the memory leak, but the problem still occurred even with the hacks from #1572.
It is also unlikely to be disk space: several of the nodes were only at 60% disk usage after rebooting.
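
For reference, a rough sketch of host-side monitoring that could help pin down what gets exhausted (memory, file descriptors, or disk) before the lock-up; it assumes a Linux host, permission to read /proc/<pid>/fd for the hydra-node process, and an arbitrary 30-second interval:

```python
# Rough sketch only: periodically record memory, file-descriptor, and disk usage
# on the host so there is data on what was exhausted when it becomes unresponsive.
# Assumes a Linux host and permission to read /proc/<pid>/fd for the hydra-node
# process; the 30-second interval and the output format are arbitrary choices.
import os
import shutil
import sys
import time


def sample(pid: int) -> dict:
    with open("/proc/meminfo") as f:
        meminfo = dict(line.split(":", 1) for line in f)
    disk = shutil.disk_usage("/")
    return {
        "mem_available_kb": int(meminfo["MemAvailable"].split()[0]),
        "node_open_fds": len(os.listdir(f"/proc/{pid}/fd")),
        "disk_used_pct": round(100 * disk.used / disk.total, 1),
    }


if __name__ == "__main__":
    hydra_pid = int(sys.argv[1])  # pid of one hydra-node process to watch
    while True:
        print(time.strftime("%Y-%m-%d %H:%M:%S"), sample(hydra_pid), flush=True)
        time.sleep(30)
```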

Expected behavior

A hydra node should be able to operate indefinitely at high load, provided basic assumptions hold (sufficient disk space, etc.).

If this is an issue with the server provisioning rather than the hydra node itself, a useful outcome of this story would be to document best practices for hosting the nodes and how to avoid this scenario.

Quantumplation added the bug 🐛 Something isn't working label on Aug 21, 2024
github-project-automation bot moved this to In Progress 🕐 in ☕ Hydra Team Work on Aug 21, 2024
noonio changed the title from "Machines running hydra nodes become responsive after long periods of intense traffic" to "Machines running hydra nodes become unresponsive after long periods of intense traffic" on Aug 22, 2024
noonio moved this to 🚀 Planned in 🚢 Hydra Head Roadmap on Feb 26, 2025
noonio moved this to Blocked ✋ in ☕ Hydra Team Work on Mar 4, 2025
noonio moved this from Blocked ✋ to In progress 🕐 in ☕ Hydra Team Work on Mar 10, 2025
noonio removed the bug 🐛 Something isn't working label on Mar 11, 2025
noonio commented Mar 14, 2025

Thanks for reporting this, @Quantumplation; I think I'll close this for now and assume we've addressed it with some recent networking and memory changes.

Will re-investigate if further stress-testing shows issues.

noonio closed this as not planned (won't fix, can't repro, duplicate, stale) on Mar 14, 2025
github-project-automation bot moved this from In progress 🕐 to Done ✔ in ☕ Hydra Team Work on Mar 14, 2025
noonio added this to the 0.x.x milestone on Mar 27, 2025