fix(hermes): disconnect slow WS/SSE consumers to prevent OOM by ali-behjati · Pull Request #3769 · pyth-network/pyth-crosschain

ali-behjati · 2026-05-29T13:25:13Z

Summary

Add configurable slow-consumer protection for Hermes streaming endpoints via RPC_DISCONNECT_SLOW_CONSUMERS (default: true) and RPC_WS_MAX_WRITE_BUFFER_BYTES (default: 2 MiB).
Cap WebSocket write buffers when enabled and disconnect clients on WriteBufferFull instead of allowing tungstenite's unlimited outbound buffer to grow under TCP backpressure.
Disconnect lagging SSE clients with a terminal error event when broadcast lag is detected, instead of keeping connections open for up to 24h.
Add streaming observability metrics: active connections, slow-consumer disconnects, and SSE broadcast lag events.
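
The write-buffer cap described above can be illustrated with a small, standard-library-only model. This is a sketch under assumptions, not the PR's code: `OutboundBuffer`, `SendOutcome`, and `DEFAULT_WS_MAX_WRITE_BUFFER_BYTES` are hypothetical names, and the real change relies on tungstenite's own buffer limit and its `WriteBufferFull` error rather than a hand-rolled queue.

```rust
use std::collections::VecDeque;

/// Default cap quoted in the PR description (2 MiB). The constant name is hypothetical.
pub const DEFAULT_WS_MAX_WRITE_BUFFER_BYTES: usize = 2 * 1024 * 1024;

/// What the caller should do after trying to queue a message.
#[derive(Debug, PartialEq, Eq)]
pub enum SendOutcome {
    Queued,
    /// Queuing would exceed the cap: treat the client as a slow consumer.
    DisconnectSlowConsumer,
}

/// Hypothetical per-connection outbound queue (a model, not the tungstenite type).
pub struct OutboundBuffer {
    queue: VecDeque<Vec<u8>>,
    buffered_bytes: usize,
    /// `None` models the legacy unlimited buffer.
    max_bytes: Option<usize>,
}

impl OutboundBuffer {
    pub fn new(max_bytes: Option<usize>) -> Self {
        Self { queue: VecDeque::new(), buffered_bytes: 0, max_bytes }
    }

    pub fn buffered_bytes(&self) -> usize {
        self.buffered_bytes
    }

    /// Queue a message unless doing so would exceed the configured cap.
    pub fn push(&mut self, msg: Vec<u8>) -> SendOutcome {
        if let Some(max) = self.max_bytes {
            if self.buffered_bytes + msg.len() > max {
                return SendOutcome::DisconnectSlowConsumer;
            }
        }
        self.buffered_bytes += msg.len();
        self.queue.push_back(msg);
        SendOutcome::Queued
    }

    /// Simulates the socket accepting the oldest queued message.
    pub fn flush_one(&mut self) -> Option<Vec<u8>> {
        let msg = self.queue.pop_front()?;
        self.buffered_bytes -= msg.len();
        Some(msg)
    }
}
```

With a cap in place, memory held per connection is bounded by the cap; with `None`, a client that never reads lets the queue grow without limit, which is the OOM scenario the PR targets.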

Test plan

cargo test in apps/hermes/server (38 tests pass)
Manual: connect WS client, subscribe, throttle reads → connection closes and stream_slow_consumer_disconnects_total{protocol="ws"} increments
Manual: connect SSE client, throttle reads while slots update → stream ends with Slow consumer: disconnected and stream_slow_consumer_disconnects_total{protocol="sse"} increments
Manual: run with RPC_DISCONNECT_SLOW_CONSUMERS=false and confirm legacy behavior (WS unlimited buffer, SSE continues after lag errors)
Prod canary: watch stream_active_connections, stream_slow_consumer_disconnects_total, and pod memory after deploy
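
For the flag-toggling steps in the test plan, a rough sketch of how the two settings might be resolved is shown here. The variable names and defaults come from the PR description; the `StreamingConfig` field names, `streaming_config_from`, and `effective_ws_cap` are assumptions for illustration, and the actual server may parse its configuration differently.

```rust
/// Hypothetical shape of the streaming settings; field names are assumptions.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct StreamingConfig {
    pub disconnect_slow_consumers: bool,
    pub ws_max_write_buffer_bytes: usize,
}

/// Resolve the settings from a lookup function (for example the process
/// environment), falling back to the defaults named in the PR description.
pub fn streaming_config_from<F>(lookup: F) -> Result<StreamingConfig, String>
where
    F: Fn(&str) -> Option<String>,
{
    let disconnect_slow_consumers = match lookup("RPC_DISCONNECT_SLOW_CONSUMERS") {
        Some(raw) => raw
            .trim()
            .parse::<bool>()
            .map_err(|e| format!("RPC_DISCONNECT_SLOW_CONSUMERS: {e}"))?,
        None => true, // default: true
    };
    let ws_max_write_buffer_bytes = match lookup("RPC_WS_MAX_WRITE_BUFFER_BYTES") {
        Some(raw) => raw
            .trim()
            .parse::<usize>()
            .map_err(|e| format!("RPC_WS_MAX_WRITE_BUFFER_BYTES: {e}"))?,
        None => 2 * 1024 * 1024, // default: 2 MiB
    };
    Ok(StreamingConfig { disconnect_slow_consumers, ws_max_write_buffer_bytes })
}

/// The cap only applies when slow-consumer protection is enabled;
/// `None` corresponds to the legacy unlimited WebSocket buffer.
pub fn effective_ws_cap(cfg: &StreamingConfig) -> Option<usize> {
    cfg.disconnect_slow_consumers
        .then_some(cfg.ws_max_write_buffer_bytes)
}
```

In a real binary the lookup could be `|key| std::env::var(key).ok()`; passing a closure keeps the sketch testable without touching the process environment.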

Made with Cursor

Cap WebSocket write buffers and close lagging streaming clients behind RPC_DISCONNECT_SLOW_CONSUMERS so slow consumers cannot grow unbounded in-process queues and hold long-lived connections. Co-authored-by: Cursor <cursoragent@cursor.com>

vercel · 2026-05-29T13:25:15Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
api-reference	Ready	Preview, Comment	May 29, 2026 4:35pm
component-library	Ready	Preview, Comment	May 29, 2026 4:35pm
developer-hub	Ready	Preview, Comment	May 29, 2026 4:35pm
entropy-explorer	Ready	Preview, Comment	May 29, 2026 4:35pm
insights	Error		May 29, 2026 4:35pm
proposals	Ready	Preview, Comment	May 29, 2026 4:35pm
staking	Ready	Preview, Comment	May 29, 2026 4:35pm

Minor release for slow-consumer disconnect and streaming backpressure changes. Co-authored-by: Cursor <cursoragent@cursor.com>

Skip the 24h timeout SSE event after slow-consumer disconnect, allow setting disconnect_slow_consumers=false via CLI, and suppress dead_code warnings in generated wormhole protobuf code. Co-authored-by: Cursor <cursoragent@cursor.com>
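
The SSE behavior this commit describes can be sketched as a pure function over a sequence of receive results. This is a model under assumptions: `Recv`, `SseEvent`, and `sse_events` are hypothetical names, and the legacy lag message and timeout message are placeholders; only the "Slow consumer: disconnected" text is taken from the PR's test plan.

```rust
/// Events written to the SSE client. Hypothetical type for illustration.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SseEvent {
    Update(String),
    Error(String),
}

/// What the handler observes from the update source. Hypothetical type.
#[derive(Debug, Clone)]
pub enum Recv {
    Update(String),
    /// The broadcast receiver fell behind and skipped this many updates.
    Lagged(u64),
    /// The long-lived connection hit its maximum lifetime (24h in the PR).
    Timeout,
}

/// Map receive results to SSE events, ending the stream at the first
/// terminal condition.
pub fn sse_events<I>(input: I, disconnect_slow_consumers: bool) -> Vec<SseEvent>
where
    I: IntoIterator<Item = Recv>,
{
    let mut out = Vec::new();
    for item in input {
        match item {
            Recv::Update(data) => out.push(SseEvent::Update(data)),
            Recv::Lagged(skipped) => {
                if disconnect_slow_consumers {
                    // Terminal error event; returning here also means no
                    // timeout event is emitted afterwards.
                    out.push(SseEvent::Error("Slow consumer: disconnected".to_string()));
                    return out;
                }
                // Legacy behavior: report the lag and keep streaming.
                // The message text is a placeholder.
                out.push(SseEvent::Error(format!("lagged by {skipped} updates")));
            }
            Recv::Timeout => {
                // Placeholder message for the connection-lifetime timeout.
                out.push(SseEvent::Error("connection timeout".to_string()));
                return out;
            }
        }
    }
    out
}
```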

…tection is_write_buffer_full downcast WsError from tokio-tungstenite 0.26, but axum 0.6 wraps tungstenite 0.20 errors, so the check never matched. Use tungstenite 0.20.1 directly and add a test for the axum::Error wrapping path. Co-authored-by: Cursor <cursoragent@cursor.com>
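
The failure mode in this commit message follows from how Rust treats types: an error type from one version of a crate is a different type from the same-named error in another version, so a downcast to the wrong one never succeeds. The sketch below models that with standard-library types only. `WriteBufferFull`, `OtherVersionWriteBufferFull`, and `Wrapper` are hypothetical stand-ins (the last for a wrapping error such as `axum::Error`), and it assumes the wrapper exposes the inner error through `source()`.

```rust
use std::error::Error;
use std::fmt;

/// Stand-in for the error variant the check is looking for.
#[derive(Debug)]
pub struct WriteBufferFull;

/// Stand-in for the same-named error from a different crate version.
/// To the compiler this is an unrelated type.
#[derive(Debug)]
pub struct OtherVersionWriteBufferFull;

/// Stand-in for a framework error that wraps the underlying error.
#[derive(Debug)]
pub struct Wrapper(pub Box<dyn Error + 'static>);

impl fmt::Display for WriteBufferFull {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "write buffer full")
    }
}

impl fmt::Display for OtherVersionWriteBufferFull {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "write buffer full (other version)")
    }
}

impl fmt::Display for Wrapper {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "wrapped: {}", self.0)
    }
}

impl Error for WriteBufferFull {}
impl Error for OtherVersionWriteBufferFull {}

impl Error for Wrapper {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&*self.0)
    }
}

/// Walk the error and its `source()` chain, looking for `WriteBufferFull`.
pub fn is_write_buffer_full(err: &(dyn Error + 'static)) -> bool {
    let mut current: Option<&(dyn Error + 'static)> = Some(err);
    while let Some(e) = current {
        if e.downcast_ref::<WriteBufferFull>().is_some() {
            return true;
        }
        current = e.source();
    }
    false
}
```

A test that builds the wrapped error the same way the framework does, as the commit adds, is what catches this class of mismatch; a test that constructs the inner error directly would pass even when the production path never matches.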

devin-ai-integration

Devin Review found 1 new potential issue.

View 8 additional findings in Devin Review.

devin-ai-integration · 2026-05-29T16:40:54Z

🚩 WS broadcast channel Lagged errors are not handled as slow consumer disconnects

For WebSocket, slow consumer detection relies on tungstenite::Error::WriteBufferFull (the send-side buffer to the client filling up), not on tokio::sync::broadcast::error::RecvError::Lagged (the server-side broadcast channel falling behind). At ws.rs:400-404, a Lagged error from self.notify_receiver.recv() is converted to anyhow!("Failed to receive update from store: {:?}", e), which won't match is_write_buffer_full. This means WS clients that lag on the broadcast channel are disconnected silently, without the slow consumer metric being recorded. This is an asymmetry with the SSE handler, which explicitly tracks sse_broadcast_lagged for the same condition. Consider whether WS should also record this metric for observability parity.

(Refers to lines 400-404)
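
One way the parity suggested in this review comment could look is sketched below, using standard-library types only. It is not the code in ws.rs: `RecvError` here is a local stand-in that mirrors the two variants of tokio's broadcast receive error, and `WsMetrics`, `CloseReason`, and `classify_recv_error` are hypothetical names.

```rust
/// Local stand-in mirroring the variants of tokio's broadcast `RecvError`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum RecvError {
    Closed,
    Lagged(u64),
}

/// Hypothetical counters; the real metrics are labelled by protocol.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct WsMetrics {
    pub slow_consumer_disconnects: u64,
    pub broadcast_lagged: u64,
}

/// Why the WebSocket handler is closing the connection.
#[derive(Debug, PartialEq, Eq)]
pub enum CloseReason {
    SlowConsumer,
    StoreClosed,
}

/// Classify a receive error before it is turned into a generic error,
/// so a lagging receiver is counted as a slow consumer.
pub fn classify_recv_error(err: &RecvError, metrics: &mut WsMetrics) -> CloseReason {
    match err {
        RecvError::Lagged(_) => {
            metrics.broadcast_lagged += 1;
            metrics.slow_consumer_disconnects += 1;
            CloseReason::SlowConsumer
        }
        RecvError::Closed => CloseReason::StoreClosed,
    }
}
```

The point of classifying before converting to a generic error is that the variant is lost once the error is formatted into a string, so the metric has to be recorded at the match site.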


main already had #3769 ("disconnect slow WS/SSE consumers to prevent OOM"), which solves the same problem this branch does but with a different mechanism (tungstenite write-buffer cap + an RPC_DISCONNECT_SLOW_CONSUMERS config flag + protocol-labelled metrics). Per request, this branch's solution is kept for the overlap:

- ws.rs, sse.rs, metrics_middleware.rs: kept this branch's versions (per-write WS_SEND_TIMEOUT; SSE producer task + bounded channel; the sse_slow_consumer_disconnects / sse_connection_timeouts counters).
- api.rs, config/rpc.rs, rest.rs: reverted #3769's StreamingConfig scaffolding, since this solution is always-on and does not read it (a config flag that silently did nothing would be a footgun).
- Cargo.toml: kept the version bump to 0.11.0; dropped #3769's now-unused direct `tungstenite` dependency (it remains transitively via axum).
- network/wormhole.rs: kept #3769's `dead_code` allow (orthogonal CI fix).

All other incoming main changes (CI workflow bumps, fortuna/quorum/etc.) are taken as-is. Verified: cargo check + clippy clean, 33/33 tests pass.
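
The "producer task + bounded channel" idea mentioned for sse.rs can be illustrated with the standard library's bounded channel. This is only a synchronous model under assumptions: the branch presumably uses an async channel, and `Forward`, `client_channel`, and `forward` are hypothetical names.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Result of handing one update to a client's bounded queue.
#[derive(Debug, PartialEq, Eq)]
pub enum Forward {
    Sent,
    /// The queue is full: the client is not draining fast enough.
    SlowConsumer,
    /// The receiving side has been dropped.
    ClientGone,
}

/// Create a per-client queue holding at most `capacity` pending updates.
pub fn client_channel(capacity: usize) -> (SyncSender<String>, Receiver<String>) {
    sync_channel(capacity)
}

/// Producer side: never block on a slow client; report it instead so the
/// caller can disconnect.
pub fn forward(tx: &SyncSender<String>, update: String) -> Forward {
    match tx.try_send(update) {
        Ok(()) => Forward::Sent,
        Err(TrySendError::Full(_)) => Forward::SlowConsumer,
        Err(TrySendError::Disconnected(_)) => Forward::ClientGone,
    }
}
```

Because the producer uses a non-blocking send, a stalled client costs at most `capacity` queued updates before it is flagged, and it cannot hold up delivery to other clients.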

chore(hermes): bump version to 0.11.0

9d3a5ff

Minor release for slow-consumer disconnect and streaming backpressure changes. Co-authored-by: Cursor <cursoragent@cursor.com>

keyvankhademi approved these changes May 29, 2026

devin-ai-integration Bot reviewed May 29, 2026

ali-behjati merged commit 419d938 into main May 29, 2026
14 of 15 checks passed

ali-behjati deleted the hermes/streaming-slow-consumer-disconnect branch May 29, 2026 16:43

ali-behjati merged 4 commits into main from hermes/streaming-slow-consumer-disconnect
