fix: WHIP/WHEP inactive status, CouchDB resilience, and CORS centralization #194

Merged
birme merged 4 commits into main from fix/whip-whep-inactive-status
Feb 26, 2026
Contributor

@LucasMaupin LucasMaupin commented Feb 26, 2026

Summary

Fixes three related issues with WHIP/WHEP session management and CouchDB stability:

  • WHIP/WHEP sessions never shown as inactive: All three API response mappings (api_productions.ts x2, production_manager.ts x1) forced isActive: true for WHIP sessions, so they'd jump straight from active to gone when expired. Now passes through the actual DB value.
  • CouchDB crash loop (146+ restarts on intercomprod): Transient network errors (ECONNRESET, socket hang up) on CouchDB calls were unhandled, crashing the Node.js process. Adds withRetry() wrapper around ALL nano DB operations with exponential backoff, connection retry (5 attempts), and 10s request timeout.
  • CORS centralization: Moves wildcard CORS from per-route onSend hooks in api_whip.ts/api_whep.ts to a centralized delegator in the @fastify/cors plugin. WHIP/WHEP routes get origin: '*', everything else uses CORS_ORIGIN.

Changes

src/db/couchdb.ts

  • withRetry<T>() — generic retry wrapper for all DB operations (3 retries, 100/200/400ms backoff)
  • isTransientError() — detects ECONNRESET, ETIMEDOUT, socket hang up, EHOSTUNREACH
  • connect() — retry with backoff (5 attempts, 1/2/4/8/16s)
  • nano client configured with 10s request timeout
  • Proper 404 handling in getNextSequence (no longer swallows non-404 errors)
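A minimal sketch of how the retry wrapper and transient-error check described above could look. This is not the code from the PR: the parameter names, the injectable `sleep` argument, and the assumption that errors expose a Node-style `code` or a `message` string are all illustrative.

```typescript
// Markers the PR description lists as transient network failures.
const TRANSIENT_MARKERS = ['ECONNRESET', 'ETIMEDOUT', 'EHOSTUNREACH', 'socket hang up'];

// Assumption: errors carry a Node-style `code` and/or a `message` string.
function isTransientError(err: unknown): boolean {
  if (typeof err !== 'object' || err === null) return false;
  const { code, message } = err as { code?: unknown; message?: unknown };
  return TRANSIENT_MARKERS.some(
    (m) => code === m || (typeof message === 'string' && message.includes(m))
  );
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Runs `operation`, retrying up to `retries` times on transient errors
// with exponential backoff (100, 200, 400 ms for the defaults).
async function withRetry<T>(
  operation: () => Promise<T>,
  retries = 3,
  baseDelayMs = 100,
  sleep: (ms: number) => Promise<void> = defaultSleep // hypothetical hook, eases testing
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (err) {
      // Non-transient errors (for example a 404) are rethrown immediately.
      if (attempt >= retries || !isTransientError(err)) throw err;
      await sleep(baseDelayMs * 2 ** attempt);
    }
  }
}
```

With a shape like this, each DB call would be wrapped as `withRetry(() => db.get(id))`, where `db` stands in for a nano database handle. Rethrowing non-transient errors right away keeps a 404 from being retried or masked.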

src/db/couchdb.test.ts

  • 360 lines of new tests covering retry behavior, transient error handling, connection failures

src/server.ts

  • process.exit(1) in uncaughtException handler for clean pod restart
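As a rough illustration of the handler change above, the sketch below logs the error and exits with code 1 so the orchestrator restarts the pod instead of leaving a zombie process. The `ProcessLike` interface and `installCrashHandler` name are hypothetical and exist only so the behavior can be exercised without killing a real process.

```typescript
// Hypothetical minimal view of the Node.js `process` object.
interface ProcessLike {
  on(event: 'uncaughtException', listener: (err: Error) => void): unknown;
  exit(code: number): void;
}

// Registers a handler that logs the error and then exits non-zero.
function installCrashHandler(
  proc: ProcessLike,
  log: (msg: string, err: Error) => void
): void {
  proc.on('uncaughtException', (err) => {
    log('Uncaught exception, exiting', err);
    proc.exit(1);
  });
}

// In server code this would presumably be wired as:
//   installCrashHandler(process, console.error);
```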

src/api_productions.ts

  • Lines 514, 948: isActive: s.isWhip ? true : ... → isActive: !!s.isActive

src/production_manager.ts

  • Line 490: same isWhip override removed
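To make the effect of the mapping change concrete, here is a small sketch of the new pass-through. The `StoredSession` and `ParticipantResponse` shapes and the `toParticipantResponse` name are assumptions for illustration; only the `isActive: !!s.isActive` expression comes from the PR.

```typescript
// Hypothetical shapes; the real session documents have more fields.
interface StoredSession {
  name: string;
  isWhip?: boolean;
  isActive?: boolean;
}

interface ParticipantResponse {
  name: string;
  isActive: boolean;
}

// Passes the stored flag through, coercing a missing value to false.
// WHIP sessions no longer get a special case.
function toParticipantResponse(s: StoredSession): ParticipantResponse {
  return { name: s.name, isActive: !!s.isActive };
}
```

A WHIP session whose stored `isActive` is `false` is now reported as inactive rather than being forced to `true`, which lets the frontend show the "(inactive)" state before the session disappears.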

src/api.ts

  • Centralized CORS delegator: WHIP/WHEP routes get origin: '*', all other routes use CORS_ORIGIN
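The delegator idea can be sketched as a function that picks CORS options per request. The path matching below is an assumption (the PR text does not state the actual route prefixes), and the local `CorsOptions` and `Delegator` types are simplified stand-ins rather than the plugin's own types.

```typescript
// Simplified stand-ins for the plugin's option and delegator types.
interface CorsOptions {
  origin: string;
}
type Delegator = (
  req: { url?: string },
  callback: (err: Error | null, options?: CorsOptions) => void
) => void;

// Assumption: WHIP/WHEP routes contain a `whip` or `whep` path segment.
const WILDCARD_SEGMENTS = ['whip', 'whep'];

function isWhipOrWhep(url: string): boolean {
  const path = url.split('?')[0]; // ignore the query string
  return path.split('/').some((segment) => WILDCARD_SEGMENTS.includes(segment));
}

// Builds a delegator: wildcard origin for WHIP/WHEP, `corsOrigin` otherwise.
function makeCorsDelegator(corsOrigin: string): Delegator {
  return (req, callback) => {
    const origin = isWhipOrWhep(req.url ?? '') ? '*' : corsOrigin;
    callback(null, { origin });
  };
}
```

A function of this shape would then be handed to the `@fastify/cors` registration in `src/api.ts`, with `corsOrigin` read from `CORS_ORIGIN`. Keeping the decision in one place is what allows the per-route `onSend` hooks to be removed.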

src/api_whip.ts / src/api_whep.ts

  • Removed redundant per-route onSend CORS hooks

Test plan

  • All 228 backend tests pass
  • Start a WHIP session → verify it shows as active
  • Stop the WHIP source → verify it transitions to "(inactive)" before disappearing
  • Simulate CouchDB transient error → verify retry succeeds without process crash
  • Verify WHIP/WHEP endpoints respond with Access-Control-Allow-Origin: *
  • Verify non-WHIP/WHEP endpoints use restrictive CORS from CORS_ORIGIN
  • Deploy to beta and confirm pod restart count stabilizes

🤖 Generated with Claude Code

Both participant API endpoints were overriding isActive to always
return true for WHIP/WHEP sessions, preventing them from ever
appearing as inactive in the frontend. Sessions would jump straight
from active to gone when they expired. Now the actual DB value is
passed through so the frontend can show the inactive state.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@LucasMaupin LucasMaupin requested a review from birme as a code owner February 26, 2026 16:06
LucasMaupin and others added 3 commits February 26, 2026 17:11
The existing insertWithRetry only covered insert operations. Transient
network errors (ECONNRESET, socket hang up) on any CouchDB call would
still crash the process. This adds:

- withRetry() wrapper around ALL nano DB operations (get, list, insert,
  destroy) with exponential backoff (3 retries, 100/200/400ms)
- isTransientError() detection for ECONNRESET, ETIMEDOUT, socket hang
  up, and EHOSTUNREACH
- Connection retry with backoff (5 attempts) in connect()
- 10s request timeout in nano client config
- process.exit(1) in uncaughtException handler to ensure clean restart
  instead of zombie process
- 360 lines of new CouchDB tests covering retry behavior

Fixes the recurring pod restarts (146+ on intercomprod) caused by
unhandled socket hang up errors from intermediate network components.

Co-Authored-By: Claude Opus 4.6 <[email protected]>

Move wildcard CORS from per-route onSend hooks in api_whip.ts and
api_whep.ts to a centralized delegator in the @fastify/cors plugin.
WHIP/WHEP routes get origin: '*', everything else uses CORS_ORIGIN.

Co-Authored-By: Claude Opus 4.6 <[email protected]>

Third location where isActive was forced to true for WHIP sessions,
in the getUsersForLine response mapping. Also adds production manager
resilience tests for session lifecycle edge cases.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@LucasMaupin LucasMaupin changed the title fix: WHIP/WHEP sessions never shown as inactive fix: WHIP/WHEP inactive status, CouchDB resilience, and CORS centralization Feb 26, 2026
@birme birme merged commit 8fc168b into main Feb 26, 2026
4 checks passed
@LucasMaupin LucasMaupin deleted the fix/whip-whep-inactive-status branch February 27, 2026 11:08