#721 Fix publisher negotiation reliability by preventing overlapping renegotiations #797

adrian-niculescu · 2025-11-10T12:42:54Z

Follow-up: fixes #721

PR #789 added a mutex but used tryLock(), which released the mutex before connection completed, still allowing overlapping renegotiations (sorry, @davidliu).

Fix

Use withLock() instead of tryLock() so negotiation attempts wait instead of being skipped
Hold the mutex until connection completes (or times out) to prevent overlapping renegotiations
Move hasPublished = true before the client.isConnected check for proper reconnect handling
Add @Volatile to hasPublished for thread-safe memory visibility
Ensure the negotiation mutex only waits until the publisher connection is either connected or definitively failed (waitUntilConnectedOrDisconnected), so failed negotiations don't delay reconnect.

changeset-bot · 2025-11-10T12:43:00Z

🦋 Changeset detected

Latest commit: 405373a

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package

Name	Type
client-sdk-android	Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

davidliu · 2025-11-16T17:36:45Z

livekit-android-sdk/src/main/java/io/livekit/android/room/RTCEngine.kt

+                // Best-effort wait for publisher to connect before releasing mutex
+                // to avoid overlapping renegotiations.
+                withTimeoutOrNull(MAX_ICE_CONNECT_TIMEOUT_MS.toLong()) {
+                    publisherObserver.waitUntilConnected()


Hmm, there's a change that's on my todo list that should ensure we ignore offers/answers that come from old negotiations (server offer/answers now come with ids associated with them). I wonder if that's the missing piece here that will alleviate the issue of overlapping renegotiations.

My main concern here is that waiting for each negotiation to finish each time will cause the overall connection time to increase quite a bit.

@davidliu - fair to be careful about this. I analyzed the code flow to understand the actual impact:

The mutex + waitUntilConnected() is only held while the publisher PeerConnection is not yet in CONNECTED. Once it's connected, additional negotiatePublisher() calls return immediately, so normal renegotiations don't pay extra latency.

Where your concern is valid is in a failure + soft reconnect path: if the initial negotiation fails and the publisher goes to FAILED, the current code still holds the mutex until the 20s timeout, so the first soft reconnect attempt can't start its own negotiation until that timeout elapses. That can add up to MAX_ICE_CONNECT_TIMEOUT_MS (~20s) to the reconnection time in that scenario.

A targeted fix would be to have waitUntilConnected() return not only when the connection becomes CONNECTED, but also when it becomes "disconnected" (FAILED/CLOSED). That way we still serialize negotiations, but we don't delay reconnect: the mutex is released promptly on failure and the reconnect negotiation can start immediately. The negotiation-ID filtering you mentioned would complement this, but doesn't by itself remove the extra wait.

@davidliu Quick follow-up on this after digging deeper into the flow and your latency concern.

Based on the analysis I mentioned earlier, I updated the implementation to address the failure + soft reconnect case without giving up the “no overlapping renegotiations” guarantee:

The mutex is still only used to serialize publisher negotiations, but the wait helper is now waitUntilConnectedOrDisconnected(), which returns as soon as the publisher PeerConnection is either CONNECTED or has definitively terminated (FAILED/CLOSED).

In the normal / steady state (publisher already connected), additional negotiatePublisher() calls effectively don’t wait — the first emission is already CONNECTED, so renegotiations don’t pay extra latency.

In the failure + soft reconnect scenario you called out, we no longer hold the mutex for the full MAX_ICE_CONNECT_TIMEOUT_MS when the connection has already failed. The helper exits as soon as the publisher reaches a terminal failure state, so the soft reconnect can immediately start a fresh negotiation instead of being blocked for ~20s.

DISCONNECTED is still treated as transient (per the existing isDisconnected() extension), so we keep waiting through temporary drops where ICE can still recover.

So the behavior now is: serialize negotiations while the connection is in a transient state, but release the lock promptly once it’s either successfully connected or clearly dead. Your planned change to use negotiation IDs on offers/answers will still be a nice complementary layer on the signaling side; this PR just tightens the local sequencing so we don’t introduce avoidable reconnect latency in the failure path.

So taking a closer look at the code, I still think the waitUntilConnectedOrDisconnected is unneeded?

With the withLock implementation serializing the calls to createAndSendOffer, this should handle subsequent calls to it. The scenario I'm thinking of is if multiple things want to kick off a negotiation, it should aggregate all the negotiations if one's already in-flight, and handle them all in a single renegotiation, rather than each extra request handled sequentially.

I think I'll also need to add a lock directly to createAndSendOffer, since it is called directly elsewhere.

…ivekit#721).

adrian-niculescu · 2025-11-24T11:28:47Z

@davidliu, closing this PR as your changes in #813 solved the overlapping negotiation issue using offer ID tracking, which avoids the connection time penalty that the serialization approach in this PR would introduce.

The independent fixes from this PR (adding @Volatile to hasPublished and moving its assignment before the connection check) have been extracted into #814.

adrian-niculescu force-pushed the fix/publisher-negotiation-first-publish branch 2 times, most recently from f157148 to 50f86d5 Compare November 10, 2025 14:17

adrian-niculescu marked this pull request as draft November 13, 2025 17:19

adrian-niculescu marked this pull request as ready for review November 13, 2025 18:47

davidliu reviewed Nov 16, 2025

View reviewed changes

adrian-niculescu force-pushed the fix/publisher-negotiation-first-publish branch 2 times, most recently from 248f106 to e85dfcb Compare November 16, 2025 20:13

adrian-niculescu marked this pull request as draft November 16, 2025 20:22

adrian-niculescu force-pushed the fix/publisher-negotiation-first-publish branch from e85dfcb to 429f14a Compare November 16, 2025 20:49

adrian-niculescu marked this pull request as ready for review November 16, 2025 20:55

adrian-niculescu force-pushed the fix/publisher-negotiation-first-publish branch from 429f14a to e9153df Compare November 16, 2025 20:58

adrian-niculescu requested a review from davidliu November 16, 2025 21:14

adrian-niculescu force-pushed the fix/publisher-negotiation-first-publish branch from 0a3bb58 to 96a5cf8 Compare November 17, 2025 18:18

Fixed publisher negotiation by preventing overlapping renegotiations (l…

405373a

…ivekit#721).

adrian-niculescu force-pushed the fix/publisher-negotiation-first-publish branch from 96a5cf8 to 405373a Compare November 17, 2025 19:16

adrian-niculescu mentioned this pull request Nov 24, 2025

Improve thread-safety and reconnection reliability for hasPublished flag #814

Merged

adrian-niculescu closed this Nov 24, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

#721 Fix publisher negotiation reliability by preventing overlapping renegotiations #797

#721 Fix publisher negotiation reliability by preventing overlapping renegotiations #797

Uh oh!

adrian-niculescu commented Nov 10, 2025 •

edited

Loading

Uh oh!

changeset-bot bot commented Nov 10, 2025 •

edited

Loading

Uh oh!

davidliu Nov 16, 2025

Uh oh!

adrian-niculescu Nov 16, 2025

Uh oh!

adrian-niculescu Nov 16, 2025 •

edited

Loading

Uh oh!

davidliu Nov 24, 2025

Uh oh!

adrian-niculescu commented Nov 24, 2025 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

#721 Fix publisher negotiation reliability by preventing overlapping renegotiations #797

#721 Fix publisher negotiation reliability by preventing overlapping renegotiations #797

Uh oh!

Conversation

adrian-niculescu commented Nov 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Fix

Uh oh!

changeset-bot bot commented Nov 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🦋 Changeset detected

Uh oh!

davidliu Nov 16, 2025

Choose a reason for hiding this comment

Uh oh!

adrian-niculescu Nov 16, 2025

Choose a reason for hiding this comment

Uh oh!

adrian-niculescu Nov 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

davidliu Nov 24, 2025

Choose a reason for hiding this comment

Uh oh!

adrian-niculescu commented Nov 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

adrian-niculescu commented Nov 10, 2025 •

edited

Loading

changeset-bot bot commented Nov 10, 2025 •

edited

Loading

adrian-niculescu Nov 16, 2025 •

edited

Loading

adrian-niculescu commented Nov 24, 2025 •

edited

Loading