Reuse the same ID for both auth-less and auth-ful INVITEs #488

alexlivekit · 2025-10-20T21:30:12Z

See this thread for more details. Main point:

When we authenticate for the purposes of a SIP session (when credentials are not being cached), the following happens:

client -> livekit : I want to start a call (INVITE)
client <- livekit : You need to authenticate (SIP-407 Proxy Authentication Required).
- This may count as a failure.
- The 407 response contains an auth challenge.
client -> livekit : I want to start a call (INVITE), this time with the credentials.
- This INVITE has the exact same Call-ID as the first one.
- It also contains the auth response.
client <- livekit : credentials look good, I'll forward your call request (SIP-100 Trying).

The problem is that we may count the 407 response as a failed call, and we generate unique SCL_* IDs for the two INVITEs even though they belong to the same call.
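To make the intent concrete, here is a minimal sketch of handing out one internal ID per INVITE exchange, so that the challenged INVITE and the authenticated retry resolve to the same value. The `idRegistry` type, the `inviteKey` layout, and the `SCL_example` ID format are all hypothetical and chosen for illustration; they are not the code in this PR.

```go
package main

import (
	"fmt"
	"sync"
)

// inviteKey identifies one INVITE exchange. Keying on the SIP Call-ID plus
// the From tag is an assumption made for this sketch.
type inviteKey struct {
	sipCallID string
	fromTag   string
}

// idRegistry hands out one internal call ID per inviteKey (hypothetical type).
type idRegistry struct {
	mu   sync.Mutex
	ids  map[inviteKey]string
	next int
}

func newIDRegistry() *idRegistry {
	return &idRegistry{ids: make(map[inviteKey]string)}
}

// callIDFor returns the ID already assigned to key, or assigns a new one.
func (r *idRegistry) callIDFor(key inviteKey) string {
	r.mu.Lock()
	defer r.mu.Unlock()
	if id, ok := r.ids[key]; ok {
		return id
	}
	r.next++
	// Placeholder format only; the real SCL_* generator is not shown here.
	id := fmt.Sprintf("SCL_example%d", r.next)
	r.ids[key] = id
	return id
}

func main() {
	reg := newIDRegistry()
	key := inviteKey{sipCallID: "abc@client", fromTag: "tag-1"}

	first := reg.callIDFor(key)  // INVITE without credentials, answered with 407
	second := reg.callIDFor(key) // same INVITE repeated with credentials

	fmt.Println(first == second) // prints "true"
}
```

With a lookup like this, whatever is logged or counted for the 407 and for the authenticated INVITE can be tied back to a single call ID.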

alexlivekit · 2025-10-20T21:31:14Z

pkg/sip/inbound.go

 	inviteOKRetryAttemptsNoACK = 2
 	inviteOkAckLateTimeout     = inviteOkRetryIntervalMax
+
+	inviteCredentialValidity = 60 * time.Minute // Allow reuse of credentials for 1h


Not sure this is smart, might want to extend this to max_call_duration or something.

Also, keep in mind that this is per Call-ID for now, so new calls would still need re-auth.

alexlivekit · 2025-10-20T21:37:11Z

pkg/sip/inbound.go

 	tr := callTransportFromReq(req)
 	legTr := legTransportFromReq(req)
 	log := s.log.WithValues(
-		"callID", callID,


The one log line that will not have callID now is "Bad request", when validation fails.

alexlivekit · 2025-10-20T22:28:01Z

pkg/sip/outbound.go

 		if err != nil {
 			return nil, fmt.Errorf("invalid challenge %q: %w", challengeStr, err)
 		}
-		toHeader := resp.To()


Now that we're doing the right validation on the inbound side, the outbound side E2E test caught this error!
But this also means our clients might run into the same issue if some of them are not spec-compliant.

dennwc · 2025-10-21T17:27:39Z

pkg/sip/inbound.go

+	s.inProgressInvites[key] = is
+
+	go func() {
+		time.Sleep(inviteCredentialValidity)


Better avoid spawning goroutines that wait for the whole hour without a clear cancellation signal.

Usually, you'd have one goroutine periodically cleaning the expired cache. time.AfterFunc could work, but again, there's no real need to create thousands of timers all waiting for an hour.
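As a rough illustration of the single-cleaner approach, the sketch below uses one goroutine driven by a `time.Ticker` and stopped through a `context.Context`. The `expiringCache` type and its methods are hypothetical names for this example, not the implementation that ended up in the PR.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// expiringCache is a hypothetical stand-in for the in-progress invite cache.
type expiringCache struct {
	mu       sync.Mutex
	deadline map[string]time.Time
	now      func() time.Time // injectable clock, handy for tests
}

func newExpiringCache() *expiringCache {
	return &expiringCache{deadline: make(map[string]time.Time), now: time.Now}
}

// put stores key and marks it as expiring ttl from now.
func (c *expiringCache) put(key string, ttl time.Duration) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.deadline[key] = c.now().Add(ttl)
}

// sweep deletes every entry whose deadline is not after now and
// reports how many entries were removed.
func (c *expiringCache) sweep(now time.Time) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	removed := 0
	for key, d := range c.deadline {
		if !d.After(now) {
			delete(c.deadline, key)
			removed++
		}
	}
	return removed
}

// run is the single cleanup goroutine: it sweeps on every tick and
// exits as soon as ctx is cancelled.
func (c *expiringCache) run(ctx context.Context, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			c.sweep(c.now())
		}
	}
}

func main() {
	c := newExpiringCache()
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel() // clear cancellation signal for the cleaner
	go c.run(ctx, time.Minute)

	c.put("call-1", time.Hour)
	// Pretend two hours have passed so the entry counts as expired.
	removed := c.sweep(c.now().Add(2 * time.Hour))
	fmt.Println("removed:", removed)
}
```

The trade-off is that entries may outlive their deadline by up to one sweep interval, which is usually acceptable for a credential cache.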

dennwc · 2025-10-21T17:32:17Z

pkg/sip/inbound.go

 		log:      log,
 		s:        s,
-		id:       id,
+		id:       "unassigned",


Would be helpful if it logs/panics if something tries to read this unassigned ID.

I would rather not add shims or verification for this. Unless it's real clean, I'd rather leave it blank. Would that be better in your opinion?

dennwc · 2025-10-21T17:33:34Z

pkg/sip/inbound.go

+		req.To().Params.Add("tag", toTag)
+	}
+	inviteProgress := s.getInvite(sipCallID, toTag, fromTag)
+	callID := inviteProgress.lkCallID


No locking here. I assume our re-invite handling will catch a duplicate invite if it arrives at the same time?

That's right: locking is done in getInvite(), but we're using inviteProgress.lkCallID outside of it.
If two separate INVITEs come along and they're retransmissions (same Via branch), sipgo swallows the duplicate. If they're not retransmissions, they are separate INVITEs and should be processed independently, which they are thanks to the early to-tag generation here (different toTag = different map key).

dennwc · 2025-10-23T10:06:29Z

pkg/sip/inbound.go

+		toTag:     toTag,
+		fromTag:   fromTag,
+	}
+	s.imu.Lock()


A common practice to avoid lock contention is to use an RWMutex and do a two-stage lock:

	s.imu.RLock()
	is, ok := s.inProgressInvites[key]
	s.imu.RUnlock()
	if ok {
		return is
	}
	s.imu.Lock()
	defer s.imu.Unlock()
	is, ok = s.inProgressInvites[key]
	if ok {
		return is
	}
	// ... the rest ...

This allows multiple readers to get the invite state without blocking each other. Also notice that we redo the check after getting the write lock, since another goroutine might have created the state in the meantime.
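For reference, a self-contained version of that two-stage lookup might look like the sketch below. The `server` type and the field layout of `dialogKey` are assumptions for the example; only the names `imu`, `inProgressInvites`, `dialogKey`, and `inProgressInvite` come from the diff.

```go
package main

import (
	"fmt"
	"sync"
)

// dialogKey's fields are assumed here; the real struct may differ.
type dialogKey struct {
	sipCallID string
	toTag     string
	fromTag   string
}

type inProgressInvite struct {
	sipCallID string
}

// server is a pared-down, hypothetical stand-in for the real SIP server type.
type server struct {
	imu               sync.RWMutex
	inProgressInvites map[dialogKey]*inProgressInvite
}

// getInvite returns the state for key, creating it if it does not exist.
func (s *server) getInvite(key dialogKey) *inProgressInvite {
	// Stage 1: read lock only, so concurrent lookups do not block each other.
	s.imu.RLock()
	is, ok := s.inProgressInvites[key]
	s.imu.RUnlock()
	if ok {
		return is
	}

	// Stage 2: write lock, then check again because another goroutine may
	// have inserted the entry between RUnlock and Lock.
	s.imu.Lock()
	defer s.imu.Unlock()
	if is, ok = s.inProgressInvites[key]; ok {
		return is
	}
	is = &inProgressInvite{sipCallID: key.sipCallID}
	s.inProgressInvites[key] = is
	return is
}

func main() {
	s := &server{inProgressInvites: make(map[dialogKey]*inProgressInvite)}
	key := dialogKey{sipCallID: "abc@client", toTag: "to-1", fromTag: "from-1"}

	// Several goroutines race to fetch the same key.
	results := make([]*inProgressInvite, 8)
	var wg sync.WaitGroup
	for i := range results {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i] = s.getInvite(key)
		}(i)
	}
	wg.Wait()

	same := true
	for _, r := range results {
		if r != results[0] {
			same = false
		}
	}
	fmt.Println(same) // prints "true": every caller got the same state
}
```

Without the second check under the write lock, two goroutines that both missed in stage 1 could each create and store their own state, and one of them would end up holding an orphaned pointer.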

dennwc · 2025-10-23T10:08:05Z

pkg/sip/server.go

+	imu                  sync.Mutex
+	inProgressInvites    map[dialogKey]*inProgressInvite
+	inviteTimeoutQueue   utils.TimeoutQueue[*dialogKey]
+	isCleanupTaskRunning atomic.Bool


Seems unused.

dennwc · 2025-10-23T10:09:14Z

pkg/sip/server.go

-	inProgressInvites []*inProgressInvite
+	imu                  sync.Mutex
+	inProgressInvites    map[dialogKey]*inProgressInvite
+	inviteTimeoutQueue   utils.TimeoutQueue[*dialogKey]


Suggested change

-	inviteTimeoutQueue utils.TimeoutQueue[*dialogKey]
+	inviteTimeoutQueue utils.TimeoutQueue[dialogKey]

We probably don't need a pointer here.

dennwc · 2025-10-23T10:12:15Z

pkg/sip/inbound.go

-	s.inProgressInvites = append(s.inProgressInvites, is)
+	is = &inProgressInvite{sipCallID: sipCallID}
+	s.inProgressInvites[key] = is
+	s.inviteTimeoutQueue.Reset(&utils.TimeoutQueueItem[*dialogKey]{Value: &key})


Looks like we do not reset the cache expiry time if we have a cache hit. Is it intentional?

dennwc · 2025-10-23T10:16:29Z

pkg/sip/server.go

-	inProgressInvites []*inProgressInvite
+	imu                  sync.Mutex
+	inProgressInvites    map[dialogKey]*inProgressInvite
+	inviteTimeoutQueue   utils.TimeoutQueue[*dialogKey]


There's a convention in Go called a Mutex hat, which implies inviteTimeoutQueue is protected by imu.

But, the queue implementation already has a mutex internally. So might be worth moving it out of the imu group.
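A small sketch of what that layout could look like: the mutex sits directly above the fields it guards, and a blank line separates fields that are not under the hat. The `timeoutQueue` type here is a hypothetical, simplified stand-in for `utils.TimeoutQueue`, and the map's key and value types are simplified to strings.

```go
package main

import (
	"fmt"
	"sync"
)

// timeoutQueue is a hypothetical stand-in for utils.TimeoutQueue. Like the
// real queue, it carries its own mutex and needs no external locking.
type timeoutQueue struct {
	mu    sync.Mutex
	items []string
}

func (q *timeoutQueue) push(v string) {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.items = append(q.items, v)
}

func (q *timeoutQueue) len() int {
	q.mu.Lock()
	defer q.mu.Unlock()
	return len(q.items)
}

type server struct {
	// imu is the "hat": it guards the fields directly below it,
	// up to the next blank line.
	imu               sync.Mutex
	inProgressInvites map[string]string

	// Outside the hat: the queue synchronizes itself.
	inviteTimeoutQueue timeoutQueue
}

// track records an invite under imu, then enqueues it without holding imu.
func (s *server) track(key, id string) {
	s.imu.Lock()
	s.inProgressInvites[key] = id
	s.imu.Unlock()

	s.inviteTimeoutQueue.push(key)
}

func main() {
	s := &server{inProgressInvites: make(map[string]string)}
	s.track("dialog-1", "SCL_example1")
	fmt.Println(s.inviteTimeoutQueue.len()) // prints "1"
}
```

The grouping is purely a reading convention; the compiler does not enforce it, so the blank line is what tells reviewers which fields require `imu`.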

Oh, this is neat, thank you for pointing it out!

alexlivekit requested a review from a team as a code owner October 20, 2025 21:30

alexlivekit commented Oct 20, 2025

alexlivekit changed the title ~~Reuse the same ID for both auth-less and auth-ful INVITEs~~ TEL-222: Reuse the same ID for both auth-less and auth-ful INVITEs Oct 20, 2025

alexlivekit commented Oct 20, 2025

dennwc reviewed Oct 21, 2025

alexlivekit changed the title ~~TEL-222: Reuse the same ID for both auth-less and auth-ful INVITEs~~ Reuse the same ID for both auth-less and auth-ful INVITEs Oct 21, 2025

alexlivekit marked this pull request as draft October 22, 2025 23:33

alexlivekit added 9 commits October 22, 2025 17:03

test fix

b720467

inProgressInvite lifecycle, stable ID generation, minor log field reshuffling

0bc3d6e

adding test

0634af3

self-review

3fa9108

bugfix!

6906ea9

PR comments, except one

ef60152

Adding single goroutine to batch-delete invites at intervals

6155ab5

self-review

1a0eb73

fix test

de0faf0

alexlivekit force-pushed the reuse-callid-auth branch from 0658b5d to de0faf0 Compare October 23, 2025 00:07

alexlivekit marked this pull request as ready for review October 23, 2025 07:45

dennwc reviewed Oct 23, 2025
