fix: make token refresh robust to network errors #610

irishrain · 2024-12-28T23:47:34Z

I finally found the reason for the logouts (first described in #79). When a token is refreshed but the new token never makes it to the client due to network issues, the client tries to refresh with the old token and is logged out. This change keeps the old token valid until the new token has been used once. Should fix #414.

TomBursch · 2024-12-30T10:47:49Z

Hey, thanks for this, and I like the idea of making this more robust. But I think there are some minor security issues with your implementation.

You're probably aware of why I'm using refresh token rotation, but here is a short explanation. The idea is that old tokens which might have been leaked cannot be reused.
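
For readers who have not worked with rotation before, here is a minimal in-memory sketch of strict refresh token rotation with reuse detection. It is not the project's code: the class `StrictRotation` and its methods are hypothetical names chosen for illustration, and it models a single session only.

```python
import secrets


class StrictRotation:
    """Toy model of strict refresh token rotation (hypothetical names)."""

    def __init__(self):
        self.active = set()  # refresh tokens that may still be exchanged
        self.spent = set()   # refresh tokens that were already exchanged

    def issue(self):
        token = secrets.token_hex(8)
        self.active.add(token)
        return token

    def refresh(self, token):
        if token in self.spent:
            # A rotated token is presented again: assume it leaked
            # and end the whole session.
            self.active.clear()
            raise PermissionError("refresh token reuse detected")
        if token not in self.active:
            raise PermissionError("unknown refresh token")
        # Rotate: the old token is spent immediately, a new one is issued.
        self.active.remove(token)
        self.spent.add(token)
        return self.issue()
```

Under this model, a response that is lost on the network has the same effect as a leak: the client retries with the spent token and the session ends, which is the logout behavior described at the top of this thread.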

In your implementation:

Scenario 1:
- User creates a new refresh token (but does not use the access token)
- State is leaked to a malicious actor
- User creates another refresh token and continues normal use
- Malicious actor uses the leaked/old access token -> deleting all future tokens
- Malicious actor now has the most current refresh token and hijacked the session

irishrain · 2024-12-30T11:02:26Z

Thanks for the feedback. The goal was to have the following happen in your scenario:
In step three, when the user requests a new refresh token and uses it, all other old/unused/... tokens should be invalidated.
Maybe I implemented it incorrectly; I will double-check.

backend/app/controller/auth/auth_controller.py

backend/app/models/token.py

TomBursch · 2024-12-30T11:12:34Z

Yes, I think I get what you're trying to do. I've added some comments on what I would change. Let me know/ping me when you want another review.

irishrain · 2024-12-30T13:53:21Z

Before working on fixing the implementation, I thought it would be helpful to properly document what I want to implement, so we can separate design and implementation discussions. I took the sequence diagrams of the linked documentation for refresh tokens, and extended them with my proposal:

First the normal case without an attacker and any network issues:

sequenceDiagram
    participant LC as Legitimate Client
    participant A as Auth0
    
    Note over LC,A: Initial state: Client has RT1
    LC->>A: Exchange RT1 for new tokens
    Note over A: Generate AT2, RT2
    Note over A: Keep RT1 valid
    A-->>LC: Return AT2, RT2
    LC->>A: Use AT2 for API call
    Note over A: First use of new access token
    Note over A: Now invalidate RT1
    Note over LC,A: Later: AT2 expires
    LC->>A: Exchange RT2 for new tokens
    Note over A: Generate AT3, RT3
    Note over A: Keep RT2 valid
    A-->>LC: Return AT3, RT3
    LC->>A: Use AT3 for API call
    Note over A: First use of new access token
    Note over A: Now invalidate RT2

Now with a network issue:

sequenceDiagram
    participant LC as Legitimate Client
    participant A as Auth0
    
    Note over LC,A: Initial state: Client has RT1
    LC->>A: Exchange RT1 for new tokens
    Note over A: Generate AT2, RT2
    Note over A: Keep RT1 valid
    A-->>LC: Return AT2, RT2 (network error)
    Note over LC: Never receives tokens
    LC->>A: Retry: Exchange RT1 for new tokens
    Note over A: RT1 being reused - indicates<br/>possible network issues
    Note over A: Invalidate AT2, RT2<br/>since RT1 is being reused
    Note over A: Generate AT3, RT3
    A-->>LC: Return AT3, RT3
    LC->>A: Use AT3 for API call
    Note over A: First use of new access token
    Note over A: Now invalidate RT1

Now with an attack:

sequenceDiagram
    participant MC as Malicious Client
    participant LC as Legitimate Client
    participant A as Auth0
    
    Note over LC,A: Initial state: Both have RT1
    LC->>A: Exchange RT1 for new tokens
    Note over A: Generate AT2, RT2
    Note over A: Keep RT1 valid
    A-->>LC: Return AT2, RT2
    LC->>A: Use AT2 for API call
    Note over A: First use of new access token
    Note over A: Now invalidate RT1
    MC->>A: Try to use RT1
    Note over A: RT1 used after invalidation<br/>indicates compromise
    Note over A: Invalidate entire token family<br/>(RT1, AT2, RT2)
    A-->>MC: Access Denied
    LC->>A: Try to use RT2
    A-->>LC: Access Denied
    Note over LC: Must re-authenticate

The case where the attacker acts first:

sequenceDiagram
    participant MC as Malicious Client
    participant LC as Legitimate Client
    participant A as Auth0
    
    Note over MC,A: Initial state: Both have RT1
    MC->>A: Exchange RT1 for new tokens
    Note over A: Generate AT2, RT2
    Note over A: Keep RT1 valid
    A-->>MC: Return AT2, RT2
    MC->>A: Use AT2 for API call
    Note over A: First use of new access token
    Note over A: Now invalidate RT1
    LC->>A: Try to use RT1
    Note over A: RT1 used after invalidation<br/>indicates compromise
    Note over A: Invalidate entire token family<br/>(RT1, AT2, RT2)
    A-->>LC: Access Denied
    MC->>A: Try to use RT2
    A-->>MC: Access Denied
    Note over LC, MC: Both must re-authenticate

I also tried to think of race conditions and how to handle them. This is what I came up with:

sequenceDiagram
   participant LC as Legitimate Client
   participant A as Auth0
   participant MC as Malicious Client
   
   Note right of LC: Has RT1
   Note left of MC: Also has RT1
   LC->>A: Exchange RT1 for new tokens
   Note over A: Generate AT2, RT2
   Note over A: Keep RT1 valid
   A-->>LC: Return AT2, RT2 (network delay)
   MC->>A: Exchange RT1 for new tokens
   Note over A: RT1 still valid (no AT2 use yet)
   Note over A: Generate AT3, RT3
   Note over A: Invalidate AT2, RT2
   A-->>MC: Return AT3, RT3
   MC->>A: Use AT3 for API call
   Note over A: First use of new access token
   Note over A: Now invalidate RT1
   Note over LC: Finally receives AT2, RT2
   LC->>A: Try to use AT2
   Note over A: AT2 was already invalidated
   A-->>LC: Access Denied
   LC->>A: Try to use RT1
   Note over A: RT1 use after invalidation<br/>indicates compromise
   Note over A: Invalidate all tokens<br/>(RT1, AT3, RT3)
   A-->>LC: Access Denied
   Note right of LC: Must re-authenticate
   Note left of MC: Must re-authenticate
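
The rules in the diagrams above can be sketched as a small in-memory state machine. This is an illustration under stated assumptions, not the implementation from this PR: `AuthServer`, `Denied`, and all attribute names are hypothetical, there is a single token family, and tokens are plain strings numbered like the diagrams (RT1, AT2/RT2, ...).

```python
import itertools


class Denied(Exception):
    """Raised when the server rejects a token."""


class AuthServer:
    """In-memory model of one token family (all names are hypothetical)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.valid_rt = set()    # refresh tokens that may be exchanged
        self.retired_rt = set()  # invalidated after a successor was used
        self.valid_at = set()    # access tokens that are accepted
        self.pending = {}        # RT -> (AT, RT) issued from it, not yet used
        self.issued_from = {}    # AT -> RT it was issued from, until first use

    def login(self):
        rt = f"RT{next(self._ids)}"
        self.valid_rt.add(rt)
        return rt

    def exchange(self, rt):
        if rt in self.retired_rt:
            # Use after invalidation indicates compromise: drop the family.
            self._revoke_family()
            raise Denied("refresh token reused after invalidation")
        if rt not in self.valid_rt:
            raise Denied("unknown or invalidated refresh token")
        # A repeated exchange means the previous pair was never used
        # (for example a lost response), so invalidate that pair.
        stale = self.pending.pop(rt, None)
        if stale is not None:
            stale_at, stale_rt = stale
            self.valid_at.discard(stale_at)
            self.valid_rt.discard(stale_rt)
            self.issued_from.pop(stale_at, None)
        n = next(self._ids)
        at, new_rt = f"AT{n}", f"RT{n}"
        self.valid_at.add(at)
        self.valid_rt.add(new_rt)
        self.pending[rt] = (at, new_rt)
        self.issued_from[at] = rt
        return at, new_rt

    def use_access_token(self, at):
        if at not in self.valid_at:
            raise Denied("access token invalid")
        parent = self.issued_from.pop(at, None)
        if parent is not None:
            # First use proves delivery, so the parent RT is retired now.
            self.valid_rt.discard(parent)
            self.retired_rt.add(parent)
            self.pending.pop(parent, None)
        return "ok"

    def _revoke_family(self):
        self.valid_rt.clear()
        self.valid_at.clear()
        self.pending.clear()
        self.issued_from.clear()
```

Replaying the network-error diagram against this sketch: `exchange("RT1")` twice yields AT2/RT2 and then AT3/RT3, AT2 is rejected, and the first use of AT3 retires RT1. The sketch deliberately covers only the sequences drawn above; it does not handle a refresh token being exchanged before its access token has ever been used.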

What do you think, should I try implementing it?

TomBursch · 2024-12-30T22:24:33Z

Your sequence diagrams look good.

In addition, if you look at the tree structure of the tokens, it should be possible to create as many refresh tokens as needed from one parent (to cover multiple network failures). Only once one subtree has been used should it become impossible to create more refresh tokens with the original refresh token, and all other refresh token subtrees should be invalidated. Furthermore, access tokens don't need to be used.

The following structure would be valid and RT2 is the "current" refresh token:

stateDiagram-v2
    RT1 : RT1 (Used)
    AT1 : AT1 (Unused)
    RT2 : RT2 (Used)
    AT2 : AT2 (Used)
    RT3 : RT3 (Unused)
    AT3 : AT3 (Unused)
    RT4 : RT4 (Unused)
    AT4 : AT4 (Unused)
    RT5 : RT5 (Unused)
    AT5 : AT5 (Unused)

    RT1 --> AT1
    RT1 --> RT2
    RT2 --> AT2
    RT2 --> RT3
    RT3 --> AT3
    RT2 --> RT4
    RT4 --> AT4
    RT2 --> RT5
    RT5 --> AT5
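
A rough sketch of this tree rule, leaving out access tokens since they do not need to be used. The names `RefreshNode` and `TokenTree` are hypothetical and this is not the project's data model. "Used" here means the refresh token has been exchanged at least once. As an assumption, a refresh against a parent whose subtree is already active is simply denied.

```python
class RefreshNode:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.used = False     # True once this token has been exchanged
        self.revoked = False


class TokenTree:
    """Tree of refresh tokens for one session (hypothetical sketch)."""

    def __init__(self):
        self._count = 0
        self.nodes = {}
        self._new(None)  # RT1, the token handed out at login

    def _new(self, parent):
        self._count += 1
        node = RefreshNode(f"RT{self._count}", parent)
        self.nodes[node.name] = node
        if parent is not None:
            parent.children.append(node)
        return node

    def _revoke_subtree(self, node):
        node.revoked = True
        for child in node.children:
            self._revoke_subtree(child)

    def refresh(self, name):
        node = self.nodes.get(name)
        if node is None or node.revoked:
            raise PermissionError("refresh token invalid")
        if any(child.used for child in node.children):
            # One child subtree is active, so this token is closed.
            raise PermissionError("a newer refresh token is already in use")
        if not node.used:
            # First use activates this subtree and revokes its siblings.
            node.used = True
            if node.parent is not None:
                for sibling in node.parent.children:
                    if sibling is not node:
                        self._revoke_subtree(sibling)
        return self._new(node).name
```

Calling `refresh("RT1")` once and `refresh("RT2")` three times reproduces the structure in the diagram (RT3, RT4, and RT5 unused under RT2). A following `refresh("RT4")` revokes RT3 and RT5 and closes RT2 for further refreshes.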

irishrain · 2024-12-31T01:16:57Z

Thanks for the additional input. I implemented my sequence diagrams and a sequence derived from your diagram as test cases, and then threw away my first approach (I certainly underestimated the complexity of this). Now my tests are passing. Can you please look at my changes again? I think you can discard the previous conversations.

irishrain · 2024-12-31T01:46:02Z

Sorry for all the force pushes, it took me a while to clean up the commit history. Also: don't merge this just yet. I wrote some more test cases and discovered another bug. I'll ping you here when it's fixed.

irishrain · 2024-12-31T19:55:21Z

OK, one more test case added and one bug fixed. Please have a look, @TomBursch

TomBursch · 2025-01-02T13:09:06Z

First, thanks again for all the work. I would still like to change some things (sorry about that 😅). I want to keep the database requests as minimal as possible inside the check_if_token_revoked method, and I think we can optimize.

- I think it isn't a bad idea to invalidate unused refresh tokens once a new refresh token has been requested from the parent.
- Keep the behavior of deleting access tokens inside the refresh method.

What do you think of this? This should allow us to move most of the logic from check_if_token_revoked to the create refresh token method.

My last diagram wasn't correct in that regard (even with the current implementation, the tokens shouldn't exist).

stateDiagram-v2
    RT1 : RT1 (Used)
    RT2 : RT2 (Used)
    RT3 : RT3 (Invalidated)
    RT4 : RT4 (Invalidated)
    RT5 : RT5 (Unused)
    AT5 : AT5 (Unused)

    RT1 --> RT2
    RT2 --> RT3
    RT2 --> RT4
    RT2 --> RT5
    RT5 --> AT5
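
To illustrate why invalidating at creation time keeps the revocation check cheap, here is a hedged sketch. `TokenStore`, `create_refresh_token`, and `is_revoked` are hypothetical names and the dictionary stands in for the database; this is not the code of check_if_token_revoked in this PR. Unused siblings are deleted when a new refresh token is requested from the parent, so the check itself becomes a single lookup.

```python
class TokenStore:
    """Eager invalidation at creation time (hypothetical in-memory sketch)."""

    def __init__(self):
        self.parent = {}  # refresh token -> parent token (None after login)
        self._n = 0

    def _issue(self, parent):
        self._n += 1
        token = f"RT{self._n}"
        self.parent[token] = parent
        return token

    def login(self):
        return self._issue(None)

    def _has_children(self, token):
        return any(p == token for p in self.parent.values())

    def create_refresh_token(self, parent):
        if self.is_revoked(parent):
            raise PermissionError("refresh token invalid")
        children = [t for t, p in self.parent.items() if p == parent]
        if any(self._has_children(c) for c in children):
            # A child was already exchanged, so the parent is superseded.
            raise PermissionError("a newer refresh token is already in use")
        # Remaining children were never used: delete them right away.
        for child in children:
            del self.parent[child]
        return self._issue(parent)

    def is_revoked(self, token):
        # All bookkeeping happened at creation time, so this is one lookup.
        return token not in self.parent
```

Requesting three tokens from RT2 in a row leaves only the last one in the store, which matches the diagram above: RT3 and RT4 are invalidated and RT5 is the only unused child.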

irishrain · 2025-01-02T18:42:20Z

> I think it isn't a bad idea to invalidate unused refresh tokens once a new refresh token has been requested from the parent

Agreed, I had already quietly implemented it like this.

> I want to keep the database requests as minimal as possible inside the check_if_token_revoked method, and I think we can optimize.

Minimizing database requests is a valid goal. I looked at check_if_token_revoked again, and you are right: half of the changes are not needed at all. One DB access is only needed if you want to ensure that ATs which have not yet expired are no longer accepted once the use of a new AT activates a new branch. I removed it as well. Let me know if that IS a security goal, then I can add that code back.

TomBursch

LGTM, thank you so much for working on this!

TomBursch reviewed Dec 30, 2024

TomBursch added the bug Something isn't working label Dec 30, 2024

irishrain force-pushed the fix/rework_key_refresh branch from f628691 to c721b93 Compare December 31, 2024 00:40

irishrain force-pushed the fix/rework_key_refresh branch 4 times, most recently from 216c8f1 to 2d488eb Compare December 31, 2024 01:44

irishrain mentioned this pull request Dec 31, 2024

fix: prevent incorrect offline->online transitions without successful… #617

Open

TomBursch approved these changes Jan 3, 2025

irishrain added 4 commits January 3, 2025 11:06

fix: make token refresh robust to network errors

6fcc142

fix: additional testcase and wrong indentation

4f05c7e

Don't log out when coming back online

b9260b0

Leaked AT are now accepted during their validity time

03978dd

TomBursch force-pushed the fix/rework_key_refresh branch from 910a979 to 03978dd Compare January 3, 2025 10:06

TomBursch merged commit d10efa8 into TomBursch:main Jan 3, 2025
3 checks passed

irishrain deleted the fix/rework_key_refresh branch January 4, 2025 17:51
