Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: make token refresh robust to network errors #610

Merged
merged 4 commits into from
Jan 3, 2025

Conversation

irishrain
Copy link
Contributor

I finally found the reason for the logouts (first described in #79 ). When a token is refreshed, but the new token newer makes it to the client due to network issues, the client tries to refresh with an old token and is logged out. This change keeps the old token until the new token has been used once. Should fix #414

@TomBursch
Copy link
Owner

TomBursch commented Dec 30, 2024

Hey, thanks for this, and I like the idea of making this more robust. But I think there are some minor security issues with your implementation.

You're probably aware of why I'm using refresh token rotation, but here is a short explanation. The idea is that old tokens which might have been leaked cannot be reused.

In your implementation:

  • Scenario 1:
    • User creates a new refresh token (but does not use the access token)
    • State is leaked to a malicious actor
    • User creates another refresh token and continues normal use
    • Malicious actor uses the leaked/old access token -> deleting all future tokens
    • Malicious actor now has the most current refresh token and hijacked the session

@irishrain
Copy link
Contributor Author

Thanks for the feedback. The goal was to have the following happen in your scenario:
In step three, when the user requests a new refresh token and uses it, all other old/unused/... tokens should be invalidated.
Maybe I implemented it wrong, I will doublecheck.

backend/app/controller/auth/auth_controller.py Outdated Show resolved Hide resolved
backend/app/controller/auth/auth_controller.py Outdated Show resolved Hide resolved
backend/app/controller/auth/auth_controller.py Outdated Show resolved Hide resolved
backend/app/models/token.py Show resolved Hide resolved
@TomBursch
Copy link
Owner

Yes, I think I get what you're trying to do. I've added some comments on what I would change. Let me know/ping me when you want another review.

@irishrain
Copy link
Contributor Author

Before working on fixing the implementation, I thought it would be helpful to properly document what I want to implement, so we can separate design and implementation discussions. I took the sequence diagrams of the linked documentation for refresh tokens, and extended them with my proposal:

First the normal case without an attacker and any network issues:

sequenceDiagram
    participant LC as Legitimate Client
    participant A as Auth0
    
    Note over LC,A: Initial state: Client has RT1
    LC->>A: Exchange RT1 for new tokens
    Note over A: Generate AT2, RT2
    Note over A: Keep RT1 valid
    A-->>LC: Return AT2, RT2
    LC->>A: Use AT2 for API call
    Note over A: First use of new access token
    Note over A: Now invalidate RT1
    Note over LC,A: Later: AT2 expires
    LC->>A: Exchange RT2 for new tokens
    Note over A: Generate AT3, RT3
    Note over A: Keep RT2 valid
    A-->>LC: Return AT3, RT3
    LC->>A: Use AT3 for API call
    Note over A: First use of new access token
    Note over A: Now invalidate RT2
Loading

Now with a network issue:

sequenceDiagram
    participant LC as Legitimate Client
    participant A as Auth0
    
    Note over LC,A: Initial state: Client has RT1
    LC->>A: Exchange RT1 for new tokens
    Note over A: Generate AT2, RT2
    Note over A: Keep RT1 valid
    A-->>LC: Return AT2, RT2 (network error)
    Note over LC: Never receives tokens
    LC->>A: Retry: Exchange RT1 for new tokens
    Note over A: RT1 being reused - indicates<br/>possible network issues
    Note over A: Invalidate AT2, RT2<br/>since RT1 is being reused
    Note over A: Generate AT3, RT3
    A-->>LC: Return AT3, RT3
    LC->>A: Use AT3 for API call
    Note over A: First use of new access token
    Note over A: Now invalidate RT1
Loading

Now with an attack:

sequenceDiagram
    participant MC as Malicious Client
    participant LC as Legitimate Client
    participant A as Auth0
    
    Note over LC,A: Initial state: Both have RT1
    LC->>A: Exchange RT1 for new tokens
    Note over A: Generate AT2, RT2
    Note over A: Keep RT1 valid
    A-->>LC: Return AT2, RT2
    LC->>A: Use AT2 for API call
    Note over A: First use of new access token
    Note over A: Now invalidate RT1
    MC->>A: Try to use RT1
    Note over A: RT1 used after invalidation<br/>indicates compromise
    Note over A: Invalidate entire token family<br/>(RT1, AT2, RT2)
    A-->>MC: Access Denied
    LC->>A: Try to use RT2
    A-->>LC: Access Denied
    Note over LC: Must re-authenticate
Loading

The case where the attacker acts first:

sequenceDiagram
    participant MC as Malicious Client
    participant LC as Legitimate Client
    participant A as Auth0
    
    Note over MC,A: Initial state: Both have RT1
    MC->>A: Exchange RT1 for new tokens
    Note over A: Generate AT2, RT2
    Note over A: Keep RT1 valid
    A-->>MC: Return AT2, RT2
    MC->>A: Use AT2 for API call
    Note over A: First use of new access token
    Note over A: Now invalidate RT1
    LC->>A: Try to use RT1
    Note over A: RT1 used after invalidation<br/>indicates compromise
    Note over A: Invalidate entire token family<br/>(RT1, AT2, RT2)
    A-->>LC: Access Denied
    MC->>A: Try to use RT2
    A-->>MC: Access Denied
    Note over LC, MC: Both must re-authenticate
Loading

I tried thinking of race conditions and how to handle them, this is what I came up with:

sequenceDiagram
   participant LC as Legitimate Client
   participant A as Auth0
   participant MC as Malicious Client
   
   Note right of LC: Has RT1
   Note left of MC: Also has RT1
   LC->>A: Exchange RT1 for new tokens
   Note over A: Generate AT2, RT2
   Note over A: Keep RT1 valid
   A-->>LC: Return AT2, RT2 (network delay)
   MC->>A: Exchange RT1 for new tokens
   Note over A: RT1 still valid (no AT2 use yet)
   Note over A: Generate AT3, RT3
   Note over A: Invalidate AT2, RT2
   A-->>MC: Return AT3, RT3
   MC->>A: Use AT3 for API call
   Note over A: First use of new access token
   Note over A: Now invalidate RT1
   Note over LC: Finally receives AT2, RT2
   LC->>A: Try to use AT2
   Note over A: AT2 was already invalidated
   A-->>LC: Access Denied
   LC->>A: Try to use RT1
   Note over A: RT1 use after invalidation<br/>indicates compromise
   Note over A: Invalidate all tokens<br/>(RT1, AT3, RT3)
   A-->>LC: Access Denied
   Note right of LC: Must re-authenticate
   Note left of MC: Must re-authenticate
Loading

What do you think, should I try implementing it?

@TomBursch TomBursch added the bug Something isn't working label Dec 30, 2024
@TomBursch
Copy link
Owner

Your sequence diagrams look good.

In addition, if you look at the tree structure of tokens, it should be possible to create as many refresh tokens as possible (multiple network failures) and only once one subtree has been used no more refresh tokens can be created with the original refresh token and all other refresh token subtrees should be invalidated. Furthermore, access tokens don't need to be used.

The following structure would be valid and RT2 is the "current" refresh token:

stateDiagram-v2
    RT1 : RT1 (Used)
    AT1 : AT1 (Unused)
    RT2 : RT2 (Used)
    AT2 : AT2 (Used)
    RT3 : RT3 (Unused)
    AT3 : AT3 (Unused)
    RT4 : RT4 (Unused)
    AT4 : AT4 (Unused)
    RT5 : RT5 (Unused)
    AT5 : AT5 (Unused)

    RT1 --> AT1
    RT1 --> RT2
    RT2 --> AT2
    RT2 --> RT3
    RT3 --> AT3
    RT2 --> RT4
    RT4 --> AT4
    RT2 --> RT5
    RT5 --> AT5
Loading

@irishrain irishrain force-pushed the fix/rework_key_refresh branch from f628691 to c721b93 Compare December 31, 2024 00:40
@irishrain
Copy link
Contributor Author

irishrain commented Dec 31, 2024

Thanks for the additional input. I implemented my sequence diagrams and a sequence derived from your diagram as testcases, and then threw away my first approach (I certainly underestimated the complexity of this). Now my tests are passing, can you please look at my changes again? I think you can discard the previous conversations.

@irishrain irishrain force-pushed the fix/rework_key_refresh branch 4 times, most recently from 216c8f1 to 2d488eb Compare December 31, 2024 01:44
@irishrain
Copy link
Contributor Author

irishrain commented Dec 31, 2024

Sorry for all the force pushs, it took me a while to clean up the commit history. Also: Don't merge this just yet, I wrote some more test cases and discovered another bug. I'll ping you here when it's fixed.

@irishrain
Copy link
Contributor Author

Ok, one more testcase, one bug fixed. Please have a look, @TomBursch

@TomBursch
Copy link
Owner

First thanks again for all the work, I still would like to change some things (Sorry about that 😅). I want to keep the database requests as minimal as possible inside the check_if_token_revoked method, and I think we can optimize.

  • I think it isn't a bad idea to invalidate unused refresh tokens once a new refresh token has been requested from the parent
  • Keep the behavior of deleting access tokens inside the refresh method.

What do you think of this? This should allow us to move most of the logic from check_if_token_revoked to the create refresh token method.

My last diagram wasn't correct in that regard (even with the current implementation, the tokens shouldn't exist).

stateDiagram-v2
    RT1 : RT1 (Used)
    RT2 : RT2 (Used)
    RT3 : RT3 (Invalidated)
    RT4 : RT4 (Invalidated)
    RT5 : RT5 (Unused)
    AT5 : AT5 (Unused)

    RT1 --> RT2
    RT2 --> RT3
    RT2 --> RT4
    RT2 --> RT5
    RT5 --> AT5
Loading

@irishrain
Copy link
Contributor Author

irishrain commented Jan 2, 2025

I think it isn't a bad idea to invalidate unused refresh tokens once a new refresh token has been requested from the parent
Agreed, I silently already implemented it like this.

I want to keep the database requests as minimal as possible inside the check_if_token_revoked method, and I think we can optimize.

Minimizing database requests is a valid goal. I looked at check_if_token_revoked again. And you are right, half the changes are not needed at all, and one DB access is only needed if you want to ensure that not expired ATs are not accepted anymore, after the use of a new AT activates a new branch. I removed it as well. Let me know if that IS a security goal, then I can add that code again.

Copy link
Owner

@TomBursch TomBursch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you so much for working on this!

@TomBursch TomBursch force-pushed the fix/rework_key_refresh branch from 910a979 to 03978dd Compare January 3, 2025 10:06
@TomBursch TomBursch merged commit d10efa8 into TomBursch:main Jan 3, 2025
3 checks passed
@irishrain irishrain deleted the fix/rework_key_refresh branch January 4, 2025 17:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bug: Logged out after coming back from offline mode
2 participants