fix: properly set token_expiry_is_time_of_expiration and mask access token when logging #637
…TokenOauth2Authenticator constructor
Pull Request Overview
This PR fixes two OAuth implementation issues: ensuring the token_expiry_is_time_of_expiration flag is set for single-use token authenticators and masking access tokens during logging.
- Set `token_expiry_is_time_of_expiration` based on presence of `token_expiry_date_format` in the model factory.
- Extract and mask the access token before logging responses in the OAuth request handler.
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description | 
|---|---|
| airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py | Cache `response.json()`, extract the access token to call `add_to_secrets`, then log and return the stored JSON. | 
| airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py | Pass `token_expiry_is_time_of_expiration=bool(model.token_expiry_date_format)` when constructing the single-use refresh token authenticator. | 
Comments suppressed due to low confidence (3)
airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py:222
- [nitpick] The variable name `access_key` is ambiguous; consider renaming it to `access_token` for clearer intent.
            access_key = self._extract_access_token(response_json)
airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py:220
- Add a unit test to verify that access tokens are properly extracted and masked by `add_to_secrets` before logging.
            response_json = response.json()
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py:2804
- Add tests to confirm that `token_expiry_is_time_of_expiration` is correctly set when `token_expiry_date_format` is present or absent.
                token_expiry_is_time_of_expiration=bool(model.token_expiry_date_format),
📝 Walkthrough
This change updates the OAuth authenticator creation process in the declarative source component factory by introducing a parameter that determines if the token expiry should be treated as the time of expiration. It also modifies the request handling in the abstract OAuth class to parse and mask access tokens earlier in the flow and adjust token handling logic.
Sequence Diagram(s)
sequenceDiagram
    participant Client
    participant ModelToComponentFactory
    participant DeclarativeOauth2Authenticator
    participant DeclarativeSingleUseRefreshTokenOauth2Authenticator
    Client->>ModelToComponentFactory: create_oauth_authenticator(model)
    alt model.refresh_token_updater present
        ModelToComponentFactory->>DeclarativeSingleUseRefreshTokenOauth2Authenticator: __init__(..., token_expiry_is_time_of_expiration)
    else
        ModelToComponentFactory->>DeclarativeOauth2Authenticator: __init__(..., token_expiry_is_time_of_expiration)
    end
sequenceDiagram
    participant OAuthAuthenticator
    participant TokenEndpoint
    OAuthAuthenticator->>TokenEndpoint: POST /token (refresh request)
    TokenEndpoint-->>OAuthAuthenticator: JSON response (includes access_token)
    OAuthAuthenticator->>OAuthAuthenticator: Parse JSON, mask access_token, log response
    OAuthAuthenticator->>OAuthAuthenticator: Return parsed JSON
Actionable comments posted: 0
🧹 Nitpick comments (2)
airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py (1)
220-228: Great security improvement! The token masking logic looks solid.
I like how you've restructured this to parse the JSON once and mask the access token before any logging occurs. This definitely addresses the security concern from the PR objectives.
One small consideration - what happens if `_extract_access_token` fails on line 222? Should we wrap it in a try/except to ensure the logging still happens even if token extraction fails, wdyt? The existing flow in `_ensure_access_token_in_response` already handles this case, but it might be worth being defensive here too.
The efficiency gain from parsing JSON once is also a nice bonus!
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)
2826-2829: Consistent derivation vs user override?
We now always derive `token_expiry_is_time_of_expiration` from the presence of `token_expiry_date_format`. Should we let users explicitly override this behaviour (e.g. via a manifest flag) rather than forcing `bool(format)`? Maybe accept an optional field and fall back to the derived value if unspecified, to avoid surprising edge-cases, wdyt?
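The fallback suggested above could look roughly like the sketch below. The function name `resolve_expiry_flag` and the `explicit_flag` parameter are hypothetical; no such manifest field exists in this PR, so this only illustrates the "optional override, else derive" idea.

```python
from typing import Optional


def resolve_expiry_flag(
    explicit_flag: Optional[bool],
    token_expiry_date_format: Optional[str],
) -> bool:
    """Return the explicit override if one was given, else derive it from the format."""
    if explicit_flag is not None:
        # Hypothetical manifest flag: the user's choice wins when present.
        return explicit_flag
    # Same derivation the PR uses: a non-empty format means "value is a timestamp".
    return bool(token_expiry_date_format)
```

With this shape, leaving the flag unspecified keeps the behaviour introduced by the PR, while an explicit `False` or `True` would take precedence over the derived value.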
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py(1 hunks)
- airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py(1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: ChristoGrab
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/sources/declarative/yaml_declarative_source.py:0-0
Timestamp: 2024-11-18T23:40:06.391Z
Learning: When modifying the `YamlDeclarativeSource` class in `airbyte_cdk/sources/declarative/yaml_declarative_source.py`, avoid introducing breaking changes like altering method signatures within the scope of unrelated PRs. Such changes should be addressed separately to minimize impact on existing implementations.
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (4)
<retrieved_learning>
Learnt from: aaronsteers
PR: #174
File: airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py:1093-1102
Timestamp: 2025-01-14T00:20:32.310Z
Learning: In the airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py file, the strict module name checks in _get_class_from_fully_qualified_class_name (requiring module_name to be "components" and module_name_full to be "source_declarative_manifest.components") are intentionally designed to provide early, clear feedback when class declarations won't be found later in execution. These restrictions may be loosened in the future if the requirements for class definition locations change.
</retrieved_learning>
<retrieved_learning>
Learnt from: ChristoGrab
PR: #58
File: airbyte_cdk/sources/declarative/yaml_declarative_source.py:0-0
Timestamp: 2024-11-18T23:40:06.391Z
Learning: When modifying the YamlDeclarativeSource class in airbyte_cdk/sources/declarative/yaml_declarative_source.py, avoid introducing breaking changes like altering method signatures within the scope of unrelated PRs. Such changes should be addressed separately to minimize impact on existing implementations.
</retrieved_learning>
<retrieved_learning>
Learnt from: aaronsteers
PR: #58
File: airbyte_cdk/cli/source_declarative_manifest/_run.py:62-65
Timestamp: 2024-11-15T01:04:21.272Z
Learning: The files in airbyte_cdk/cli/source_declarative_manifest/, including _run.py, are imported from another repository, and changes to these files should be minimized or avoided when possible to maintain consistency.
</retrieved_learning>
<retrieved_learning>
Learnt from: pnilan
PR: airbytehq/airbyte-python-cdk#0
File: :0-0
Timestamp: 2024-12-11T16:34:46.319Z
Learning: In the airbytehq/airbyte-python-cdk repository, the declarative_component_schema.py file is auto-generated from declarative_component_schema.yaml and should be ignored in the recommended reviewing order.
</retrieved_learning>
🧬 Code Graph Analysis (2)
airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py (1)
airbyte_cdk/utils/airbyte_secrets_utils.py (1)
add_to_secrets(67-70)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (2)
airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py (2)
token_expiry_is_time_of_expiration(68-73)
token_expiry_date_format(76-81)
airbyte_cdk/sources/streams/http/requests_native_auth/oauth.py (2)
token_expiry_is_time_of_expiration(127-128)
token_expiry_date_format(131-132)
⏰ Context from checks skipped due to timeout of 90000ms (4)
- GitHub Check: Check: source-shopify
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
🔇 Additional comments (1)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)
2803-2805: Constructor signature mismatch risk for `token_expiry_is_time_of_expiration`
Nice catch adding the flag here! Could you double-check that `DeclarativeSingleUseRefreshTokenOauth2Authenticator.__init__()` indeed exposes a `token_expiry_is_time_of_expiration` keyword? If not, this call will raise a `TypeError` at runtime. A quick grep shows the getter property, but not the constructor arg, wdyt about confirming or extending the class accordingly?
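One way to confirm this without relying on grep is to inspect the constructor signature directly. The sketch below uses the standard-library `inspect` module; `accepts_keyword` and `ExampleAuthenticator` are hypothetical names used for illustration, and the stand-in class is not the CDK class.

```python
import inspect


def accepts_keyword(cls: type, name: str) -> bool:
    """Return True if cls.__init__ can be called with `name` as a keyword argument."""
    params = inspect.signature(cls.__init__).parameters
    if name in params:
        return params[name].kind is not inspect.Parameter.POSITIONAL_ONLY
    # A **kwargs catch-all would also accept the keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


class ExampleAuthenticator:
    """Hypothetical stand-in; swap in the real authenticator class to check it."""

    def __init__(self, token_expiry_is_time_of_expiration: bool = False) -> None:
        self.token_expiry_is_time_of_expiration = token_expiry_is_time_of_expiration
```

Calling `accepts_keyword(SomeClass, "token_expiry_is_time_of_expiration")` in a unit test would surface a mismatch before it shows up as a `TypeError` at runtime.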
        
          
approving with one non-blocking comment
Actionable comments posted: 0
♻️ Duplicate comments (1)
airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py (1)
252-268: Smart cleanup to avoid duplicate secret masking!
This change aligns perfectly with the past review feedback about removing the duplicate `add_to_secrets()` call. Now the responsibility is clearly separated - this method only validates the token exists, while the masking happens earlier in `_make_handled_request`.
🧹 Nitpick comments (1)
airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py (1)
220-232: Great security improvement! A couple of minor cleanup suggestions - wdyt?
The logic to extract and mask the access token before logging is exactly what we needed to fix the security issue. I love how you're parsing the JSON once and reusing it for efficiency.
Two small nitpicks from the static analysis:
- The variable `e` in the exception handler (line 226) is unused
- The comment formatting on line 227 has too many `#` characters
-        except ResponseKeysMaxRecurtionReached as e:
-            ## Could not find the access token in the response, so do nothing
+        except ResponseKeysMaxRecurtionReached:
+            # Could not find the access token in the response, so do nothing
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py(2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py (1)
airbyte_cdk/utils/airbyte_secrets_utils.py (1)
add_to_secrets(67-70)
🪛 Flake8 (7.2.0)
airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py
[error] 226-226: local variable 'e' is assigned to but never used
(F841)
[error] 227-227: too many leading '#' for block comment
(E266)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Analyze (python)
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py(2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py (1)
airbyte_cdk/utils/airbyte_secrets_utils.py (1)
add_to_secrets(67-70)
🪛 Flake8 (7.2.0)
airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py
[error] 233-233: local variable 'e' is assigned to but never used
(F841)
[error] 234-234: too many leading '#' for block comment
(E266)
🪛 Pylint (3.3.7)
airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py
[error] 241-241: Using variable 'json_exception' before assignment
(E0601)
🪛 GitHub Actions: Linters
airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py
[error] 1-1: Ruff formatting check failed. File would be reformatted. Run 'ruff format --fix' to fix code style issues.
⏰ Context from checks skipped due to timeout of 90000ms (10)
- GitHub Check: Check: source-pokeapi
- GitHub Check: Check: source-intercom
- GitHub Check: Check: source-amplitude
- GitHub Check: Check: source-shopify
- GitHub Check: Check: source-hardcoded-records
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: SDM Docker Image Build
- GitHub Check: Analyze (python)
🔇 Additional comments (3)
airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py (3)
241-244: The JSON exception handling logic looks good!
The approach of caching the parsed JSON and raising exceptions later is a clean solution that ensures the response is logged regardless of JSON parsing success. This maintains the debugging capability while handling errors appropriately.
264-264: Minor comment update - looks fine!
The comment change is just a grammatical improvement and doesn't affect functionality.
227-235: No duplicate `add_to_secrets()` calls found in OAuth code
I ran a global search and the only invocation of `add_to_secrets` in the OAuth classes is at line 232 of `abstract_oauth.py`. There's no redundant masking elsewhere, so it looks safe to leave as is, wdyt?
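The "cache the parse result, log, then raise" pattern mentioned in the 241-244 comment can be sketched as follows. This is a simplified stand-in rather than the code in `abstract_oauth.py`: `parse_then_log` and its `log` callback are hypothetical, and secret masking is left out to keep the focus on exception ordering. Initialising `json_exception` up front also avoids the "used before assignment" warning Pylint reported.

```python
import json
from typing import Any, Callable, Optional


def parse_then_log(raw_body: str, log: Callable[[str], None]) -> Any:
    """Parse the body, always log it, and only then surface a parse error."""
    parsed: Any = None
    json_exception: Optional[Exception] = None  # assigned up front, never unbound
    try:
        parsed = json.loads(raw_body)
    except ValueError as exc:  # json.JSONDecodeError is a subclass of ValueError
        json_exception = exc

    # Logging happens whether or not parsing succeeded.
    log(raw_body)

    if json_exception is not None:
        raise json_exception
    return parsed
```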
@dbgold17 I had to make a few more changes here to properly handle json parsing exceptions and cases where there is no access token in the response. Mind taking one more look?
Actionable comments posted: 0
🧹 Nitpick comments (2)
airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py (2)
220-240: Consider refactoring to eliminate duplicate logging and address static analysis issues, wdyt?
A few observations about the current implementation:
- The response is being logged twice for successful responses (lines 223 and 237) - do we need both log calls?
- The variable `e` in the exception handler on line 233 is captured but never used, which triggers the static analysis warning.
- From the past review discussion, there's still a question about whether silently ignoring `ResponseKeysMaxRecurtionReached` is the right approach - if we hit max recursion depth but haven't fully explored the response JSON, we might miss the access token and fail to mask it.
How about consolidating the logic like this to address these concerns:
-        if not response.ok:
-            # log the response even if the request failed for troubleshooting purposes
-            self._log_response(response)
-            response.raise_for_status()
-
-        response_json = response.json()
-
-        try:
-            # extract the access token and add to secrets to avoid logging the raw value
-            access_key = self._extract_access_token(response_json)
-            if access_key:
-                add_to_secrets(access_key)
-        except ResponseKeysMaxRecurtionReached as e:
-            # could not find the access token in the response, so do nothing
-            pass
-
-        self._log_response(response)
-
-        return response_json
+        if not response.ok:
+            # log the response even if the request failed for troubleshooting purposes
+            self._log_response(response)
+            response.raise_for_status()
+
+        response_json = response.json()
+
+        try:
+            # extract the access token and add to secrets to avoid logging the raw value
+            access_key = self._extract_access_token(response_json)
+            if access_key:
+                add_to_secrets(access_key)
+        except ResponseKeysMaxRecurtionReached:
+            # could not find the access token in the response, so do nothing
+            pass
+
+        self._log_response(response)
+        return response_json
This removes the unused variable and consolidates the logging. What do you think about the exception handling approach though?
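To make the recursion-depth concern concrete, here is a minimal sketch of a depth-limited key search. It is an assumption about the general shape of such a lookup, not the CDK's actual `_extract_access_token` implementation; `find_key` and `MaxDepthReached` are hypothetical names, with the exception standing in for `ResponseKeysMaxRecurtionReached`.

```python
from typing import Any, Optional


class MaxDepthReached(Exception):
    """Hypothetical stand-in for ResponseKeysMaxRecurtionReached."""


def find_key(data: Any, key: str, max_depth: int = 5, _depth: int = 0) -> Optional[Any]:
    """Search nested dicts/lists for `key`, giving up below `max_depth` levels."""
    if isinstance(data, (dict, list)) and _depth > max_depth:
        # There is still unexplored structure, so the key may exist deeper down.
        raise MaxDepthReached(f"stopped looking for {key!r} below depth {max_depth}")
    if isinstance(data, dict):
        if key in data:
            return data[key]
        children = list(data.values())
    elif isinstance(data, list):
        children = data
    else:
        return None
    for child in children:
        found = find_key(child, key, max_depth, _depth + 1)
        if found is not None:
            return found
    return None
```

In this sketch, swallowing `MaxDepthReached` means a token nested deeper than the limit is never registered as a secret, which is the scenario the comment above asks about.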
221-237: Address the ruff formatting issue flagged by the pipeline, wdyt?
The pipeline is failing due to formatting issues. Could you run `ruff format` on this file to fix the code style issues?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py(2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py (1)
airbyte_cdk/utils/airbyte_secrets_utils.py (1)
add_to_secrets(67-70)
🪛 Flake8 (7.2.0)
airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py
[error] 233-233: local variable 'e' is assigned to but never used
(F841)
🪛 GitHub Actions: Linters
airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py
[error] 1-1: ruff formatting check failed. File would be reformatted. Run 'ruff format' to fix code style issues.
⏰ Context from checks skipped due to timeout of 90000ms (9)
- GitHub Check: Check: source-pokeapi
- GitHub Check: Check: source-hardcoded-records
- GitHub Check: Check: source-intercom
- GitHub Check: Check: source-shopify
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: SDM Docker Image Build
- GitHub Check: Pytest (Fast)
- GitHub Check: Analyze (python)
What
When testing out some Builder changes relating to OAuth, I noticed two issues with the implementation of OAuth in the CDK:
- An error was thrown: `Invalid expires_in value: 2025-08-01T21:34:33Z. Expected number of seconds when no format specified.`
- The access token was not masked as `****` in the OAuth response shown in the Builder - I could see the raw value
How
I traced the first issue down to the fact that `token_expiry_is_time_of_expiration` is not being set when constructing `DeclarativeSingleUseRefreshTokenOauth2Authenticator` here.
This caused this if statement to always return False, causing the error to be thrown.
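A simplified illustration of why the unset flag produces that error is sketched below. This is not the CDK's code: `parse_token_expiry` is a hypothetical function, the UTC assumption is mine, and only the error text is taken from the message quoted above.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Union


def parse_token_expiry(
    value: Union[str, int],
    now: datetime,
    token_expiry_is_time_of_expiration: bool,
    token_expiry_date_format: Optional[str] = None,
) -> datetime:
    """Interpret `value` as a timestamp or as a number of seconds from `now`."""
    if token_expiry_is_time_of_expiration:
        if not token_expiry_date_format:
            raise ValueError("A date format is required when the value is a timestamp.")
        parsed = datetime.strptime(str(value), token_expiry_date_format)
        return parsed.replace(tzinfo=timezone.utc)  # assumption: timestamps are UTC
    try:
        seconds = int(float(value))
    except ValueError:
        # A timestamp ends up here whenever the flag was left at False.
        raise ValueError(
            f"Invalid expires_in value: {value}. "
            "Expected number of seconds when no format specified."
        ) from None
    return now + timedelta(seconds=seconds)
```

With the flag left at `False`, a value such as `2025-08-01T21:34:33Z` falls through to the seconds branch and fails, even when a date format was configured.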
To fix this, I simply set the `token_expiry_is_time_of_expiration` the same way it is being set when constructing `DeclarativeOauth2Authenticator` below.
The second issue I traced down to the fact that the OAuth response was being logged before anything had a chance to add the access token to the secrets list. The fix was to extract the access token and add it to the secrets mask list before logging the response.
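The ordering fix for the second issue can be illustrated with a minimal sketch. The names `_SECRETS`, `add_secret`, `mask`, and `log_token_response` are hypothetical stand-ins and not the CDK's `add_to_secrets` or logging utilities; the point is only that the token must be registered before the response text reaches the logger.

```python
from typing import Any, Callable, List, Mapping

_SECRETS: List[str] = []  # hypothetical stand-in for the CDK's secrets list


def add_secret(value: str) -> None:
    """Register a value that must never appear in log output."""
    if value and value not in _SECRETS:
        _SECRETS.append(value)


def mask(text: str) -> str:
    """Replace every registered secret in `text` with asterisks."""
    for secret in _SECRETS:
        text = text.replace(secret, "****")
    return text


def log_token_response(
    body: Mapping[str, Any], raw_text: str, log: Callable[[str], None]
) -> None:
    token = body.get("access_token")
    if isinstance(token, str):
        add_secret(token)  # register the token first
    log(mask(raw_text))  # only then let the response reach the log
```

If the two steps in `log_token_response` were swapped, the first log line would contain the raw token, which matches the symptom described above.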
Summary by CodeRabbit