-
Notifications
You must be signed in to change notification settings - Fork 25
feat(cli): use new central Dockerfile
locations, add capability to locate airbyte_repo_root
, use multi-purpose CONNECTOR
positional arg in CLI commands
#530
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
… airbyte-repo-root
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR centralizes Dockerfile templates and enhances CLI functionality by introducing a repository-root resolution mechanism and adopting a more concise positional connector argument across commands. Key changes include:
- Removing legacy Dockerfile templates in favor of centralized definitions and downloading logic.
- Adding a new module for resolving Airbyte repository and connector paths.
- Updating CLI commands and tests to use positional "[CONNECTOR]" arguments instead of separate flags.
Reviewed Changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated no comments.
Show a summary per file
File | Description |
---|---|
unit_tests/source_declarative_manifest/test_source_declarative_w_custom_components.py | Updated import for MANIFEST_YAML to use the new connector paths module. |
unit_tests/cli/airbyte_cdk/test_util.py | Added tests for the new resolve_airbyte_repo_root functionality. |
airbyte_cdk/utils/docker_image_templates.py | Removed legacy Dockerfile templates to centralize definitions. |
airbyte_cdk/utils/docker.py | Modified Docker image build functions to use updated template and repo resolution logic. |
airbyte_cdk/utils/connector_paths.py | New module for resolving Airbyte repo roots and connector paths. |
airbyte_cdk/test/standard_tests/util.py, declarative_sources.py, connector_base.py | Updated imports from test_resources to connector_paths. |
airbyte_cdk/cli/airbyte_cdk/_secrets.py, _image.py, _connector.py | Updated CLI argument handling and removed legacy utility files. |
Comments suppressed due to low confidence (1)
airbyte_cdk/utils/connector_paths.py:140
- The variable 'connector' is not defined in this context; it appears that 'connector_ref' was intended. Updating the error message to reference the correct variable would resolve this issue.
raise ValueError(f"Could not infer connector_name or connector_directory from input ref: {connector}")
Warning Rate limit exceeded@aaronsteers has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 21 minutes and 0 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughThis change refactors the Airbyte CDK CLI and related utilities by consolidating the logic for resolving connector names and directories into a new Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant CLI_Command
participant ConnectorPathsUtil
participant DockerUtil
User->>CLI_Command: Invoke CLI with [connector] argument
CLI_Command->>ConnectorPathsUtil: resolve_connector_name_and_directory(connector)
ConnectorPathsUtil-->>CLI_Command: (connector_name, connector_directory)
CLI_Command->>DockerUtil: build_connector_image(connector_directory, dockerfile_override)
DockerUtil->>ConnectorPathsUtil: resolve_airbyte_repo_root(connector_directory)
DockerUtil->>DockerUtil: get_dockerfile_templates() or download from GitHub
DockerUtil-->>CLI_Command: Build image
CLI_Command-->>User: Output result
Possibly related PRs
Suggested reviewers
Would you like to add more test cases for the new path resolution logic, or is the current coverage sufficient for now—wdyt? ✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (11)
airbyte_cdk/utils/docker.py (2)
267-309
: Consider adding timeout and retry logic to GitHub requests, wdyt?The HTTP requests to GitHub don't include timeout or retry logic, which could potentially make the build process less reliable in case of network issues.
- dockerfile_resp = requests.get(dockerfile_url) - dockerfile_resp.raise_for_status() + # Add timeout and retry for better reliability + session = requests.Session() + adapter = requests.adapters.HTTPAdapter(max_retries=3) + session.mount('https://', adapter) + dockerfile_resp = session.get(dockerfile_url, timeout=10) + dockerfile_resp.raise_for_status()Also consider adding a user agent header to follow good API etiquette:
- dockerfile_resp = requests.get(dockerfile_url) + headers = {"User-Agent": "Airbyte-CDK/1.0"} + dockerfile_resp = requests.get(dockerfile_url, headers=headers)
296-296
: Consider parameterizing the GitHub branch referenceThe hardcoded reference to the
master
branch might cause issues if the repository structure changes or if specific versions need to be targeted.- base_url = f"https://github.com/airbytehq/airbyte/raw/master/docker-images/" + # Allow specifying a different branch or tag + github_branch = os.environ.get("AIRBYTE_GITHUB_BRANCH", "master") + base_url = f"https://github.com/airbytehq/airbyte/raw/{github_branch}/docker-images/"airbyte_cdk/cli/airbyte_cdk/_connector.py (1)
49-53
: Good consolidation of imports.The import statements have been nicely reorganized to use the new centralized location for connector path utilities. I noticed you commented out
pytest_hooks
- was that intentional? If it's no longer needed, maybe we should remove it completely? wdyt?unit_tests/cli/airbyte_cdk/test_util.py (2)
1-3
: Unused import in test file.The
check
import fromtabnanny
on line 2 doesn't appear to be used anywhere in this file. Should we remove it?from pathlib import Path -from tabnanny import check import pytest
61-84
: Consider expanding exception handling.The test only specifically handles
FileNotFoundError
, butresolve_airbyte_repo_root
might raise other exceptions. Would it make sense to add a more general exception handler to provide clearer test failures for unexpected errors? Maybe something like:try: repo_root = resolve_airbyte_repo_root(start_dir_rel) # existing validation code except FileNotFoundError: if expect_success: raise AssertionError(f"Airbyte repo root not found from '{start_dir_rel!s}'.") + except Exception as e: + if expect_success: + raise AssertionError(f"Unexpected error resolving repo root from '{start_dir_rel!s}': {e}") + # If we expected failure, any exception is fineWhat do you think?
airbyte_cdk/cli/airbyte_cdk/_image.py (1)
81-81
: Minor redundancy in parameter passing.There's a small redundancy in the parameter passing:
dockerfile_override=dockerfile or None,Since
dockerfile
is already defined asPath | None
, theor None
is unnecessary. Consider simplifying to:- dockerfile_override=dockerfile or None, + dockerfile_override=dockerfile,What do you think?
airbyte_cdk/cli/airbyte_cdk/_secrets.py (4)
76-81
: Use a richer Click type for paths?The positional
connector
argument is declared astype=str
, which works for both a plain connector name and a path expressed as a string.
Would it be useful to accept a real path object as well, by switching totype=click.Path(path_type=Path, exists=False)
and then passing it through unchanged?
That would give users tab-completion for directories and automatically normalisePath
handling insideresolve_connector_name_and_directory
, while still allowing simple names. Wdyt?
115-117
: Minor security wording tweak?The warning now says the option is ignored without
CI
, but the implementation below prints an explanatory message after the flag is supplied.
Perhaps “will be ignored and a notice printed” would match the runtime behaviour more precisely. Wdyt?
121-123
: Surface resolution errors more prominently
resolve_connector_name_and_directory()
can raiseValueError
/FileNotFoundError
.
Would wrapping the call in a tinytry/except
that converts these intoclick.ClickException
make the CLI error output friendlier (no stack trace by default)? For example:- connector_name, connector_directory = resolve_connector_name_and_directory(connector) + try: + connector_name, connector_directory = resolve_connector_name_and_directory(connector) + except (ValueError, FileNotFoundError) as exc: + raise click.ClickException(str(exc)) from excThis keeps internal exceptions out of user-facing terminals. Wdyt?
287-289
: Rename shadow variable to underscoreUsing
_ = connector_name
avoids the “unused” linter complaint, but it still reads as a placeholder.
Would dropping the parameter entirely (or renaming the function to_get_secrets_dir_for_connector()
with noconnector_name
arg) simplify the signature?airbyte_cdk/utils/connector_paths.py (1)
168-170
: Consider accepting “speculative” prefixesThe name check currently hard-fails when the directory name does not start with
source-
ordestination-
.
Would accepting custom prefixes (e.g.,processor-
, future categories) or at least providing a clearer hint (“expected 'source-' or 'destination-' prefix”) improve UX? Wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
airbyte_cdk/cli/airbyte_cdk/_connector.py
(4 hunks)airbyte_cdk/cli/airbyte_cdk/_image.py
(3 hunks)airbyte_cdk/cli/airbyte_cdk/_secrets.py
(5 hunks)airbyte_cdk/cli/airbyte_cdk/_util.py
(0 hunks)airbyte_cdk/test/standard_tests/connector_base.py
(1 hunks)airbyte_cdk/test/standard_tests/declarative_sources.py
(1 hunks)airbyte_cdk/test/standard_tests/test_resources.py
(0 hunks)airbyte_cdk/test/standard_tests/util.py
(1 hunks)airbyte_cdk/utils/connector_paths.py
(1 hunks)airbyte_cdk/utils/docker.py
(5 hunks)airbyte_cdk/utils/docker_image_templates.py
(0 hunks)unit_tests/cli/airbyte_cdk/test_util.py
(1 hunks)unit_tests/source_declarative_manifest/test_source_declarative_w_custom_components.py
(1 hunks)
💤 Files with no reviewable changes (3)
- airbyte_cdk/cli/airbyte_cdk/_util.py
- airbyte_cdk/utils/docker_image_templates.py
- airbyte_cdk/test/standard_tests/test_resources.py
🧰 Additional context used
🧠 Learnings (1)
airbyte_cdk/test/standard_tests/declarative_sources.py (1)
Learnt from: aaronsteers
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/cli/source_declarative_manifest/_run.py:62-65
Timestamp: 2024-11-15T01:04:21.272Z
Learning: The files in `airbyte_cdk/cli/source_declarative_manifest/`, including `_run.py`, are imported from another repository, and changes to these files should be minimized or avoided when possible to maintain consistency.
🧬 Code Graph Analysis (2)
airbyte_cdk/cli/airbyte_cdk/_image.py (2)
airbyte_cdk/utils/connector_paths.py (1)
resolve_connector_name_and_directory
(81-145)airbyte_cdk/utils/docker.py (1)
verify_docker_installation
(394-400)
airbyte_cdk/cli/airbyte_cdk/_secrets.py (1)
airbyte_cdk/utils/connector_paths.py (1)
resolve_connector_name_and_directory
(81-145)
🪛 GitHub Actions: Linters
airbyte_cdk/utils/connector_paths.py
[error] 142-142: mypy error: Name "connector" is not defined [name-defined]
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-amplitude' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: SDM Docker Image Build
🔇 Additional comments (21)
airbyte_cdk/utils/docker.py (5)
15-15
: New dependency addedAdded
requests
library for downloading Dockerfile templates from GitHub. This allows fallback template fetching when local templates aren't available.
144-144
: Added flexibility with dockerfile_override parameter, nice!This allows users to specify a custom Dockerfile path, which is a good addition for flexibility. This matches the PR objective of making the CLI more intuitive and versatile.
168-196
: Enhanced Dockerfile resolution with multiple fallback strategiesThe implementation handles three scenarios gracefully:
- Custom Dockerfile specified via the new parameter
- Local templates from the Airbyte repository
- Downloaded templates from GitHub as a fallback
This robust approach ensures the build process works in various environments, including when the Airbyte repo isn't checked out locally.
213-213
: Simplified build argsThe code now only uses the kebab-case connector name instead of both formats, simplifying the build arguments.
312-360
: Good implementation of template resolutionThis function correctly uses the new
resolve_airbyte_repo_root
function to find Docker templates from the local repository. The error handling is thorough and the validation of file existence is comprehensive.airbyte_cdk/test/standard_tests/declarative_sources.py (1)
15-15
: Updated import to use centralized connector paths moduleThe import was updated to use the new
airbyte_cdk.utils.connector_paths
module, which aligns with the broader refactoring in this PR. This is consistent with the objective of centralizing repository path resolution.airbyte_cdk/test/standard_tests/connector_base.py (1)
27-30
: Updated imports to use centralized connector paths moduleThis change follows the same pattern of centralizing connector path and repository resolution logic into the new
airbyte_cdk.utils.connector_paths
module. The consistent import changes across multiple files show good attention to detail.airbyte_cdk/test/standard_tests/util.py (1)
15-18
: Updated imports to use centralized connector paths moduleThe change maintains consistency with other files by importing from the new centralized module. This change contributes to the PR objective of making repository root finding more robust.
unit_tests/source_declarative_manifest/test_source_declarative_w_custom_components.py (1)
35-35
: Import path change looks good.The import of
MANIFEST_YAML
has been updated to use the new centralized moduleairbyte_cdk.utils.connector_paths
instead of the previous location. This aligns well with the PR objective of consolidating connector path resolution logic.airbyte_cdk/cli/airbyte_cdk/_connector.py (4)
104-109
: CLI interface simplification looks great!Changing from two separate options to a single positional argument makes the CLI more intuitive and aligns perfectly with the PR objectives.
117-117
: Function signature updated correctly.The parameter now accepts either a string or Path object, matching the CLI interface changes.
124-127
: Clear and helpful documentation.The updated docstring clearly explains the new behavior, making it easy for users to understand how the command works.
137-137
: Implementation update matches new approach.The call to
resolve_connector_name_and_directory
has been correctly updated to use the simplified parameter.unit_tests/cli/airbyte_cdk/test_util.py (2)
11-25
: Helper function looks good.The
check_local_repo_structure
helper function effectively validates the existence of the expected repository layout. This will help ensure the tests only run in the appropriate environment.
27-47
: Well-structured test parametrization.The test cases cover a good variety of scenarios for repository root resolution, including both expected successes and failures. The skip condition is also well implemented to avoid running tests in environments without the required directory structure.
airbyte_cdk/cli/airbyte_cdk/_image.py (5)
14-14
: Import path update looks good.The import has been correctly updated to use the new centralized location for connector path utilities.
31-36
: CLI interface simplification is consistent.Using a single positional argument instead of two separate options is consistent with the changes in
_connector.py
and makes the CLI more intuitive.
39-43
: New dockerfile override option is useful.The addition of the
--dockerfile
option provides flexibility for users who need to customize the build process, which is a nice enhancement.
53-54
: Clear documentation of the new parameter behavior.The docstring clearly explains how the
CONNECTOR
argument works, making it easy for users to understand.
62-62
: Implementation updated correctly.The call to
resolve_connector_name_and_directory
has been correctly updated to use the simplified parameter.airbyte_cdk/cli/airbyte_cdk/_secrets.py (1)
105-107
: Docstring update keeps CLI behaviour crystal-clear👍 Thanks for expanding the docstring to reflect the new positional argument – this will save users a lot of confusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (2)
airbyte_cdk/utils/connector_paths.py (2)
44-45
: Should we use_is_airbyte_repo_root
for parent dir validation as well?Unlike the sibling and child directory checks that use
_is_airbyte_repo_root
, the parent directory check only verifies the directory name. This could lead to false positives if a directory happens to be named "airbyte" but doesn't have the expected structure.Wdyt about using the same validation here?
if parent_dir.name == "airbyte" or parent_dir.name == "airbyte-enterprise": return parent_dirif parent_dir.name == "airbyte" or parent_dir.name == "airbyte-enterprise": + if _is_airbyte_repo_root(parent_dir): return parent_dir
142-142
: Undefined variable in error message - should beconnector_ref
The variable
connector
is never defined in this scope. I believe you meant to useconnector_ref
here.- f"Could not infer connector_name or connector_directory from input ref: {connector}", + f"Could not infer connector_name or connector_directory from input ref: {connector_ref}",
🧹 Nitpick comments (2)
airbyte_cdk/utils/connector_paths.py (2)
136-140
: Potential redundancy in connector directory resolution logicThe code flow has already established
connector_directory
earlier, but then conditionally sets it again here. We might be able to simplify this logic.Wdyt about this approach?
- if connector_directory: - connector_directory = connector_directory.resolve().absolute() - elif connector_name: - connector_directory = find_connector_root_from_name(connector_name) - else: + if not connector_directory and connector_name: + connector_directory = find_connector_root_from_name(connector_name) + elif not connector_directory: raise ValueError( f"Could not infer connector_name or connector_directory from input ref: {connector_ref}", ) + connector_directory = connector_directory.resolve().absolute()
105-106
: Path detection could be more robust for edge casesDetecting paths by the presence of "/" or "\" works for most cases, but might not handle all valid paths correctly. For example, network paths or URIs.
Wdyt about using
Path.is_absolute()
or checking if the string can be converted to a valid path that exists?- if "/" in connector_ref or "\\" in connector_ref: + # Try to convert to Path and check if it exists + potential_path = Path(connector_ref) + if potential_path.exists() or "/" in connector_ref or "\\" in connector_ref: # If the connector name is a path, treat it as a directory connector_directory = Path(connector_ref)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
airbyte_cdk/utils/connector_paths.py
(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (9)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-amplitude' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: SDM Docker Image Build
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Analyze (python)
🔇 Additional comments (4)
airbyte_cdk/utils/connector_paths.py (4)
1-10
: Good job on setting up the module structure and constants!The constants defined for marker files provide a clear reference point for identifying connector directories. The imports are minimal and appropriate for the functionality.
80-146
: Well-structured resolution logic for connector referencesThe function provides a flexible interface for resolving connector names and directories from various input types. The design allows users to specify either a name or path, with intelligent fallbacks.
174-193
: Well-implemented connector root finderThe function efficiently searches for marker files across multiple paths and their parent directories, with a clean stopping condition when reaching "airbyte_cdk". This will make connector discovery reliable.
195-221
: Good fallback logic for finding connectorsI like how this function first tries the direct approach, then falls back to locating via the Airbyte repo root with clear error messages at each step.
CONNECTOR
positional arg in CLI commands
CONNECTOR
positional arg in CLI commandsairbyte_repo_root
, use multi-purpose CONNECTOR
positional arg in CLI commands
airbyte_repo_root
, use multi-purpose CONNECTOR
positional arg in CLI commandsairbyte_repo_root
, use multi-purpose CONNECTOR
positional arg in CLI commands
airbyte_repo_root
, use multi-purpose CONNECTOR
positional arg in CLI commandsDockerfile
locations, add capability to locate airbyte_repo_root
, use multi-purpose CONNECTOR
positional arg in CLI commands
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving, with a discussion question that we can always figure out later.
Did you consider having the Dockerfiles live within the CDK and imported into the airbyte
repo as needed instead of now having all of this logic to find the airbyte repo locally? The file path traversing and directory structure logic feels a bit magic / fragile, but is is also nice if it just works. Another alternative that I don't love is that we could have the cdk cli by default pull from Github unless the user specifies a path to the airbyte repo locally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dbgold17 - Thanks again for your review here. Responded inline to your questions and we can keep these discussions going after the merge.
Regarding your question about where to house the Dockerfiles, I feel more strongly that putting them in the main repo is at least a good path, and as discussed the past couple days, it allows all teams to have a sense of ownership in those build processes.
I think it's okay for the CDK CLI to get files from that repo - and I don't mind much if we get from local refs vs defaulting to downloading from github. The only reason I didn't default to downloading from github in this iteration is that I wanted our default mode to not need network access. So, preferring local files, then falling back to a remote download seemed like the best first implementation for now, at least.
Thanks again for all your feedback and the very thoughtful comments!
A follow-up to:
docker-images
directory airbyte#59193The above PR migrated the Dockerfile and Dockerignore definitions into the
docker-images
directory in theairbytehq/airbyte
repo. This PR updates CDK logic to use those newly migrated files during docker image builds. The Dockerfile defs were originally hosted here in the CDK but now that they are migrated, this PR also removes them from the CDK repo. If we cannot locate them in the local airbyte repo checked out, then we download the text file contents frommaster
branch of the repo on GitHub.In addition, this PR does a few other things:
airbyte-enterprise
repo.airbyte-python-cdk
andairbyte
are sibling directories.--connector-name
and--connector-directory
can be ommitted from the first CLI arg.poetry run airbyte-cdk image build source-s3
. The CDK CLI will find the airbyte repo and find the connector directory automatically.--connector-name
/--connector-directory
).image build
,secrets fetch
,secrets list
, andconnector test
Summary by CodeRabbit
Summary by CodeRabbit
New Features
Refactor
Bug Fixes
Tests
Chores