-
Notifications
You must be signed in to change notification settings - Fork 18
feat: new CLI: airbyte-cdk image build
#504
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…d for classes starting with "Test"
/autofix
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Tested on a few Python connectors using each of the options and everything behaved as expected.
I did see a couple warnings in the build script, but the built connectors all ran as expected, just noting here in case it's worth investigation
2 warnings found (use docker --debug to expand):
- InvalidDefaultArgInFrom: Default value for ARG ${BASE_IMAGE} results in empty or invalid base image name (line 6)
- InvalidDefaultArgInFrom: Default value for ARG ${BASE_IMAGE} results in empty or invalid base image name (line 24)
Might want a keener pair of review eyes before merging, but I'll approve to unblock you!
@ChristoGrab - Thanks for reviewing! And thanks also for raising. FWIW, these warnings are expected. I may try to silence them in the future, but basically Docker is warning us that if we didn't provide a build arg for the base image, there wouldn't be an available default image otherwise. That's as expected, because the base image is required and we don't want to inadvertently failover to a different image than the one the connector expects.
|
📝 WalkthroughWalkthroughThis update introduces new functionality for managing Airbyte connector Docker images via the command line. It adds a CLI command group for building connector images, implements utilities for building and verifying multi-architecture Docker images, and provides Dockerfile templates for different connector types. The changes also include new Pydantic models for representing and validating connector metadata loaded from YAML files. Existing files were updated to include module-level documentation and copyright year changes. No existing public APIs were removed or changed. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant CLI
participant DockerUtils
participant MetadataModel
participant Docker
User->>CLI: airbyte-cdk image build [options]
CLI->>DockerUtils: verify_docker_installation()
DockerUtils->>Docker: docker version
Docker-->>DockerUtils: Success/Failure
CLI->>MetadataModel: MetadataFile.from_file(metadata.yaml)
MetadataModel-->>CLI: MetadataFile instance
CLI->>DockerUtils: build_connector_image(connector_name, dir, metadata, tag, ...)
DockerUtils->>Docker: docker build (for ARM64/AMD64)
Docker-->>DockerUtils: Build result
DockerUtils->>Docker: docker tag
DockerUtils->>Docker: docker run (spec command for verification)
Docker-->>DockerUtils: Verification result
DockerUtils-->>CLI: Build and verification status
CLI-->>User: Output build/verification results
Suggested labels
Suggested reviewers
Would you like to add more detailed usage examples to the CLI docstrings for easier onboarding, wdyt? 📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
⏰ Context from checks skipped due to timeout of 90000ms (9)
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
🧹 Nitpick comments (6)
airbyte_cdk/models/connector_metadata.py (2)
92-95
: String interpolation in the error message is broken – use an f-string?
"{file_path!s}"
will be printed literally. Turning the string into an f-string fixes it:- raise ValueError( - "Invalid metadata format: missing 'data' field in YAML file '{file_path!s}'" - ) + raise ValueError( + f"Invalid metadata format: missing 'data' field in YAML file '{file_path!s}'" + )
23-36
: Consider snake_case aliases for Pydantic fields?The keys come from YAML in camelCase (
baseImage
,path
) but Python callers might expect snake_case attributes.
Pydantic lets you declarealias="baseImage"
while naming the attributebase_image
, improving IDE autocompletion.
Would introducing aliases here make the model friendlier to Python users, or do you prefer to keep the raw names, wdyt?airbyte_cdk/utils/docker.py (1)
162-165
: Trimming registry/repository prefixes may create invalid tags – necessary?For fully-qualified images such as
ghcr.io/airbyte/source-s3
, the current logic keeps only the last path component, losing the registry and org (ghcr.io/airbyte
).
Is this intentional, or could we remove the slice altogether to preserve the original tag?airbyte_cdk/cli/airbyte_cdk/_image.py (2)
55-59
: Would enhancing the Docker verification be helpful?The code checks if Docker is installed but doesn't explicitly verify if the Docker daemon is running. Consider adding a daemon check to provide more specific error messages to users, wdyt?
def verify_docker_installation() -> bool: """Verify Docker is installed and running.""" try: run_docker_command(["docker", "--version"]) + # Check if docker daemon is running + run_docker_command(["docker", "info"]) return True except (subprocess.CalledProcessError, FileNotFoundError): return False
29-48
: Would a verbose mode be helpful for developers?For debugging purposes, maybe add a
--verbose
flag to provide more detailed output during the build process? This could help users diagnose issues when builds fail.@click.option("--tag", default="dev", help="Tag to apply to the built image (default: dev)") @click.option("--no-verify", is_flag=True, help="Skip verification of the built image") +@click.option("--verbose", is_flag=True, help="Enable verbose output") def build( connector_name: str | None = None, connector_directory: Path | None = None, *, tag: str = "dev", no_verify: bool = False, + verbose: bool = False, ) -> None:airbyte_cdk/utils/docker_image_templates.py (1)
73-78
: Would enhanced shell error handling be beneficial?The entrypoint script uses
set -e
which is good for basic error handling. Consider addingset -o pipefail
as well to ensure pipe failures are properly caught, wdyt?COPY --chmod=755 <<EOT /entrypoint.sh #!/usr/bin/env bash -set -e +set -eo pipefail ${CONNECTOR_KEBAB_NAME} "\$\@" EOT
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
airbyte_cdk/cli/__init__.py
(1 hunks)airbyte_cdk/cli/airbyte_cdk/_image.py
(1 hunks)airbyte_cdk/models/connector_metadata.py
(1 hunks)airbyte_cdk/utils/docker.py
(1 hunks)airbyte_cdk/utils/docker_image_templates.py
(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
airbyte_cdk/cli/airbyte_cdk/_image.py (3)
airbyte_cdk/cli/airbyte_cdk/_util.py (1)
resolve_connector_name_and_directory
(8-43)airbyte_cdk/models/connector_metadata.py (2)
MetadataFile
(73-98)from_file
(81-98)airbyte_cdk/utils/docker.py (2)
build_connector_image
(104-189)verify_docker_installation
(253-259)
⏰ Context from checks skipped due to timeout of 90000ms (9)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-amplitude' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: SDM Docker Image Build
- GitHub Check: Pytest (Fast)
- GitHub Check: Analyze (python)
🔇 Additional comments (7)
airbyte_cdk/cli/__init__.py (1)
1-9
: Docstring addition looks good – thanks for the clarity!airbyte_cdk/cli/airbyte_cdk/_image.py (1)
71-77
: The image building implementation looks solidGood job passing all the necessary parameters to the
build_connector_image
function! The approach of building multi-architecture images by default is a nice touch for developer experience.airbyte_cdk/utils/docker_image_templates.py (5)
14-33
: Is there a typo in the .dockerignore template?Line 17 has a commented-out entry
# "*,"
which seems unusual. Is this a typo or was it intended to be# "*",
(with the comma outside the quotes)? Or perhaps it's a placeholder for future use?
54-58
: Nice handling of optional build scripts!I like the conditional script execution for EXTRA_PREREQS_SCRIPT. This allows for custom build steps without modifying the template itself. Well thought out!
39-82
: Multi-stage build implementation is excellentThe multi-stage build approach for Python connectors is well-implemented and follows Docker best practices. Using a builder stage keeps the final image size smaller by excluding build tools. Good job!
88-101
: Clean implementation for manifest-only connectorsThe manifest-only template is appropriately simpler than the Python version. It correctly handles the different requirements for this connector type without unnecessary complexity.
60-63
:❓ Verification inconclusive
Regarding the TODO comment about uv pre-installation
The TODO about pre-installing uv on base images is a good optimization idea. Is there a timeline or ticket for this enhancement? Pre-installing would definitely speed up builds.
🏁 Script executed:
#!/bin/bash # Check if uv is significantly faster than pip for this use case time docker run --rm python:3.9-slim bash -c "pip install --no-cache-dir uv && python -m uv pip install --no-cache-dir requests" time docker run --rm python:3.9-slim bash -c "pip install --no-cache-dir requests"Length of output: 483
Assess uv Pre-installation in Base Images
I tried benchmarking
uv
vs.pip install
via a Docker run in this sandbox, butdocker
isn’t available here—so I couldn’t confirm the speedup. Could you run a quick timing test in your local environment (e.g., compare build layer times with and withoutuv
pre-installed) to verify we’re getting a meaningful improvement?Also, do we have—or should we open—a tracking ticket or timeline for bundling
uv
into our base images? wdyt?• File:
airbyte_cdk/utils/docker_image_templates.py
(lines 60–63)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (2)
airbyte_cdk/utils/docker.py (2)
222-248
:check
parameter still ignored – pass it through tosubprocess.run
, maybe?The signature exposes
check: bool
, yet the call hard-codescheck=True
(line 245). Forwarding the argument preserves caller intent:- check=True, + check=check,Would you like to adopt this tiny tweak, wdyt?
204-213
: Enum vs. string comparison – previous concern persists
metadata.data.language
is an enum (ConnectorLanguage
), so string literals ("python"
,"manifest-only"
, …) will never match.
Shall we compare against the enum values instead, or cast tostr
once in the model? This prevents the Unsupported connector language error.
Happy to draft the diff if useful, wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
airbyte_cdk/models/connector_metadata.py
(1 hunks)airbyte_cdk/utils/docker.py
(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- airbyte_cdk/models/connector_metadata.py
⏰ Context from checks skipped due to timeout of 90000ms (9)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-amplitude' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: SDM Docker Image Build
- GitHub Check: Analyze (python)
FYI - Temporarily reverting to draft as I address the suggestions from Coderabbit. Then I'll retest and revert to ready. |
Feedback addressed and I've retested successfully. |
This PR adds the CLI
airbyte-cdk image build
, which works as of now on Python-based and Manifest-Only connectors.From any Python or Manifest-Only directory, you can now run
airbyte-cdk image build
(with no args) to build both the ARM and AMD versions of the docker image.Targets:
airbyte-cdk
CLI with support for:connector test
,secrets fetch
and--version
#493This PR allows you to quickly build a docker image for your connector. This is a follow-on to the previous PR that added
connector test
andsecrets fetch
capabilities. It uses most of the same practices as the officialairbyte-ci
pipeline, but it is faster and has some modest improvements.See the base PR for more context on the CLI in general:
airbyte-cdk
CLI with support for:connector test
,secrets fetch
and--version
#493Other notes:
sources-s3
you'll have 3 images:airbyte/source-s3:dev-amd64
airbyte/source-s3:dev-arm64
airbyte/source-s3:dev
(same as the ARM64 one, assuming for now that local dev is happening on an M-series Mac).uv
instead ofpip
to do the install. This speeds up the builds significantly.airbyte-cdk
CLI commands, you can run this statelessly usinguvx airbyce-cdk image build
. This doesn't require pre-installing the CDK, and will use the latest CDK version by default.airbyte-ci
for build+publish. We can discuss this, but currently, this functionality is only for dev/test workflows.https://www.loom.com/share/5ca23374a5e24af1a4163597e3048154?sid=47e4cfa3-3a03-4eea-beb3-9a9dca3cb0ea
Replaces:
image build
command toairbyte-cdk
CLI #489Summary by CodeRabbit
New Features
.dockerignore
template for Python and manifest-only connectors.Documentation
Chores