Skip to content

feat(cli): use new central Dockerfile locations, add capability to locate airbyte_repo_root, use multi-purpose CONNECTOR positional arg in CLI commands #530

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
May 7, 2025

Conversation

aaronsteers
Copy link
Contributor

@aaronsteers aaronsteers commented May 2, 2025

A follow-up to:

The above PR migrated the Dockerfile and Dockerignore definitions into the docker-images directory in the airbytehq/airbyte repo. This PR updates CDK logic to use those newly migrated files during docker image builds. The Dockerfile defs were originally hosted here in the CDK but now that they are migrated, this PR also removes them from the CDK repo. If we cannot locate them in the local airbyte repo checked out, then we download the text file contents from master branch of the repo on GitHub.

In addition, this PR does a few other things:

  1. Add the ability to locate the airbyte repo root.
    • This works when cwd is anywhere inside the airbyte repo.
    • This also works when cwd is inside the airbyte-enterprise repo.
    • This also works in any repo checked out that is adjacent (sibling) to the airbyte repo. For instance, this works within the airbyte-python-cdk repo if airbyte-python-cdk and airbyte are sibling directories.
  2. Modify the CLI so that --connector-name and --connector-directory can be ommitted from the first CLI arg.
    • The CLI will infer if a connector name or directory is passed, based on the presence of a path delimiter (normally "/", or "" on windows).
    • As before, if no arg is set, we'll try to get connector dir and name from the current cdw. (Let's you run directly within the connector directory with no args.)
    • Now this works from the CDK repo: poetry run airbyte-cdk image build source-s3. The CDK CLI will find the airbyte repo and find the connector directory automatically.
  3. Modify the other CLI commands to use the same convention, using a positional argument instead of a flag (--connector-name / --connector-directory).
    • This applies to all CLI commands: image build, secrets fetch, secrets list, and connector test
    • Whether it is a name or a path will be inferred from the input.
    • This allows a more concise, intuitive, and uniform syntax.

Summary by CodeRabbit

Summary by CodeRabbit

  • New Features

    • Added support for specifying a connector by name or path as a single argument in CLI commands, simplifying usage.
    • Introduced an option to override the Dockerfile used when building connector images via the CLI.
    • Improved Docker image build process to dynamically retrieve Dockerfile templates from the local repository or GitHub if not found locally.
  • Refactor

    • Unified and streamlined connector resolution logic across CLI commands.
    • Centralized connector path utilities into a new module, improving maintainability.
    • Updated CLI help texts and argument structures for clarity and consistency.
  • Bug Fixes

    • Enhanced error handling for connector resolution and Dockerfile template retrieval.
  • Tests

    • Added new tests to validate repository root resolution logic.
  • Chores

    • Removed obsolete utility and template files, consolidating logic into shared modules.

@github-actions github-actions bot added the enhancement New feature or request label May 2, 2025
@aaronsteers aaronsteers marked this pull request as ready for review May 2, 2025 18:51
@Copilot Copilot AI review requested due to automatic review settings May 2, 2025 18:51
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR centralizes Dockerfile templates and enhances CLI functionality by introducing a repository-root resolution mechanism and adopting a more concise positional connector argument across commands. Key changes include:

  • Removing legacy Dockerfile templates in favor of centralized definitions and downloading logic.
  • Adding a new module for resolving Airbyte repository and connector paths.
  • Updating CLI commands and tests to use positional "[CONNECTOR]" arguments instead of separate flags.

Reviewed Changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated no comments.

Show a summary per file
File Description
unit_tests/source_declarative_manifest/test_source_declarative_w_custom_components.py Updated import for MANIFEST_YAML to use the new connector paths module.
unit_tests/cli/airbyte_cdk/test_util.py Added tests for the new resolve_airbyte_repo_root functionality.
airbyte_cdk/utils/docker_image_templates.py Removed legacy Dockerfile templates to centralize definitions.
airbyte_cdk/utils/docker.py Modified Docker image build functions to use updated template and repo resolution logic.
airbyte_cdk/utils/connector_paths.py New module for resolving Airbyte repo roots and connector paths.
airbyte_cdk/test/standard_tests/util.py, declarative_sources.py, connector_base.py Updated imports from test_resources to connector_paths.
airbyte_cdk/cli/airbyte_cdk/_secrets.py, _image.py, _connector.py Updated CLI argument handling and removed legacy utility files.
Comments suppressed due to low confidence (1)

airbyte_cdk/utils/connector_paths.py:140

  • The variable 'connector' is not defined in this context; it appears that 'connector_ref' was intended. Updating the error message to reference the correct variable would resolve this issue.
raise ValueError(f"Could not infer connector_name or connector_directory from input ref: {connector}")

Copy link
Contributor

coderabbitai bot commented May 2, 2025

Warning

Rate limit exceeded

@aaronsteers has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 21 minutes and 0 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between c79c719 and cbed4d4.

📒 Files selected for processing (1)
  • airbyte_cdk/utils/connector_paths.py (1 hunks)
📝 Walkthrough

Walkthrough

This change refactors the Airbyte CDK CLI and related utilities by consolidating the logic for resolving connector names and directories into a new connector_paths.py module. The CLI commands for connectors, images, and secrets now accept a single positional connector argument, replacing separate --connector-name and --connector-directory options. The Docker image build process is updated to support dynamic retrieval of Dockerfile templates from either the local repo or GitHub. Several legacy utility and template files are deleted, and all affected imports are updated. New unit tests are added for repository root resolution.

Changes

Files / Groups Change Summary
airbyte_cdk/cli/airbyte_cdk/_connector.py, _image.py, _secrets.py Refactored CLI commands to accept a single positional connector argument instead of separate name and directory options. Updated imports to use the new connector_paths.py utilities. Adjusted function signatures and docstrings accordingly. _image.py adds a --dockerfile option.
airbyte_cdk/cli/airbyte_cdk/_util.py, airbyte_cdk/test/standard_tests/test_resources.py, airbyte_cdk/utils/docker_image_templates.py Deleted legacy utility and template files. Removed functions/constants for connector path resolution and Dockerfile templates.
airbyte_cdk/utils/connector_paths.py New module providing robust utilities for resolving connector names, directories, and repo roots. Implements functions for path resolution and connector root finding.
airbyte_cdk/utils/docker.py Refactored Docker image build logic to dynamically retrieve Dockerfile templates locally or from GitHub. Added support for Dockerfile override. Updated build arguments and error handling.
airbyte_cdk/test/standard_tests/connector_base.py, declarative_sources.py, util.py, unit_tests/source_declarative_manifest/test_source_declarative_w_custom_components.py Updated imports to use new connector_paths.py for connector root and manifest constants. No logic changes.
unit_tests/cli/airbyte_cdk/test_util.py New unit tests for resolve_airbyte_repo_root, verifying repo root resolution from various paths.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant CLI_Command
    participant ConnectorPathsUtil
    participant DockerUtil

    User->>CLI_Command: Invoke CLI with [connector] argument
    CLI_Command->>ConnectorPathsUtil: resolve_connector_name_and_directory(connector)
    ConnectorPathsUtil-->>CLI_Command: (connector_name, connector_directory)
    CLI_Command->>DockerUtil: build_connector_image(connector_directory, dockerfile_override)
    DockerUtil->>ConnectorPathsUtil: resolve_airbyte_repo_root(connector_directory)
    DockerUtil->>DockerUtil: get_dockerfile_templates() or download from GitHub
    DockerUtil-->>CLI_Command: Build image
    CLI_Command-->>User: Output result
Loading

Possibly related PRs

  • airbytehq/airbyte-python-cdk#504: Introduced the initial image build CLI command with separate connector arguments; this PR directly refactors that interface and logic.
  • airbytehq/airbyte-python-cdk#493: Added the original CLI and utility functions for connector path resolution, which are now refactored and relocated in this PR.

Suggested reviewers

  • bnchrch
  • maxi297
  • aldogonzalez8

Would you like to add more test cases for the new path resolution logic, or is the current coverage sufficient for now—wdyt?

✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

‼️ IMPORTANT
Auto-reply has been disabled for this repository in the CodeRabbit settings. The CodeRabbit bot will not respond to your replies unless it is explicitly tagged.

  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@aaronsteers aaronsteers requested a review from dbgold17 May 2, 2025 18:55
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (11)
airbyte_cdk/utils/docker.py (2)

267-309: Consider adding timeout and retry logic to GitHub requests, wdyt?

The HTTP requests to GitHub don't include timeout or retry logic, which could potentially make the build process less reliable in case of network issues.

- dockerfile_resp = requests.get(dockerfile_url)
- dockerfile_resp.raise_for_status()
+ # Add timeout and retry for better reliability
+ session = requests.Session()
+ adapter = requests.adapters.HTTPAdapter(max_retries=3)
+ session.mount('https://', adapter)
+ dockerfile_resp = session.get(dockerfile_url, timeout=10)
+ dockerfile_resp.raise_for_status()

Also consider adding a user agent header to follow good API etiquette:

- dockerfile_resp = requests.get(dockerfile_url)
+ headers = {"User-Agent": "Airbyte-CDK/1.0"}
+ dockerfile_resp = requests.get(dockerfile_url, headers=headers)

296-296: Consider parameterizing the GitHub branch reference

The hardcoded reference to the master branch might cause issues if the repository structure changes or if specific versions need to be targeted.

- base_url = f"https://github.com/airbytehq/airbyte/raw/master/docker-images/"
+ # Allow specifying a different branch or tag
+ github_branch = os.environ.get("AIRBYTE_GITHUB_BRANCH", "master")
+ base_url = f"https://github.com/airbytehq/airbyte/raw/{github_branch}/docker-images/"
airbyte_cdk/cli/airbyte_cdk/_connector.py (1)

49-53: Good consolidation of imports.

The import statements have been nicely reorganized to use the new centralized location for connector path utilities. I noticed you commented out pytest_hooks - was that intentional? If it's no longer needed, maybe we should remove it completely? wdyt?

unit_tests/cli/airbyte_cdk/test_util.py (2)

1-3: Unused import in test file.

The check import from tabnanny on line 2 doesn't appear to be used anywhere in this file. Should we remove it?

from pathlib import Path
-from tabnanny import check

import pytest

61-84: Consider expanding exception handling.

The test only specifically handles FileNotFoundError, but resolve_airbyte_repo_root might raise other exceptions. Would it make sense to add a more general exception handler to provide clearer test failures for unexpected errors? Maybe something like:

    try:
        repo_root = resolve_airbyte_repo_root(start_dir_rel)
        # existing validation code
    except FileNotFoundError:
        if expect_success:
            raise AssertionError(f"Airbyte repo root not found from '{start_dir_rel!s}'.")
+    except Exception as e:
+        if expect_success:
+            raise AssertionError(f"Unexpected error resolving repo root from '{start_dir_rel!s}': {e}")
+        # If we expected failure, any exception is fine

What do you think?

airbyte_cdk/cli/airbyte_cdk/_image.py (1)

81-81: Minor redundancy in parameter passing.

There's a small redundancy in the parameter passing:

dockerfile_override=dockerfile or None,

Since dockerfile is already defined as Path | None, the or None is unnecessary. Consider simplifying to:

-            dockerfile_override=dockerfile or None,
+            dockerfile_override=dockerfile,

What do you think?

airbyte_cdk/cli/airbyte_cdk/_secrets.py (4)

76-81: Use a richer Click type for paths?

The positional connector argument is declared as type=str, which works for both a plain connector name and a path expressed as a string.
Would it be useful to accept a real path object as well, by switching to type=click.Path(path_type=Path, exists=False) and then passing it through unchanged?
That would give users tab-completion for directories and automatically normalise Path handling inside resolve_connector_name_and_directory, while still allowing simple names. Wdyt?


115-117: Minor security wording tweak?

The warning now says the option is ignored without CI, but the implementation below prints an explanatory message after the flag is supplied.
Perhaps “will be ignored and a notice printed” would match the runtime behaviour more precisely. Wdyt?


121-123: Surface resolution errors more prominently

resolve_connector_name_and_directory() can raise ValueError / FileNotFoundError.
Would wrapping the call in a tiny try/except that converts these into click.ClickException make the CLI error output friendlier (no stack trace by default)? For example:

-    connector_name, connector_directory = resolve_connector_name_and_directory(connector)
+    try:
+        connector_name, connector_directory = resolve_connector_name_and_directory(connector)
+    except (ValueError, FileNotFoundError) as exc:
+        raise click.ClickException(str(exc)) from exc

This keeps internal exceptions out of user-facing terminals. Wdyt?


287-289: Rename shadow variable to underscore

Using _ = connector_name avoids the “unused” linter complaint, but it still reads as a placeholder.
Would dropping the parameter entirely (or renaming the function to _get_secrets_dir_for_connector() with no connector_name arg) simplify the signature?

airbyte_cdk/utils/connector_paths.py (1)

168-170: Consider accepting “speculative” prefixes

The name check currently hard-fails when the directory name does not start with source- or destination-.
Would accepting custom prefixes (e.g., processor-, future categories) or at least providing a clearer hint (“expected 'source-' or 'destination-' prefix”) improve UX? Wdyt?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2f11bfa and 1695e8b.

📒 Files selected for processing (13)
  • airbyte_cdk/cli/airbyte_cdk/_connector.py (4 hunks)
  • airbyte_cdk/cli/airbyte_cdk/_image.py (3 hunks)
  • airbyte_cdk/cli/airbyte_cdk/_secrets.py (5 hunks)
  • airbyte_cdk/cli/airbyte_cdk/_util.py (0 hunks)
  • airbyte_cdk/test/standard_tests/connector_base.py (1 hunks)
  • airbyte_cdk/test/standard_tests/declarative_sources.py (1 hunks)
  • airbyte_cdk/test/standard_tests/test_resources.py (0 hunks)
  • airbyte_cdk/test/standard_tests/util.py (1 hunks)
  • airbyte_cdk/utils/connector_paths.py (1 hunks)
  • airbyte_cdk/utils/docker.py (5 hunks)
  • airbyte_cdk/utils/docker_image_templates.py (0 hunks)
  • unit_tests/cli/airbyte_cdk/test_util.py (1 hunks)
  • unit_tests/source_declarative_manifest/test_source_declarative_w_custom_components.py (1 hunks)
💤 Files with no reviewable changes (3)
  • airbyte_cdk/cli/airbyte_cdk/_util.py
  • airbyte_cdk/utils/docker_image_templates.py
  • airbyte_cdk/test/standard_tests/test_resources.py
🧰 Additional context used
🧠 Learnings (1)
airbyte_cdk/test/standard_tests/declarative_sources.py (1)
Learnt from: aaronsteers
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/cli/source_declarative_manifest/_run.py:62-65
Timestamp: 2024-11-15T01:04:21.272Z
Learning: The files in `airbyte_cdk/cli/source_declarative_manifest/`, including `_run.py`, are imported from another repository, and changes to these files should be minimized or avoided when possible to maintain consistency.
🧬 Code Graph Analysis (2)
airbyte_cdk/cli/airbyte_cdk/_image.py (2)
airbyte_cdk/utils/connector_paths.py (1)
  • resolve_connector_name_and_directory (81-145)
airbyte_cdk/utils/docker.py (1)
  • verify_docker_installation (394-400)
airbyte_cdk/cli/airbyte_cdk/_secrets.py (1)
airbyte_cdk/utils/connector_paths.py (1)
  • resolve_connector_name_and_directory (81-145)
🪛 GitHub Actions: Linters
airbyte_cdk/utils/connector_paths.py

[error] 142-142: mypy error: Name "connector" is not defined [name-defined]

⏰ Context from checks skipped due to timeout of 90000ms (8)
  • GitHub Check: Check: 'source-pokeapi' (skip=false)
  • GitHub Check: Check: 'source-amplitude' (skip=false)
  • GitHub Check: Check: 'source-shopify' (skip=false)
  • GitHub Check: Check: 'source-hardcoded-records' (skip=false)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: SDM Docker Image Build
🔇 Additional comments (21)
airbyte_cdk/utils/docker.py (5)

15-15: New dependency added

Added requests library for downloading Dockerfile templates from GitHub. This allows fallback template fetching when local templates aren't available.


144-144: Added flexibility with dockerfile_override parameter, nice!

This allows users to specify a custom Dockerfile path, which is a good addition for flexibility. This matches the PR objective of making the CLI more intuitive and versatile.


168-196: Enhanced Dockerfile resolution with multiple fallback strategies

The implementation handles three scenarios gracefully:

  1. Custom Dockerfile specified via the new parameter
  2. Local templates from the Airbyte repository
  3. Downloaded templates from GitHub as a fallback

This robust approach ensures the build process works in various environments, including when the Airbyte repo isn't checked out locally.


213-213: Simplified build args

The code now only uses the kebab-case connector name instead of both formats, simplifying the build arguments.


312-360: Good implementation of template resolution

This function correctly uses the new resolve_airbyte_repo_root function to find Docker templates from the local repository. The error handling is thorough and the validation of file existence is comprehensive.

airbyte_cdk/test/standard_tests/declarative_sources.py (1)

15-15: Updated import to use centralized connector paths module

The import was updated to use the new airbyte_cdk.utils.connector_paths module, which aligns with the broader refactoring in this PR. This is consistent with the objective of centralizing repository path resolution.

airbyte_cdk/test/standard_tests/connector_base.py (1)

27-30: Updated imports to use centralized connector paths module

This change follows the same pattern of centralizing connector path and repository resolution logic into the new airbyte_cdk.utils.connector_paths module. The consistent import changes across multiple files show good attention to detail.

airbyte_cdk/test/standard_tests/util.py (1)

15-18: Updated imports to use centralized connector paths module

The change maintains consistency with other files by importing from the new centralized module. This change contributes to the PR objective of making repository root finding more robust.

unit_tests/source_declarative_manifest/test_source_declarative_w_custom_components.py (1)

35-35: Import path change looks good.

The import of MANIFEST_YAML has been updated to use the new centralized module airbyte_cdk.utils.connector_paths instead of the previous location. This aligns well with the PR objective of consolidating connector path resolution logic.

airbyte_cdk/cli/airbyte_cdk/_connector.py (4)

104-109: CLI interface simplification looks great!

Changing from two separate options to a single positional argument makes the CLI more intuitive and aligns perfectly with the PR objectives.


117-117: Function signature updated correctly.

The parameter now accepts either a string or Path object, matching the CLI interface changes.


124-127: Clear and helpful documentation.

The updated docstring clearly explains the new behavior, making it easy for users to understand how the command works.


137-137: Implementation update matches new approach.

The call to resolve_connector_name_and_directory has been correctly updated to use the simplified parameter.

unit_tests/cli/airbyte_cdk/test_util.py (2)

11-25: Helper function looks good.

The check_local_repo_structure helper function effectively validates the existence of the expected repository layout. This will help ensure the tests only run in the appropriate environment.


27-47: Well-structured test parametrization.

The test cases cover a good variety of scenarios for repository root resolution, including both expected successes and failures. The skip condition is also well implemented to avoid running tests in environments without the required directory structure.

airbyte_cdk/cli/airbyte_cdk/_image.py (5)

14-14: Import path update looks good.

The import has been correctly updated to use the new centralized location for connector path utilities.


31-36: CLI interface simplification is consistent.

Using a single positional argument instead of two separate options is consistent with the changes in _connector.py and makes the CLI more intuitive.


39-43: New dockerfile override option is useful.

The addition of the --dockerfile option provides flexibility for users who need to customize the build process, which is a nice enhancement.


53-54: Clear documentation of the new parameter behavior.

The docstring clearly explains how the CONNECTOR argument works, making it easy for users to understand.


62-62: Implementation updated correctly.

The call to resolve_connector_name_and_directory has been correctly updated to use the simplified parameter.

airbyte_cdk/cli/airbyte_cdk/_secrets.py (1)

105-107: Docstring update keeps CLI behaviour crystal-clear

👍 Thanks for expanding the docstring to reflect the new positional argument – this will save users a lot of confusion.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (2)
airbyte_cdk/utils/connector_paths.py (2)

44-45: Should we use _is_airbyte_repo_root for parent dir validation as well?

Unlike the sibling and child directory checks that use _is_airbyte_repo_root, the parent directory check only verifies the directory name. This could lead to false positives if a directory happens to be named "airbyte" but doesn't have the expected structure.

Wdyt about using the same validation here?

if parent_dir.name == "airbyte" or parent_dir.name == "airbyte-enterprise":
    return parent_dir
if parent_dir.name == "airbyte" or parent_dir.name == "airbyte-enterprise":
+   if _is_airbyte_repo_root(parent_dir):
        return parent_dir

142-142: Undefined variable in error message - should be connector_ref

The variable connector is never defined in this scope. I believe you meant to use connector_ref here.

-    f"Could not infer connector_name or connector_directory from input ref: {connector}",
+    f"Could not infer connector_name or connector_directory from input ref: {connector_ref}",
🧹 Nitpick comments (2)
airbyte_cdk/utils/connector_paths.py (2)

136-140: Potential redundancy in connector directory resolution logic

The code flow has already established connector_directory earlier, but then conditionally sets it again here. We might be able to simplify this logic.

Wdyt about this approach?

-    if connector_directory:
-        connector_directory = connector_directory.resolve().absolute()
-    elif connector_name:
-        connector_directory = find_connector_root_from_name(connector_name)
-    else:
+    if not connector_directory and connector_name:
+        connector_directory = find_connector_root_from_name(connector_name)
+    elif not connector_directory:
         raise ValueError(
             f"Could not infer connector_name or connector_directory from input ref: {connector_ref}",
         )
+    connector_directory = connector_directory.resolve().absolute()

105-106: Path detection could be more robust for edge cases

Detecting paths by the presence of "/" or "\" works for most cases, but might not handle all valid paths correctly. For example, network paths or URIs.

Wdyt about using Path.is_absolute() or checking if the string can be converted to a valid path that exists?

-            if "/" in connector_ref or "\\" in connector_ref:
+            # Try to convert to Path and check if it exists
+            potential_path = Path(connector_ref)
+            if potential_path.exists() or "/" in connector_ref or "\\" in connector_ref:
                 # If the connector name is a path, treat it as a directory
                 connector_directory = Path(connector_ref)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1695e8b and c79c719.

📒 Files selected for processing (1)
  • airbyte_cdk/utils/connector_paths.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (9)
  • GitHub Check: Check: 'source-pokeapi' (skip=false)
  • GitHub Check: Check: 'source-amplitude' (skip=false)
  • GitHub Check: Check: 'source-shopify' (skip=false)
  • GitHub Check: Check: 'source-hardcoded-records' (skip=false)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Analyze (python)
🔇 Additional comments (4)
airbyte_cdk/utils/connector_paths.py (4)

1-10: Good job on setting up the module structure and constants!

The constants defined for marker files provide a clear reference point for identifying connector directories. The imports are minimal and appropriate for the functionality.


80-146: Well-structured resolution logic for connector references

The function provides a flexible interface for resolving connector names and directories from various input types. The design allows users to specify either a name or path, with intelligent fallbacks.


174-193: Well-implemented connector root finder

The function efficiently searches for marker files across multiple paths and their parent directories, with a clean stopping condition when reaching "airbyte_cdk". This will make connector discovery reliable.


195-221: Good fallback logic for finding connectors

I like how this function first tries the direct approach, then falls back to locating via the Airbyte repo root with clear error messages at each step.

@aaronsteers aaronsteers changed the title feat(cli): use central dockerfile definitions, add capability to find airbyte-repo-root feat(cli): use central dockerfile definitions, add capability to find airbyte-repo-root, use multi-purpose CONNECTOR positional arg in CLI commands May 2, 2025
@aaronsteers aaronsteers changed the title feat(cli): use central dockerfile definitions, add capability to find airbyte-repo-root, use multi-purpose CONNECTOR positional arg in CLI commands feat(cli): use central dockerfile definitions, add capability to locate airbyte_repo_root, use multi-purpose CONNECTOR positional arg in CLI commands May 2, 2025
@aaronsteers aaronsteers changed the title feat(cli): use central dockerfile definitions, add capability to locate airbyte_repo_root, use multi-purpose CONNECTOR positional arg in CLI commands feat(cli): use new central dockerfile locations, add capability to locate airbyte_repo_root, use multi-purpose CONNECTOR positional arg in CLI commands May 2, 2025
@aaronsteers aaronsteers changed the title feat(cli): use new central dockerfile locations, add capability to locate airbyte_repo_root, use multi-purpose CONNECTOR positional arg in CLI commands feat(cli): use new central Dockerfile locations, add capability to locate airbyte_repo_root, use multi-purpose CONNECTOR positional arg in CLI commands May 2, 2025
Copy link
Contributor

@dbgold17 dbgold17 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving, with a discussion question that we can always figure out later.

Did you consider having the Dockerfiles live within the CDK and imported into the airbyte repo as needed instead of now having all of this logic to find the airbyte repo locally? The file path traversing and directory structure logic feels a bit magic / fragile, but is is also nice if it just works. Another alternative that I don't love is that we could have the cdk cli by default pull from Github unless the user specifies a path to the airbyte repo locally.

Copy link
Contributor Author

@aaronsteers aaronsteers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dbgold17 - Thanks again for your review here. Responded inline to your questions and we can keep these discussions going after the merge.

Regarding your question about where to house the Dockerfiles, I feel more strongly that putting them in the main repo is at least a good path, and as discussed the past couple days, it allows all teams to have a sense of ownership in those build processes.

I think it's okay for the CDK CLI to get files from that repo - and I don't mind much if we get from local refs vs defaulting to downloading from github. The only reason I didn't default to downloading from github in this iteration is that I wanted our default mode to not need network access. So, preferring local files, then falling back to a remote download seemed like the best first implementation for now, at least.

Thanks again for all your feedback and the very thoughtful comments!

@aaronsteers aaronsteers merged commit d09c2ec into main May 7, 2025
29 of 31 checks passed
@aaronsteers aaronsteers deleted the aj/feat/cli/use-central-dockerfile-images branch May 7, 2025 19:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants