Proposal: Build Mini-CAT in Python CDK #287

Open
3 tasks done
devin-ai-integration bot opened this issue Jan 28, 2025 · 0 comments

devin-ai-integration bot commented Jan 28, 2025

Overview

Following a conversation with @aaronsteers, this issue proposes incrementally migrating the Connector Acceptance Tests (CAT) from the monorepo to the Python CDK. This would let tests run directly against Python classes and functions, without requiring Docker containers, which makes debugging easier. The tests would remain runnable via Docker or CLI for non-Python sources, and would support YAML-based sources, including those with custom components.py files.

Tests should be runnable via CLI and/or Docker (for parity with CAT today), and should also be runnable as a pytest suite, for more control and faster debug iteration loops.

Current Test Categories

Connector Acceptance Test (CAT) Checklist

This file tracks all the tests implemented in the CAT framework.

Test Files to Examine:

  • test_incremental.py
  • test_full_refresh.py
  • test_core.py

Test Categories:

1. Specification Tests (test_core.py TestSpec)

  1. Configuration Schema Validation

    • Validates connector configuration against JSON schema
    • Checks enum usage and uniqueness
    • Validates oneOf usage in specs
    • Tests required vs optional fields
    • Validates property types and formats
    • Checks date patterns and formats
  2. Secret Handling

    • Verifies proper marking of secret fields
    • Ensures secrets never appear in outputs
    • Validates OAuth flow parameters
    • Tests OAuth as default auth method
  3. Schema Validation

    • Checks property types (no arrays at root)
    • Validates object structures
    • Ensures backward compatibility
    • Verifies additional properties handling
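
A few of the specification checks above could be sketched as plain functions over a spec dictionary. This is a minimal illustration, not CAT or CDK code: the function names, `SAMPLE_SPEC`, and `SAMPLE_CONFIG` are hypothetical, and the only convention assumed from Airbyte specs is the `airbyte_secret` flag on secret properties.

```python
from typing import Any, Iterator

Schema = dict[str, Any]


def iter_properties(schema: Schema, path: str = "") -> Iterator[tuple[str, Schema]]:
    """Yield (dotted_path, property_schema), descending into nested objects and oneOf options."""
    for name, prop in schema.get("properties", {}).items():
        prop_path = f"{path}.{name}" if path else name
        yield prop_path, prop
        yield from iter_properties(prop, prop_path)
        for option in prop.get("oneOf", []):
            yield from iter_properties(option, prop_path)


def secret_paths(schema: Schema) -> set[str]:
    """Paths of all properties marked as secrets."""
    return {p for p, prop in iter_properties(schema) if prop.get("airbyte_secret") is True}


def duplicate_enum_paths(schema: Schema) -> list[str]:
    """Paths of properties whose enum lists repeat a value."""
    return [
        p
        for p, prop in iter_properties(schema)
        if len(prop.get("enum", [])) != len(set(prop.get("enum", [])))
    ]


def leaked_secrets(config: Schema, secrets: set[str], output: str) -> list[str]:
    """Paths of secret config values that appear verbatim in connector output."""
    leaked = []
    for path in sorted(secrets):
        value: Any = config
        for key in path.split("."):
            value = value.get(key) if isinstance(value, dict) else None
        if isinstance(value, str) and value and value in output:
            leaked.append(path)
    return leaked


# Hypothetical spec and config, for illustration only.
SAMPLE_SPEC: Schema = {
    "type": "object",
    "required": ["api_key"],
    "properties": {
        "api_key": {"type": "string", "airbyte_secret": True},
        "region": {"type": "string", "enum": ["us", "eu", "us"]},
        "credentials": {
            "type": "object",
            "oneOf": [
                {"properties": {"token": {"type": "string", "airbyte_secret": True}}},
            ],
        },
    },
}
SAMPLE_CONFIG: Schema = {
    "api_key": "s3cr3t",
    "region": "us",
    "credentials": {"token": "tok-123"},
}
```

Keeping each check as a function that returns the offending paths makes it easy to wrap them in pytest assertions later and to report every violation at once.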

2. Connection Tests (test_core.py TestConnection)

  1. Basic Connection Check
    • Tests successful connection scenarios
    • Validates error handling
    • Verifies connection status messages
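
As a rough sketch, the connection check could reduce to finding exactly one `CONNECTION_STATUS` message in the connector output. The helper name is hypothetical; the message shape (`type`, `connectionStatus`, `status`) follows the Airbyte protocol's JSON form as I understand it, and should be verified against the protocol models used in the CDK.

```python
from typing import Any

Message = dict[str, Any]


def connection_status(messages: list[Message]) -> Message:
    """Return the single connectionStatus payload, or raise if there is not exactly one."""
    statuses = [
        m["connectionStatus"] for m in messages if m.get("type") == "CONNECTION_STATUS"
    ]
    if len(statuses) != 1:
        raise ValueError(
            f"expected exactly one CONNECTION_STATUS message, got {len(statuses)}"
        )
    return statuses[0]


# Hypothetical output from a `check` run, already parsed from JSON lines.
SAMPLE_CHECK_OUTPUT: list[Message] = [
    {"type": "LOG", "log": {"level": "INFO", "message": "checking connection"}},
    {"type": "CONNECTION_STATUS", "connectionStatus": {"status": "SUCCEEDED"}},
]
```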

3. Discovery Tests (test_core.py TestDiscovery)

  1. Catalog Structure

    • Verifies stream discovery
    • Validates JSON schemas
    • Ensures unique stream names
    • Checks cursor field definitions
    • Validates primary key existence
  2. Schema Compatibility

    • Tests backward compatibility
    • Validates supported data types
    • Checks sync mode support
    • Verifies primary key data types
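
Some of the discovery checks above can be expressed over a catalog represented as plain dictionaries. The sketch below is an assumption-laden illustration: the helper names and `SAMPLE_CATALOG` are hypothetical, and the stream fields used (`name`, `json_schema`, `default_cursor_field`, `source_defined_primary_key`) are taken from the Airbyte protocol's stream definition.

```python
from collections import Counter
from typing import Any

Stream = dict[str, Any]


def duplicate_stream_names(catalog: dict[str, Any]) -> list[str]:
    """Stream names that occur more than once in the catalog."""
    counts = Counter(stream["name"] for stream in catalog["streams"])
    return sorted(name for name, count in counts.items() if count > 1)


def path_exists(schema: dict[str, Any], path: list[str]) -> bool:
    """True if a field path resolves through nested `properties`."""
    node = schema
    for key in path:
        node = node.get("properties", {}).get(key)
        if node is None:
            return False
    return True


def undefined_primary_keys(stream: Stream) -> list[list[str]]:
    """Primary key paths that do not exist in the stream's JSON schema."""
    keys = stream.get("source_defined_primary_key") or []
    return [path for path in keys if not path_exists(stream["json_schema"], path)]


def cursor_is_defined(stream: Stream) -> bool:
    """True if the stream has no default cursor, or the cursor path exists in the schema."""
    cursor = stream.get("default_cursor_field") or []
    return not cursor or path_exists(stream["json_schema"], cursor)


# Hypothetical catalog with a duplicated stream name and a dangling primary key.
SAMPLE_CATALOG: dict[str, Any] = {
    "streams": [
        {
            "name": "users",
            "json_schema": {
                "properties": {"id": {"type": "integer"}, "updated_at": {"type": "string"}}
            },
            "default_cursor_field": ["updated_at"],
            "source_defined_primary_key": [["id"]],
        },
        {
            "name": "users",
            "json_schema": {"properties": {"uuid": {"type": "string"}}},
            "source_defined_primary_key": [["id"]],
        },
    ]
}
```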

4. Basic Read Tests (test_core.py TestBasicRead)

  1. Record Validation

    • Checks record structure against schema
    • Validates data types and formats
    • Verifies required fields presence
    • Tests empty streams handling
  2. Stream Status

    • Validates stream status messages
    • Checks status progression (STARTED → RUNNING → COMPLETE)
    • Verifies state message format
  3. Error Handling

    • Tests failure scenarios
    • Validates error trace messages
    • Checks connector behavior with invalid configs
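
The status progression check (STARTED → RUNNING → COMPLETE) could look roughly like the following. To avoid guessing at message internals, the sketch works on simplified `(stream_name, status)` pairs rather than real trace messages. It also assumes that a stream may skip RUNNING (for example, an empty stream) and may repeat a status, which may not match CAT's exact rules.

```python
from collections import defaultdict

STATUS_ORDER = {"STARTED": 0, "RUNNING": 1, "COMPLETE": 2}


def statuses_by_stream(events: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group status events by stream name, preserving emission order."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for stream, status in events:
        grouped[stream].append(status)
    return dict(grouped)


def is_valid_progression(statuses: list[str]) -> bool:
    """True if statuses begin with STARTED, end with COMPLETE, and never move backwards."""
    if not statuses or statuses[0] != "STARTED" or statuses[-1] != "COMPLETE":
        return False
    if any(status not in STATUS_ORDER for status in statuses):
        return False
    ranks = [STATUS_ORDER[status] for status in statuses]
    return all(a <= b for a, b in zip(ranks, ranks[1:]))


# Hypothetical interleaved status events from a two-stream read.
SAMPLE_EVENTS = [
    ("users", "STARTED"),
    ("orders", "STARTED"),
    ("users", "RUNNING"),
    ("orders", "COMPLETE"),
    ("users", "COMPLETE"),
]
```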

5. Full Refresh Tests (test_full_refresh.py)

  1. Sequential Read Validation
    • Verifies identical data between syncs
    • Validates record order consistency
    • Checks emitted_at timestamp progression
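
A sequential-read comparison along these lines might be sketched as below. The record shape (`data` plus an integer `emitted_at`) mirrors Airbyte record messages, while the function names and the exact rules (for example, requiring every second-read timestamp to be at or after the latest first-read timestamp) are assumptions made for illustration.

```python
import json
from typing import Any

Record = dict[str, Any]


def canonical(data: dict[str, Any]) -> str:
    """Serialize record data deterministically so records can be compared and sorted."""
    return json.dumps(data, sort_keys=True)


def compare_full_refresh(first: list[Record], second: list[Record]) -> list[str]:
    """Return a list of human-readable problems found between two full refresh reads."""
    first_data = [canonical(r["data"]) for r in first]
    second_data = [canonical(r["data"]) for r in second]
    problems = []
    if sorted(first_data) != sorted(second_data):
        problems.append("record sets differ")
    elif first_data != second_data:
        problems.append("record order differs")
    if first and second:
        latest_first = max(r["emitted_at"] for r in first)
        earliest_second = min(r["emitted_at"] for r in second)
        if earliest_second < latest_first:
            problems.append("emitted_at did not advance")
    return problems


# Hypothetical output of a first full refresh read.
FIRST_READ: list[Record] = [
    {"data": {"id": 1}, "emitted_at": 100},
    {"data": {"id": 2}, "emitted_at": 101},
]
```

Returning a list of problems rather than raising on the first one lets a test report order and timestamp issues together.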

6. Incremental Sync Tests (test_incremental.py)

  1. State Management

    • Tests state message emission
    • Validates cursor field handling
    • Verifies state checkpoints
    • Tests abnormal state values
  2. Record Processing

    • Checks record progression
    • Validates incremental filtering
    • Tests slice management
    • Verifies data consistency
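
The core incremental check, that a second read seeded with the first read's state emits no records older than that state, can be illustrated with a stand-in reader. Everything here is hypothetical: `fake_incremental_read` simulates a connector, and state is reduced to a single `{cursor_field: value}` dictionary, which is far simpler than real per-stream state messages.

```python
from typing import Any, Optional

Record = dict[str, Any]
State = dict[str, Any]


def fake_incremental_read(
    rows: list[Record], cursor_field: str, state: Optional[State]
) -> tuple[list[Record], State]:
    """Hypothetical connector read: emit rows at or after the cursor, then a new state."""
    start = (state or {}).get(cursor_field)
    emitted = [row for row in rows if start is None or row[cursor_field] >= start]
    if emitted:
        new_state = {cursor_field: max(row[cursor_field] for row in emitted)}
    else:
        new_state = dict(state or {})
    return emitted, new_state


def records_older_than_state(
    records: list[Record], cursor_field: str, state: State
) -> list[Record]:
    """Records whose cursor value precedes the state value (should be empty after filtering)."""
    return [r for r in records if r[cursor_field] < state[cursor_field]]


# Hypothetical source rows; ISO dates in one format compare correctly as strings.
SAMPLE_ROWS: list[Record] = [
    {"id": 1, "updated_at": "2025-01-01"},
    {"id": 2, "updated_at": "2025-01-02"},
    {"id": 3, "updated_at": "2025-01-03"},
]
```

A two-step usage would call `fake_incremental_read(SAMPLE_ROWS, "updated_at", None)`, pass the returned state into a second call, and assert that `records_older_than_state` is empty for the second batch.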

7. Connector Attributes Tests (test_core.py TestConnectorAttributes)

  1. Metadata Validation
    • Checks primary key definitions
    • Validates allowed hosts configuration
    • Verifies suggested streams setup

8. Documentation Tests (test_core.py TestConnectorDocumentation)

  1. Structure Validation
    • Checks required sections presence
    • Validates documentation format
    • Verifies content templates
    • Tests link validity
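
The required-sections check could be approximated by scanning Markdown headings, as in the sketch below. The function name and the example section titles are hypothetical; the actual list of required sections would come from CAT's documentation templates.

```python
import re

# Matches ATX-style Markdown headings such as "## Prerequisites".
HEADING_PATTERN = re.compile(r"^#{1,6}[ \t]+(.+?)[ \t]*$", flags=re.MULTILINE)


def missing_sections(markdown: str, required: list[str]) -> list[str]:
    """Return required section titles that have no matching heading in the document."""
    headings = {match.group(1) for match in HEADING_PATTERN.finditer(markdown)}
    return [title for title in required if title not in headings]


# Hypothetical connector documentation page.
SAMPLE_DOC = """# Source Example

## Prerequisites

Some text.

## Changelog
"""
```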

Key Implementation Considerations for CDK:

  1. Modular Test Framework

    • Each test category should be independent
    • Support for selective test execution
    • Configurable test parameters
  2. Environment Management

    • Abstract container dependencies
    • Support for local and containerized testing
    • Flexible resource cleanup
  3. State Handling

    • Generic state management interface
    • Support for different state formats
    • Robust checkpoint management
  4. Schema Validation

    • Reusable schema validators
    • Type checking utilities
    • Format validation helpers
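
One way to picture the environment abstraction described above is a small runner interface with an in-process implementation and a subprocess implementation. This is only a sketch under stated assumptions: `ConnectorRunner`, `InProcessRunner`, `SubprocessRunner`, and `parse_messages` are hypothetical names, not existing CDK classes, and connector output is assumed to be JSON lines on stdout mixed with occasional non-JSON log lines.

```python
import json
import subprocess
import sys
from typing import Any, Callable, Iterable, Protocol

Message = dict[str, Any]


def parse_messages(lines: Iterable[str]) -> list[Message]:
    """Parse JSON-line output, skipping blank lines and anything that is not a JSON object."""
    messages = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            parsed = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(parsed, dict):
            messages.append(parsed)
    return messages


class ConnectorRunner(Protocol):
    """Anything that can run a connector command and return parsed messages."""

    def run(self, args: list[str]) -> list[Message]: ...


class InProcessRunner:
    """Calls a Python entrypoint directly, so tests can be debugged without containers."""

    def __init__(self, entrypoint: Callable[[list[str]], Iterable[str]]) -> None:
        self._entrypoint = entrypoint

    def run(self, args: list[str]) -> list[Message]:
        return parse_messages(self._entrypoint(args))


class SubprocessRunner:
    """Runs a CLI or container command; e.g. a base command starting with `docker run`."""

    def __init__(self, base_command: list[str]) -> None:
        self._base_command = base_command

    def run(self, args: list[str]) -> list[Message]:
        result = subprocess.run(
            [*self._base_command, *args], capture_output=True, text=True, check=True
        )
        return parse_messages(result.stdout.splitlines())
```

Because both runners satisfy the same protocol, a test written against `ConnectorRunner` would not need to know whether the connector is a Python class, a CLI, or a Docker image.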

Migration Strategy

Phase 1: Core Validation Layer

  • Implement schema validators
  • Add record structure validation
  • Create type checking utilities
  • Port documentation tests

Phase 2: Test Infrastructure

  • Create modular test runners
  • Add configurable test parameters
  • Implement environment abstraction layer
  • Port specification tests
  • Port basic read tests

Phase 3: State Management

  • Design generic state interfaces
  • Implement checkpoint handling
  • Add state format validation
  • Port incremental sync tests
  • Port full refresh tests

Phase 4: Container Abstraction

  • Abstract Docker dependencies
  • Create flexible test runners
  • Support both local and containerized testing
  • Port connection tests

Benefits

  1. Faster test cycles for Python connectors
  2. Simplified local development
  3. Better integration with IDE tooling
  4. Reduced infrastructure requirements

Implementation Notes

  • No need for backward compatibility
  • Migrated tests should work with:
    • Declarative YAML Sources (with and without custom Python components)
    • Python-Based Sources and Destinations
    • Docker-Based Sources and Destinations (Fallback for everything: Java/Kotlin/Python/etc.)
  • Support customization via YAML or Python (via existing manifest or new test manifest)
  • Support incremental expansion of test coverage, until eventually CAT can be fully deprecated.
  • Provide clear inline docs (esp. class, file, and method-level docstrings)

/cc @aaronsteers

Related spike from a couple quarters ago:

@devin-ai-integration devin-ai-integration bot added the enhancement New feature or request label Jan 28, 2025
@aaronsteers aaronsteers changed the title feat: Migrate Connector Acceptance Tests to Python CDK Proposal: Build Mini-CAT in Python CDK Jan 28, 2025
@aaronsteers aaronsteers self-assigned this Feb 1, 2025