Resolve issue where config from API is cast to incorrect model type (faulty discriminator logic) #114

Open
aaronsteers opened this issue Jul 30, 2024 · 0 comments

aaronsteers commented Jul 30, 2024

With our library of 300+ sources and 60+ destinations, certain API endpoints should return a "configuration" typed to the correct source or destination class, but the responses do not deserialize into the right classes. Instead, they deserialize into the first match alphabetically (e.g. "Airtable" instead of "Snowflake" or "MySQL").
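To make the failure mode concrete, here is a minimal sketch of how a first-match strategy can pick the wrong class. It is an illustration only, not the generated SDK's actual logic: the `Airtable` and `Snowflake` dataclasses, their fields, and the `first_match` helper are simplified stand-ins invented for this example.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


# Hypothetical, heavily simplified stand-ins for generated config models.
@dataclass
class Airtable:
    api_key: Optional[str] = None
    sourceType: Optional[str] = None


@dataclass
class Snowflake:
    host: Optional[str] = None
    sourceType: Optional[str] = None


# Candidates are tried in alphabetical order.
CANDIDATES = [Airtable, Snowflake]


def first_match(raw: dict[str, Any]) -> object:
    """Return the first candidate that can be built from `raw`, dropping unknown keys."""
    for cls in CANDIDATES:
        known = {f.name for f in fields(cls)}
        try:
            return cls(**{k: v for k, v in raw.items() if k in known})
        except TypeError:
            continue
    raise ValueError("no candidate matched")


result = first_match({"host": "example.snowflakecomputing.com", "sourceType": "snowflake"})
print(type(result).__name__)  # prints "Airtable"
```

Because every field is optional and unknown keys are ignored, the first class always "fits", even though the `sourceType` value says otherwise.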

I've worked around this by hacking a bit and getting the original raw dict object, but this has been a stumbling block for specific use cases.

Workaround logic is here:

https://github.com/airbytehq/PyAirbyte/blob/f7b88eba400d7aa768c8c370cfbac6f18dfc61c6/airbyte/_util/api_util.py#L577-L597
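For readers who cannot open the link, the general shape of such a workaround might look like the sketch below. This is not the code at the linked lines; `connector_type_from_raw` is a hypothetical helper, and it assumes the raw response dict carries a `sourceType` or `destinationType` key as described later in this issue.

```python
from typing import Any


def connector_type_from_raw(raw_config: dict[str, Any]) -> str:
    """Read the connector type straight from the raw dict, bypassing the typed models."""
    for key in ("sourceType", "destinationType"):
        value = raw_config.get(key)
        if isinstance(value, str):
            return value
    raise KeyError("raw config has neither 'sourceType' nor 'destinationType'")


print(connector_type_from_raw({"host": "example", "destinationType": "snowflake"}))
```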


Speakeasy has some docs on how to set up discriminator logic here:

Example code from the docs:

components:
  responses:
    OrderResponse:
      oneOf:
        - $ref: "#/components/schemas/DrinkOrder"
        - $ref: "#/components/schemas/IngredientOrder"
      discriminator:
        propertyName: orderType

In our case, the discriminating property does exist in the data as sourceType and destinationType, but it is not declared with the above syntax.

Here is an example declaration showing that sourceType should be ready to use once we reference it in the discriminator declaration:

    source-aha:
      type: "object"
      required:
      - "api_key"
      - "url"
      - "sourceType"
      properties:
        api_key:
          type: "string"
          title: "API Bearer Token"
          airbyte_secret: true
          description: "API Key"
          order: 0
          x-speakeasy-param-sensitive: true
        url:
          type: "string"
          description: "URL"
          title: "Aha Url Instance"
          order: 1
        sourceType:
          title: "aha"
          const: "aha"
          enum:
          - "aha"
          order: 0
          type: "string"
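Since each connector schema pins its type with a `const`, the discriminator values can be read off the spec mechanically. The sketch below assumes the spec has already been parsed into a Python dict; `discriminator_mapping` is a hypothetical helper. One caveat worth checking: in OpenAPI 3.0, a discriminator without an explicit `mapping` matches the property value against schema names, and here the value is "aha" while the schema is named "source-aha", so an explicit `mapping` may be needed. Whether Speakeasy infers the mapping from `const` values instead is not confirmed here.

```python
from typing import Any


def discriminator_mapping(
    schemas: dict[str, Any], property_name: str, prefix: str
) -> dict[str, str]:
    """Map each schema's const discriminator value to its $ref path."""
    mapping: dict[str, str] = {}
    for name, schema in schemas.items():
        if not name.startswith(prefix):
            continue
        prop = schema.get("properties", {}).get(property_name, {})
        const = prop.get("const")
        # Only usable as a discriminator if the property is required and pinned.
        if const is not None and property_name in schema.get("required", []):
            mapping[const] = f"#/components/schemas/{name}"
    return mapping


# Trimmed copy of the source-aha declaration above, as a parsed dict.
schemas = {
    "source-aha": {
        "type": "object",
        "required": ["api_key", "url", "sourceType"],
        "properties": {
            "sourceType": {"const": "aha", "enum": ["aha"], "type": "string"},
        },
    }
}

mapping = discriminator_mapping(schemas, "sourceType", "source-")
print(mapping)  # {'aha': '#/components/schemas/source-aha'}
```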

Currently, it does not appear that we define any discriminator logic for DestinationConfiguration or SourceConfiguration.

Below is the DestinationConfiguration declaration. Note that it has oneOf logic but no discriminator. The same is true of SourceConfiguration, which is not shown because it is much larger.


https://raw.githubusercontent.com/airbytehq/airbyte-platform/refs/heads/main/airbyte-api/server-api/src/main/openapi/api_sdk.yaml

    DestinationConfiguration:
      description: The values required to configure the destination.
      example: { user: "charles" }
      oneOf:
        - title: destination-google-sheets
          $ref: "#/components/schemas/destination-google-sheets"
        - title: destination-astra
          $ref: "#/components/schemas/destination-astra"
        - title: destination-aws-datalake
          $ref: "#/components/schemas/destination-aws-datalake"
        - title: destination-azure-blob-storage
          $ref: "#/components/schemas/destination-azure-blob-storage"
        - title: destination-bigquery
          $ref: "#/components/schemas/destination-bigquery"
        - title: destination-clickhouse
          $ref: "#/components/schemas/destination-clickhouse"
        - title: destination-convex
          $ref: "#/components/schemas/destination-convex"
        - title: destination-databricks
          $ref: "#/components/schemas/destination-databricks"
        - title: destination-dev-null
          $ref: "#/components/schemas/destination-dev-null"
        - title: destination-duckdb
          $ref: "#/components/schemas/destination-duckdb"
        - title: destination-dynamodb
          $ref: "#/components/schemas/destination-dynamodb"
        - title: destination-elasticsearch
          $ref: "#/components/schemas/destination-elasticsearch"
        - title: destination-firebolt
          $ref: "#/components/schemas/destination-firebolt"
        - title: destination-firestore
          $ref: "#/components/schemas/destination-firestore"
        - title: destination-gcs
          $ref: "#/components/schemas/destination-gcs"
        - title: destination-iceberg
          $ref: "#/components/schemas/destination-iceberg"
        - title: destination-milvus
          $ref: "#/components/schemas/destination-milvus"
        - title: destination-mongodb
          $ref: "#/components/schemas/destination-mongodb"
        - title: destination-motherduck
          $ref: "#/components/schemas/destination-motherduck"
        - title: destination-mssql
          $ref: "#/components/schemas/destination-mssql"
        - title: destination-mysql
          $ref: "#/components/schemas/destination-mysql"
        - title: destination-oracle
          $ref: "#/components/schemas/destination-oracle"
        - title: destination-pgvector
          $ref: "#/components/schemas/destination-pgvector"
        - title: destination-pinecone
          $ref: "#/components/schemas/destination-pinecone"
        - title: destination-postgres
          $ref: "#/components/schemas/destination-postgres"
        - title: destination-pubsub
          $ref: "#/components/schemas/destination-pubsub"
        - title: destination-qdrant
          $ref: "#/components/schemas/destination-qdrant"
        - title: destination-redis
          $ref: "#/components/schemas/destination-redis"
        - title: destination-redshift
          $ref: "#/components/schemas/destination-redshift"
        - title: destination-s3
          $ref: "#/components/schemas/destination-s3"
        - title: destination-s3-glue
          $ref: "#/components/schemas/destination-s3-glue"
        - title: destination-sftp-json
          $ref: "#/components/schemas/destination-sftp-json"
        - title: destination-snowflake
          $ref: "#/components/schemas/destination-snowflake"
        - title: destination-snowflake-cortex
          $ref: "#/components/schemas/destination-snowflake-cortex"
        - title: destination-teradata
          $ref: "#/components/schemas/destination-teradata"
        - title: destination-timeplus
          $ref: "#/components/schemas/destination-timeplus"
        - title: destination-typesense
          $ref: "#/components/schemas/destination-typesense"
        - title: destination-vectara
          $ref: "#/components/schemas/destination-vectara"
        - title: destination-weaviate
          $ref: "#/components/schemas/destination-weaviate"
        - title: destination-yellowbrick
          $ref: "#/components/schemas/destination-yellowbrick"

Proposed fix

To resolve this, we should add a discriminator block to the DestinationConfiguration declaration in the OpenAPI spec:

    DestinationConfiguration:
      # ...
      discriminator:
        propertyName: destinationType
      oneOf:
        - title: destination-google-sheets
          $ref: "#/components/schemas/destination-google-sheets"
      # ...

and similarly for sources:

    SourceConfiguration:
      # ...
      discriminator:
        propertyName: sourceType
      oneOf:
        - title: ...
          $ref: ...
      # ...
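Before applying this change across hundreds of connector schemas, it may be worth checking that every oneOf member actually declares the discriminating property as required. The following is a rough sketch under the assumption that the spec is loaded as a Python dict; `add_discriminator` is a hypothetical helper, not part of any existing tooling.

```python
from typing import Any


def add_discriminator(spec: dict[str, Any], union_name: str, property_name: str) -> None:
    """Add a discriminator to a oneOf union after validating its members."""
    schemas = spec["components"]["schemas"]
    union = schemas[union_name]
    missing = []
    for member in union["oneOf"]:
        ref_name = member["$ref"].rsplit("/", 1)[-1]
        target = schemas[ref_name]
        declared = property_name in target.get("properties", {})
        required = property_name in target.get("required", [])
        if not (declared and required):
            missing.append(ref_name)
    if missing:
        raise ValueError(f"{property_name!r} not required in: {', '.join(missing)}")
    union["discriminator"] = {"propertyName": property_name}


# Tiny example spec with a single member.
spec = {
    "components": {
        "schemas": {
            "SourceConfiguration": {
                "oneOf": [{"title": "source-aha", "$ref": "#/components/schemas/source-aha"}]
            },
            "source-aha": {
                "required": ["api_key", "url", "sourceType"],
                "properties": {"sourceType": {"const": "aha", "type": "string"}},
            },
        }
    }
}

add_discriminator(spec, "SourceConfiguration", "sourceType")
print(spec["components"]["schemas"]["SourceConfiguration"]["discriminator"])
```

Failing loudly on a member that lacks the property avoids shipping a discriminator that some payloads cannot satisfy.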
@aaronsteers aaronsteers transferred this issue from airbytehq/PyAirbyte Jan 13, 2025