BigQuery metadata ingestion failure for ADC with multiple project_ids #12327

Open
@jackson-burke

Describe the bug
I'm unable to ingest metadata from BigQuery using my Application Default Credentials (ADC), possibly because the ADC are associated with multiple BigQuery projects.

With the following recipe:

source:
  type: "bigquery"
  config:
    column_limit: 10000
    extract_column_lineage: true
sink:
  type: "datahub-rest"
  config:
    server: ${DATAHUB_GMS_URL}
    token: ${DATAHUB_GMS_TOKEN}

I receive this error: Failed to configure the source (bigquery): Project was not passed and could not be determined from the environment.
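
The message suggests that no default project could be resolved from the environment when the BigQuery client was created. A minimal diagnostic sketch follows, assuming the default project comes either from the GOOGLE_CLOUD_PROJECT (or legacy GCLOUD_PROJECT) environment variable or from a project_id field in the credentials file. The function name guess_default_project is hypothetical; this only approximates the lookup, is not the client library's code, and skips the gcloud configuration and metadata-server fallbacks.

```python
import json
import os
from typing import Mapping, Optional


def guess_default_project(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return a project ID that is visible without calling any Google API.

    Hypothetical helper for illustration; it approximates, but does not
    reproduce, how the Google client libraries pick a default project.
    """
    # An explicit environment variable is the simplest way to pin a project.
    for var in ("GOOGLE_CLOUD_PROJECT", "GCLOUD_PROJECT"):
        if env.get(var):
            return env[var]

    # Otherwise look inside the credentials file, if one is configured.
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path or not os.path.isfile(path):
        return None
    with open(path, encoding="utf-8") as fh:
        info = json.load(fh)

    # Service account keys carry "project_id"; user credential files
    # usually do not, in which case None is returned.
    return info.get("project_id")


if __name__ == "__main__":
    print(guess_default_project() or "no default project found")
```

If this reports no default project, exporting GOOGLE_CLOUD_PROJECT before running the ingestion may be worth trying; whether that resolves the failure here is untested.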

Previously I could ingest successfully by setting my credentials manually, as shown below, but I would prefer to use the ADC approach.

source:
  type: "bigquery"
  config:
    credential:
      type: "service_account"
      project_id: ${DATAHUB_BIGQUERY_SA_PROJECT_ID}
      private_key_id: ${DATAHUB_BIGQUERY_SA_PRIVATE_KEY_ID}
      private_key: ${DATAHUB_BIGQUERY_SA_PRIVATE_KEY}
      client_email: ${DATAHUB_BIGQUERY_SA_CLIENT_EMAIL}
      client_id: ${DATAHUB_BIGQUERY_SA_CLIENT_ID}
      auth_uri: ${DATAHUB_BIGQUERY_SA_AUTH_URI}
      token_uri: ${DATAHUB_BIGQUERY_SA_TOKEN_URI}
      auth_provider_x509_cert_url: ${DATAHUB_BIGQUERY_SA_AUTH_PROVIDER_X509_CERT_URL}
      client_x509_cert_url: ${DATAHUB_BIGQUERY_SA_CLIENT_X509_CERT_URL}
    column_limit: 10000
    extract_column_lineage: true
sink:
  type: "datahub-rest"
  config:
    server: ${DATAHUB_GMS_URL}
    token: ${DATAHUB_GMS_TOKEN}

To Reproduce
Steps to reproduce the behavior:

  1. Configure a similar recipe for a GCP profile with multiple projects associated with it.
  2. Attempt to ingest metadata from them.

Expected behavior
I would expect the source to ingest metadata from every project the credentials can access, or only from the listed projects when project_ids is set in the config.
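
The expected behavior described above can be sketched as a small selection function. The name select_projects and its arguments are hypothetical, chosen for illustration only; this is not DataHub's implementation.

```python
from typing import List, Optional, Sequence


def select_projects(
    accessible: Sequence[str], project_ids: Optional[Sequence[str]] = None
) -> List[str]:
    """Pick the projects to ingest from (hypothetical helper, not DataHub code)."""
    if not project_ids:
        # No filter configured: use every project the credentials can see.
        return list(accessible)
    wanted = set(project_ids)
    # Filter configured: keep only the requested projects, in discovery order.
    return [project for project in accessible if project in wanted]
```

With no project_ids, every accessible project is returned; with project_ids set, only the matching projects remain.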

Desktop (please complete the following information):

  • OS: iOS
  • Browser: Chrome
  • Version: 14.1

Additional context

  • The ADC path is set correctly (echo $GOOGLE_APPLICATION_CREDENTIALS outputs the correct path), and gcloud projects list shows my relevant projects.
  • I tried adding project_ids = ["target_id"] to the config, but I receive the same error.
