
[source-google-analytics-data-api] Add option to merge multi-property data into single streams instead of creating separate tables per property #73667

@devin-ai-integration

Problem

When the GA4 source connector (source-google-analytics-data-api) is configured with multiple property IDs, it creates separate streams (and therefore separate destination tables) for each property beyond the first:

  • First property → <report_name> (e.g., daily_active_users)
  • Additional properties → <report_name>Property<property_id> (e.g., daily_active_usersPropertyXXXXXXXXX)

This is implemented in the manifest at the stream naming logic:

# NOTE: Yes, this is weird, but it exists solely for backward-compatibility.
# When the config contains multiple property IDs, it keeps the
# first stream as <report_name>, and names each additional stream
# <report_name>Property<property_id>.
- type: ComponentMappingDefinition
  field_path: ["**", "name"]
  value: "{{ components_values['name'] + 'Property' + components_values['source_config_1'][1] if components_values['source_config_1'][0] > 0 else components_values['name'] }}"

(manifest.yaml, lines 925-931)
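For readers less familiar with the Jinja expression, here is a minimal Python sketch of the naming rule as described above. It is not connector code: the function name `current_stream_names` and the sample property IDs are hypothetical, and it assumes `source_config_1` carries an (index, property_id) pair, as the expression suggests.

```python
from typing import List


def current_stream_names(report_name: str, property_ids: List[str]) -> List[str]:
    """Illustrative mirror of the naming rule; not taken from the connector."""
    names = []
    for index, property_id in enumerate(property_ids):
        if index > 0:
            # Every property after the first gets a suffixed stream name.
            names.append(f"{report_name}Property{property_id}")
        else:
            # The first property keeps the plain report name.
            names.append(report_name)
    return names


if __name__ == "__main__":
    # Hypothetical property IDs, used only for illustration.
    print(current_stream_names("daily_active_users", ["111111111", "222222222"]))
    # -> ['daily_active_users', 'daily_active_usersProperty222222222']
```

Note that the result depends on the order of the configured property IDs: whichever ID comes first gets the unsuffixed name.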

Core issue

While each record already contains a property_id column and property_id is part of the primary key, the stream-level splitting means data from different properties lands in different destination tables. This introduces a dynamic schema problem:

  • Adding or removing property IDs changes the set of streams/tables in the catalog
  • Downstream queries, dashboards, and BI tools break because they expect a fixed set of tables
  • Users get N× the expected number of tables, all with identical schemas but different names
  • The property_id column already exists and could serve to distinguish data within a single table, making the split unnecessary

Current workarounds

Users must either:

  1. Create separate Airbyte sources per property ID
  2. Manually create SQL views or dbt models to UNION ALL the per-property tables back together
  3. Use only a single property ID per connection

All of these add operational complexity.
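To illustrate workaround 2, the sketch below builds a UNION ALL view over per-property tables using an in-memory SQLite database as a stand-in for a real destination. The view name `daily_active_users_all`, the column `active_users`, and the property IDs are assumptions made for the example, not the connector's actual schema.

```python
import sqlite3
from typing import List, Tuple


def per_property_tables(report_name: str, property_ids: List[str]) -> List[str]:
    """Table names produced by the current per-property stream naming."""
    return [report_name] + [f"{report_name}Property{pid}" for pid in property_ids[1:]]


def create_union_view(conn: sqlite3.Connection, view_name: str, tables: List[str]) -> None:
    """Create a view that stitches the per-property tables back together."""
    selects = " UNION ALL ".join(f'SELECT * FROM "{table}"' for table in tables)
    conn.execute(f'CREATE VIEW "{view_name}" AS {selects}')


def demo() -> List[Tuple[str]]:
    conn = sqlite3.connect(":memory:")
    property_ids = ["111111111", "222222222"]  # hypothetical IDs
    tables = per_property_tables("daily_active_users", property_ids)
    for table, pid in zip(tables, property_ids):
        # Identical schema in every table; only the table name differs.
        conn.execute(
            f'CREATE TABLE "{table}" (property_id TEXT, date TEXT, active_users INTEGER)'
        )
        conn.execute(f'INSERT INTO "{table}" VALUES (?, ?, ?)', (pid, "20240101", 10))
    create_union_view(conn, "daily_active_users_all", tables)
    rows = conn.execute(
        'SELECT property_id FROM "daily_active_users_all" ORDER BY property_id'
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(demo())
```

The view definition hard-codes the table list, so it has to be regenerated every time a property ID is added or removed, which is the maintenance burden described above.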

Proposed Solution

Add a configuration option (e.g., merge_property_streams: true/false) that, when enabled:

  1. Emits a single stream per report regardless of how many property IDs are configured (e.g., just daily_active_users, not daily_active_usersPropertyXXX)
  2. Includes property_id as a column in every record (already the case today)
  3. Keeps property_id as part of the primary key (already the case today)

The option should default to false so that existing connections keep the current behavior (backward compatibility). The code change would be localized to the stream naming ComponentMappingDefinition in the manifest.
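A rough Python sketch of the intended naming behavior with the flag follows. The option name `merge_property_streams` comes from the proposal above; the function names and property IDs are hypothetical, and how the connector would actually collapse same-named streams in the manifest is not addressed here.

```python
from typing import List


def stream_name(
    report_name: str,
    index: int,
    property_id: str,
    merge_property_streams: bool = False,
) -> str:
    """Proposed naming rule: the suffix is skipped when merging is enabled."""
    if merge_property_streams or index == 0:
        return report_name
    return f"{report_name}Property{property_id}"


def catalog_stream_names(
    report_name: str,
    property_ids: List[str],
    merge_property_streams: bool = False,
) -> List[str]:
    """Distinct stream names the catalog would expose for one report."""
    names: List[str] = []
    for index, property_id in enumerate(property_ids):
        name = stream_name(report_name, index, property_id, merge_property_streams)
        if name not in names:
            names.append(name)
    return names


if __name__ == "__main__":
    ids = ["111111111", "222222222", "333333333"]  # hypothetical IDs
    print(catalog_stream_names("daily_active_users", ids))
    print(catalog_stream_names("daily_active_users", ids, merge_property_streams=True))
```

With the flag enabled, the catalog holds one stream per report no matter how many property IDs are configured, and records stay distinguishable through the existing property_id column.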

Impact

This would allow users with multiple GA4 properties to get a single, stable table per report in their destination — which is the expected behavior for most analytics workflows.


Internal Tracking: https://github.com/airbytehq/oncall/issues/11369
