Description
Problem
When the GA4 source connector (source-google-analytics-data-api) is configured with multiple property IDs, it creates separate streams (and therefore separate destination tables) for each property beyond the first:
- First property → <report_name> (e.g., daily_active_users)
- Additional properties → <report_name>Property<property_id> (e.g., daily_active_usersPropertyXXXXXXXXX)
This is implemented in the manifest at the stream naming logic:
# NOTE: Yes, this is weird, but it exists solely for backward-compatibility.
# When the config contains multiple property IDs, it keeps the
# first stream as <report_name>, and names each additional stream
# <report_name>Property<property_id>.
- type: ComponentMappingDefinition
  field_path: ["**", "name"]
  value: "{{ components_values['name'] + 'Property' + components_values['source_config_1'][1] if components_values['source_config_1'][0] > 0 else components_values['name'] }}"
Core issue
While each record already contains a property_id column and property_id is part of the primary key, the stream-level splitting means data from different properties lands in different destination tables. This introduces a dynamic schema problem:
- Adding or removing property IDs changes the set of streams/tables in the catalog
- Downstream queries, dashboards, and BI tools break because they expect a fixed set of tables
- Users get N× the expected number of tables, all with identical schemas but different names
- The property_id column already exists and could serve to distinguish data within a single table, making the split unnecessary
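
To make the table churn concrete, here is a small Python sketch that mimics the naming rule described above. The helper `stream_names` is hypothetical and is not connector code; it only reproduces the rule that the first property keeps `<report_name>` and every later property gets `<report_name>Property<property_id>`. The property IDs are placeholders.

```python
def stream_names(report_name: str, property_ids: list[str]) -> list[str]:
    """Mimic the current naming rule (illustrative only, not connector code)."""
    names = []
    for index, property_id in enumerate(property_ids):
        if index > 0:
            # Every property after the first gets its own suffixed stream.
            names.append(f"{report_name}Property{property_id}")
        else:
            # The first property keeps the plain report name.
            names.append(report_name)
    return names


before = stream_names("daily_active_users", ["111", "222"])
after = stream_names("daily_active_users", ["111", "222", "333"])

print(before)  # ['daily_active_users', 'daily_active_usersProperty222']
print(after)   # ['daily_active_users', 'daily_active_usersProperty222', 'daily_active_usersProperty333']

# Adding one property ID adds one new stream (and destination table).
print(sorted(set(after) - set(before)))  # ['daily_active_usersProperty333']
```

Under this rule the catalog grows by one stream per report for each extra property ID, which is the dynamic-schema behavior described in the list above.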
Current workarounds
Users must either:
- Create separate Airbyte sources per property ID
- Manually create SQL views or dbt models to UNION ALL the per-property tables back together
- Use only a single property ID per connection
All of these add operational complexity.
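
As an illustration of the UNION ALL workaround, the following Python sketch builds the view definition a user would have to maintain by hand. The function `union_view_sql` and the view name are hypothetical, and the sketch assumes the destination table names match the stream names exactly; some destinations may normalize names (for example by lowercasing), so the generated SQL would need adjusting.

```python
def union_view_sql(view_name: str, report_name: str, property_ids: list[str]) -> str:
    """Build a CREATE VIEW statement that unions the per-property tables.

    Illustrative only: assumes table names equal stream names.
    """
    # First property lands in <report_name>; the rest in <report_name>Property<id>.
    tables = [report_name] + [
        f"{report_name}Property{property_id}" for property_id in property_ids[1:]
    ]
    selects = [f"SELECT * FROM {table}" for table in tables]
    return f"CREATE VIEW {view_name} AS\n" + "\nUNION ALL\n".join(selects)


print(union_view_sql("daily_active_users_all", "daily_active_users", ["111", "222"]))
```

The view has to be regenerated whenever a property ID is added or removed, which is part of the operational complexity noted above.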
Proposed Solution
Add a configuration option (e.g., merge_property_streams: true/false) that, when enabled:
- Emits a single stream per report regardless of how many property IDs are configured (e.g., just daily_active_users, not daily_active_usersPropertyXXX)
- Includes property_id as a column in every record (already the case today)
- Keeps property_id as part of the primary key (already the case today)
The default should preserve the current behavior (false) for backward compatibility. The code change would be localized to the stream naming ComponentMappingDefinition in the manifest.
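
The intended behavior can be sketched in Python as follows. This is not the manifest change itself: `resolve_stream_name` is a hypothetical helper, and `merge_property_streams` is only the option name suggested in this issue. The actual change would live in the Jinja expression of the ComponentMappingDefinition, and how the option would be read there is not shown here.

```python
def resolve_stream_name(
    report_name: str,
    property_index: int,
    property_id: str,
    merge_property_streams: bool = False,
) -> str:
    """Sketch of the proposed naming rule (hypothetical helper)."""
    if merge_property_streams or property_index == 0:
        # Merged mode, or the first property: use the plain report name.
        return report_name
    # Default (backward-compatible) mode: suffix additional properties.
    return f"{report_name}Property{property_id}"


property_ids = ["111", "222", "333"]

# Default: one stream per property, as today.
legacy = {resolve_stream_name("daily_active_users", i, p) for i, p in enumerate(property_ids)}

# Proposed option enabled: a single stream regardless of property count.
merged = {
    resolve_stream_name("daily_active_users", i, p, merge_property_streams=True)
    for i, p in enumerate(property_ids)
}

print(len(legacy))  # 3
print(merged)       # {'daily_active_users'}
```

With the option enabled, the set of stream names no longer depends on the configured property IDs, and records remain distinguishable through the existing property_id column.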
Impact
This would allow users with multiple GA4 properties to get a single, stable table per report in their destination — which is the expected behavior for most analytics workflows.
Internal Tracking: https://github.com/airbytehq/oncall/issues/11369