LiamMcFall
Contributor

Description

This PR creates an aggregate table for the newtab-content ping and combines that ping's data with data from the existing newtab ping.
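
As a rough illustration of the combination described above, here is a minimal Python sketch of the filtering and counting, not the actual BigQuery implementation. It assumes events are plain dicts; the function name `combine_daily_counts` is hypothetical, and only `corpus_item_id` is used as the grouping key, whereas the real query groups by many more dimensions.

```python
from collections import Counter

TRACKED_EVENTS = ("impression", "click", "dismiss")


def combine_daily_counts(newtab_events, newtab_content_events):
    """Count tracked events per corpus_item_id across both pings.

    Hypothetical helper: events are dicts with an 'event_name' and a
    'corpus_item_id', plus optional ping-specific fields.
    """
    kept = []
    for event in newtab_events:
        # A missing content_redacted value is treated as 'false'; only
        # non-redacted events are kept from the newtab ping.
        redacted = event.get("content_redacted")
        if redacted is None or redacted == "false":
            kept.append(event)
    for event in newtab_content_events:
        # Only events from pings that carry a version are kept.
        if event.get("newtab_content_ping_version") is not None:
            kept.append(event)

    counts = {}
    for event in kept:
        if event["event_name"] not in TRACKED_EVENTS:
            continue
        per_item = counts.setdefault(event["corpus_item_id"], Counter())
        per_item[event["event_name"]] += 1
    return counts
```

Redacted newtab events are dropped here because their content details arrive through the newtab-content ping instead, so counting both would double count them.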

@LiamMcFall LiamMcFall marked this pull request as draft July 23, 2025 23:09
@LiamMcFall LiamMcFall self-assigned this Jul 23, 2025
@LiamMcFall LiamMcFall requested a review from gkatre September 3, 2025 19:36

@LiamMcFall
Contributor Author

LiamMcFall commented Sep 5, 2025

Here is the Looker PR, ready to go once we deploy this: PR

The only schema changes are the removal of the Pocket Save metrics and the renaming of scheduled_surface_id to newtab_content_surface_id. The remaining fields stay the same, and some new ones are added.

I can make the minor updates needed to the Firefox New Tab: Feed Engagement report to account for these changes and will communicate them in the #hnt-data Slack channel.

@dataops-ci-bot

Integration report for "Merge branch 'main' into newtab-content_dev"

sql.diff

diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/dags/bqetl_newtab.py /tmp/workspace/generated-sql/dags/bqetl_newtab.py
--- /tmp/workspace/main-generated-sql/dags/bqetl_newtab.py	2025-09-09 01:31:17.000000000 +0000
+++ /tmp/workspace/generated-sql/dags/bqetl_newtab.py	2025-09-09 01:33:00.000000000 +0000
@@ -183,6 +183,21 @@
             firefox_desktop_derived__newtab_component_content__v1
         )
 
+    firefox_desktop_derived__newtab_content_items_daily__v1 = bigquery_etl_query(
+        task_id="firefox_desktop_derived__newtab_content_items_daily__v1",
+        destination_table="newtab_content_items_daily_v1",
+        dataset_id="firefox_desktop_derived",
+        project_id="moz-fx-data-shared-prod",
+        owner="[email protected]",
+        email=[
+            "[email protected]",
+            "[email protected]",
+            "[email protected]",
+        ],
+        date_partition_parameter="submission_date",
+        depends_on_past=False,
+    )
+
     firefox_desktop_derived__newtab_items_daily__v1 = bigquery_etl_query(
         task_id="firefox_desktop_derived__newtab_items_daily__v1",
         destination_table="newtab_items_daily_v1",
@@ -350,6 +365,10 @@
         wait_for_copy_deduplicate_all
     )
 
+    firefox_desktop_derived__newtab_content_items_daily__v1.set_upstream(
+        wait_for_copy_deduplicate_all
+    )
+
     firefox_desktop_derived__newtab_items_daily__v1.set_upstream(
         wait_for_copy_deduplicate_all
     )
Only in /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop_derived: newtab_content_items_daily_combined_v1
Only in /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop_derived: newtab_content_items_daily_v1
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop_derived/newtab_content_items_daily_combined_v1/metadata.yaml /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop_derived/newtab_content_items_daily_combined_v1/metadata.yaml
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop_derived/newtab_content_items_daily_combined_v1/metadata.yaml	1970-01-01 00:00:00.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop_derived/newtab_content_items_daily_combined_v1/metadata.yaml	2025-09-09 01:26:49.000000000 +0000
@@ -0,0 +1,19 @@
+friendly_name: Newtab Content Items Daily Combined
+description: |-
+  A view of the combined (Newtab + Newtab-Content) daily aggregation of newtab content actions on content/items,
+  joined with the latest corpus item details from the corpus_items_current table so that the most current values for
+  the corpus item are available.
+owners:
+- [email protected]
+labels:
+  owner1: lmcfall
+bigquery: null
+workgroup_access:
+- role: roles/bigquery.dataViewer
+  members:
+  - workgroup:mozilla-confidential
+references:
+  view.sql:
+  - moz-fx-data-shared-prod.firefox_desktop_derived.newtab_content_items_daily_v1
+  - moz-fx-data-shared-prod.snowflake_migration_derived.corpus_items_current_v1
+require_column_descriptions: false
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop_derived/newtab_content_items_daily_combined_v1/view.sql /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop_derived/newtab_content_items_daily_combined_v1/view.sql
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop_derived/newtab_content_items_daily_combined_v1/view.sql	1970-01-01 00:00:00.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop_derived/newtab_content_items_daily_combined_v1/view.sql	2025-09-09 01:21:43.000000000 +0000
@@ -0,0 +1,11 @@
+CREATE OR REPLACE VIEW
+  `moz-fx-data-shared-prod.firefox_desktop_derived.newtab_content_items_daily_combined_v1`
+AS
+SELECT
+  content.*,
+  corpus_items.* EXCEPT (corpus_item_id, row_num)
+FROM
+  `moz-fx-data-shared-prod.firefox_desktop_derived.newtab_content_items_daily_v1` AS content
+LEFT OUTER JOIN
+  `moz-fx-data-shared-prod.snowflake_migration_derived.corpus_items_current_v1` AS corpus_items
+  ON content.corpus_item_id = corpus_items.corpus_item_id
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop_derived/newtab_content_items_daily_v1/metadata.yaml /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop_derived/newtab_content_items_daily_v1/metadata.yaml
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop_derived/newtab_content_items_daily_v1/metadata.yaml	1970-01-01 00:00:00.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop_derived/newtab_content_items_daily_v1/metadata.yaml	2025-09-09 01:26:49.000000000 +0000
@@ -0,0 +1,34 @@
+friendly_name: Newtab Content Items Daily
+description: |-
+  A daily aggregation of newtab content actions on content/items (example: impressions, clicks, dismissals)
+  for Firefox desktop, partitioned by day.
+owners:
+- [email protected]
+labels:
+  application: newtab
+  incremental: true
+  schedule: daily
+  dag: bqetl_newtab
+  owner1: lmcfall
+  table_type: aggregate
+scheduling:
+  dag_name: bqetl_newtab
+bigquery:
+  time_partitioning:
+    type: day
+    field: submission_date
+    require_partition_filter: true
+    expiration_days: null
+  range_partitioning: null
+  clustering:
+    fields:
+    - channel
+    - country
+workgroup_access:
+- role: roles/bigquery.dataViewer
+  members:
+  - workgroup:mozilla-confidential
+  - workgroup:mozsoc-ml/developers
+  - workgroup:mozsoc-ml/service
+references: {}
+require_column_descriptions: false
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop_derived/newtab_content_items_daily_v1/query.sql /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop_derived/newtab_content_items_daily_v1/query.sql
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop_derived/newtab_content_items_daily_v1/query.sql	1970-01-01 00:00:00.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop_derived/newtab_content_items_daily_v1/query.sql	2025-09-09 01:21:43.000000000 +0000
@@ -0,0 +1,194 @@
+WITH newtab_events_unnested AS (
+  SELECT
+    DATE(submission_timestamp) AS submission_date,
+    mozfun.norm.browser_version_info(client_info.app_display_version).major_version AS app_version,
+    normalized_channel AS channel,
+    metrics.string.newtab_locale AS locale,
+    normalized_country_code AS country,
+    metrics.string.newtab_content_surface_id AS newtab_content_surface_id,
+    timestamp AS event_timestamp,
+    category AS event_category,
+    name AS event_name,
+    extra AS event_details,
+  FROM
+    `moz-fx-data-shared-prod.firefox_desktop_stable.newtab_v1`,
+    UNNEST(events)
+  WHERE
+    DATE(submission_timestamp) = @submission_date
+    AND category IN ('pocket')
+    AND name IN ('impression', 'click', 'dismiss')
+    AND mozfun.norm.browser_version_info(
+      client_info.app_display_version
+    ).major_version >= 121 -- the [Pocket team started using Glean](https://github.com/Pocket/dbt-snowflake/pull/459) from this version on. This prevents duplicates for previous releases.
+),
+newtab_flattened_events AS (
+  SELECT
+    submission_date,
+    SAFE_CAST(app_version AS INT64) AS app_version,
+    channel,
+    locale,
+    country,
+    newtab_content_surface_id,
+    event_category,
+    event_name,
+    mozfun.map.get_key(event_details, 'corpus_item_id') AS corpus_item_id,
+    SAFE_CAST(mozfun.map.get_key(event_details, 'position') AS INT64) AS position,
+    SAFE_CAST(mozfun.map.get_key(event_details, 'is_sponsored') AS BOOLEAN) AS is_sponsored,
+    SAFE_CAST(
+      mozfun.map.get_key(event_details, 'is_section_followed') AS BOOLEAN
+    ) AS is_section_followed,
+    mozfun.map.get_key(event_details, 'matches_selected_topic') AS matches_selected_topic,
+    SAFE_CAST(mozfun.map.get_key(event_details, 'received_rank') AS INT64) AS received_rank,
+    mozfun.map.get_key(event_details, 'section') AS section,
+    SAFE_CAST(mozfun.map.get_key(event_details, 'section_position') AS INT64) AS section_position,
+    mozfun.map.get_key(event_details, 'topic') AS topic,
+    IFNULL(mozfun.map.get_key(event_details, 'content_redacted'), 'false') AS content_redacted,
+    NULL AS newtab_content_ping_version
+  FROM
+    newtab_events_unnested
+),
+newtab_daily_agg AS (
+  SELECT
+    submission_date,
+    app_version,
+    channel,
+    country,
+    IFNULL(
+      newtab_content_surface_id,
+      mozfun.newtab.scheduled_surface_id_v1(country, locale)
+    ) AS newtab_content_surface_id,
+    corpus_item_id,
+    position,
+    is_sponsored,
+    is_section_followed,
+    matches_selected_topic,
+    received_rank,
+    section,
+    section_position,
+    topic,
+    content_redacted,
+    newtab_content_ping_version,
+    COUNTIF(event_name = 'impression') AS impression_count,
+    COUNTIF(event_name = 'click') AS click_count,
+    COUNTIF(event_name = 'dismiss') AS dismiss_count
+  FROM
+    newtab_flattened_events
+  WHERE
+    -- Keeps only non-redacted events. Redacted events are counted in the newtab_content ping data.
+    content_redacted = 'false'
+  GROUP BY
+    submission_date,
+    app_version,
+    channel,
+    country,
+    newtab_content_surface_id,
+    corpus_item_id,
+    position,
+    is_sponsored,
+    is_section_followed,
+    matches_selected_topic,
+    received_rank,
+    section,
+    section_position,
+    topic,
+    content_redacted,
+    newtab_content_ping_version
+),
+newtab_content_events_unnested AS (
+  SELECT
+    DATE(submission_timestamp) AS submission_date,
+    normalized_channel AS channel,
+    IFNULL(metrics.string.newtab_content_country, normalized_country_code) AS country,
+    metrics.string.newtab_content_surface_id AS newtab_content_surface_id,
+    timestamp AS event_timestamp,
+    category AS event_category,
+    name AS event_name,
+    extra AS event_details,
+    metrics.quantity.newtab_content_ping_version AS newtab_content_ping_version
+  FROM
+    `moz-fx-data-shared-prod.firefox_desktop.newtab_content`,
+    UNNEST(events)
+  WHERE
+    DATE(submission_timestamp) = @submission_date
+    AND category IN ('newtab_content')
+    AND name IN ('impression', 'click', 'dismiss')
+),
+newtab_content_flattened_events AS (
+  SELECT
+    submission_date,
+    NULL AS app_version,
+    channel,
+    country,
+    newtab_content_surface_id,
+    event_category,
+    event_name,
+    mozfun.map.get_key(event_details, 'corpus_item_id') AS corpus_item_id,
+    SAFE_CAST(mozfun.map.get_key(event_details, 'position') AS INT64) AS position,
+    SAFE_CAST(mozfun.map.get_key(event_details, 'is_sponsored') AS BOOLEAN) AS is_sponsored,
+    SAFE_CAST(
+      mozfun.map.get_key(event_details, 'is_section_followed') AS BOOLEAN
+    ) AS is_section_followed,
+    mozfun.map.get_key(event_details, 'matches_selected_topic') AS matches_selected_topic,
+    SAFE_CAST(mozfun.map.get_key(event_details, 'received_rank') AS INT64) AS received_rank,
+    mozfun.map.get_key(event_details, 'section') AS section,
+    SAFE_CAST(mozfun.map.get_key(event_details, 'section_position') AS INT64) AS section_position,
+    mozfun.map.get_key(event_details, 'topic') AS topic,
+    CAST(NULL AS STRING) AS content_redacted,
+    newtab_content_ping_version
+  FROM
+    newtab_content_events_unnested
+),
+newtab_content_daily_agg AS (
+  SELECT
+    submission_date,
+    app_version,
+    channel,
+    country,
+    newtab_content_surface_id,
+    corpus_item_id,
+    position,
+    is_sponsored,
+    is_section_followed,
+    matches_selected_topic,
+    received_rank,
+    section,
+    section_position,
+    topic,
+    content_redacted,
+    newtab_content_ping_version,
+    COUNTIF(event_name = 'impression') AS impression_count,
+    COUNTIF(event_name = 'click') AS click_count,
+    COUNTIF(event_name = 'dismiss') AS dismiss_count
+  FROM
+    newtab_content_flattened_events
+  WHERE
+    -- Only including events from pings with a version
+    -- to ensure all events coming from this CTE are from the Newtab-Content ping
+    newtab_content_ping_version IS NOT NULL
+  GROUP BY
+    submission_date,
+    app_version,
+    channel,
+    country,
+    newtab_content_surface_id,
+    corpus_item_id,
+    position,
+    is_sponsored,
+    is_section_followed,
+    matches_selected_topic,
+    received_rank,
+    section,
+    section_position,
+    topic,
+    content_redacted,
+    newtab_content_ping_version
+)
+SELECT
+  *
+FROM
+  newtab_daily_agg
+UNION ALL
+SELECT
+  *
+FROM
+  newtab_content_daily_agg
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop_derived/newtab_content_items_daily_v1/schema.yaml /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop_derived/newtab_content_items_daily_v1/schema.yaml
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop_derived/newtab_content_items_daily_v1/schema.yaml	1970-01-01 00:00:00.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop_derived/newtab_content_items_daily_v1/schema.yaml	2025-09-09 01:21:43.000000000 +0000
@@ -0,0 +1,75 @@
+fields:
+- name: submission_date
+  type: DATE
+  mode: NULLABLE
+  description: Date when client action took place
+- name: channel
+  type: STRING
+  mode: NULLABLE
+- name: country
+  type: STRING
+  mode: NULLABLE
+- name: newtab_content_surface_id
+  type: STRING
+  mode: NULLABLE
+- name: corpus_item_id
+  type: STRING
+  mode: NULLABLE
+- name: position
+  type: INTEGER
+  mode: NULLABLE
+  description: The position (0-index) of the pocket tile.
+- name: is_sponsored
+  type: BOOLEAN
+  mode: NULLABLE
+  description: Whether the pocket tile was sponsored (has an ad callback).
+- name: is_section_followed
+  type: BOOLEAN
+  mode: NULLABLE
+  description: If the click belongs to a section, whether that section is followed
+- name: matches_selected_topic
+  type: STRING
+  mode: NULLABLE
+  description: >
+    Returns a value based on whether the topic of the pocket recommendation
+    matches one of the user-selected topic categories
+- name: received_rank
+  type: INTEGER
+  mode: NULLABLE
+  description: >
+    The rank or order of the recommendation at the time it was sent to
+    the client.
+- name: section
+  type: STRING
+  mode: NULLABLE
+  description: If click belongs in a section, the name of the section
+- name: section_position
+  type: INTEGER
+  mode: NULLABLE
+  description: If click belongs in a section, the numeric position of the section
+- name: topic
+  type: STRING
+  mode: NULLABLE
+  description: The topic of the recommendation. Like "entertainment".
+- name: content_redacted
+  type: STRING
+  mode: NULLABLE
+  description: Are content details sent separately in the newtab_content ping
+- name: newtab_content_ping_version
+  type: INTEGER
+  mode: NULLABLE
+- name: impression_count
+  type: INTEGER
+  mode: NULLABLE
+  description: Count of articles impressed on Newtab
+- name: click_count
+  type: INTEGER
+  mode: NULLABLE
+  description: Count of articles clicked on Newtab
+- name: dismiss_count
+  type: INTEGER
+  mode: NULLABLE
+  description: Count of articles dismissed on Newtab
+- name: app_version
+  type: INTEGER
+  mode: NULLABLE
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/snowflake_migration_derived/corpus_items_current_v1/view.sql /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/snowflake_migration_derived/corpus_items_current_v1/view.sql
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/snowflake_migration_derived/corpus_items_current_v1/view.sql	2025-09-09 01:27:15.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/snowflake_migration_derived/corpus_items_current_v1/view.sql	2025-09-09 01:21:44.000000000 +0000
@@ -5,7 +5,7 @@
   SELECT
     approved_corpus_item_external_id AS corpus_item_id,
     title,
-    url,
+    url AS recommendation_url,
     authors,
     publisher,
     reviewed_corpus_item_updated_at AS corpus_item_updated_at,

Contributor

@gkatre gkatre left a comment

@LiamMcFall looks good. If you have done some data validation checks, then let's deploy.

@LiamMcFall LiamMcFall added this pull request to the merge queue Sep 9, 2025
Merged via the queue into main with commit aef717c Sep 9, 2025
22 checks passed
@LiamMcFall LiamMcFall deleted the newtab-content_dev branch September 9, 2025 15:53