@mmiermans mmiermans commented Aug 12, 2025

Description

Allow ML to look back further into the history of New Tab engagement data stored on GCS for Merino by raising --deletion-days-old from 3 to 90 for the engagement and priors exports.

Related Tickets & Documents

Reviewer, please follow this checklist

@mmiermans mmiermans marked this pull request as ready for review August 12, 2025 22:32
@mmiermans mmiermans changed the title from "Increase expiration of Merino GCS data" to "Feat: increase expiration of Merino GCS data" Aug 12, 2025
@mmiermans mmiermans changed the title from "Feat: increase expiration of Merino GCS data" to "feat(HNT-929): increase expiration of Merino GCS data" Aug 12, 2025
@dataops-ci-bot

Integration report for "Merge branch 'main' into mathijs/increase-merino-gcs-expiration"

sql.diff

diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/dags/bqetl_merino_newtab_extract_to_gcs.py /tmp/workspace/generated-sql/dags/bqetl_merino_newtab_extract_to_gcs.py
--- /tmp/workspace/main-generated-sql/dags/bqetl_merino_newtab_extract_to_gcs.py	2025-08-13 16:39:24.000000000 +0000
+++ /tmp/workspace/generated-sql/dags/bqetl_merino_newtab_extract_to_gcs.py	2025-08-13 16:40:26.000000000 +0000
@@ -100,7 +100,7 @@
             "--source-table=newtab_merino_extract_v2",
             "--destination-bucket=merino-airflow-data-prodpy",
             "--destination-prefix=newtab-merino-exports/engagement",
-            "--deletion-days-old=3",
+            "--deletion-days-old=90",
         ],
         image="gcr.io/moz-fx-data-airflow-prod-88e0/bigquery-etl:latest",
         owner="[email protected]",
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/dags/bqetl_merino_newtab_priors_to_gcs.py /tmp/workspace/generated-sql/dags/bqetl_merino_newtab_priors_to_gcs.py
--- /tmp/workspace/main-generated-sql/dags/bqetl_merino_newtab_priors_to_gcs.py	2025-08-13 16:39:24.000000000 +0000
+++ /tmp/workspace/generated-sql/dags/bqetl_merino_newtab_priors_to_gcs.py	2025-08-13 16:40:26.000000000 +0000
@@ -113,7 +113,7 @@
             "--source-table=newtab_merino_priors_v1",
             "--destination-bucket=merino-airflow-data-prodpy",
             "--destination-prefix=newtab-merino-exports/priors",
-            "--deletion-days-old=3",
+            "--deletion-days-old=90",
         ],
         image="gcr.io/moz-fx-data-airflow-prod-88e0/bigquery-etl:latest",
         owner="[email protected]",
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/telemetry_derived/newtab_merino_extract_to_gcs_v2/metadata.yaml /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/telemetry_derived/newtab_merino_extract_to_gcs_v2/metadata.yaml
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/telemetry_derived/newtab_merino_extract_to_gcs_v2/metadata.yaml	2025-08-13 16:35:29.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/telemetry_derived/newtab_merino_extract_to_gcs_v2/metadata.yaml	2025-08-13 16:33:47.000000000 +0000
@@ -17,7 +17,7 @@
   - --source-table=newtab_merino_extract_v2
   - --destination-bucket=merino-airflow-data-prodpy
   - --destination-prefix=newtab-merino-exports/engagement
-  - --deletion-days-old=3
+  - --deletion-days-old=90
   referenced_tables:
   - - moz-fx-data-shared-prod
     - telemetry_derived
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/telemetry_derived/newtab_merino_priors_to_gcs_v1/metadata.yaml /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/telemetry_derived/newtab_merino_priors_to_gcs_v1/metadata.yaml
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/telemetry_derived/newtab_merino_priors_to_gcs_v1/metadata.yaml	2025-08-13 16:35:29.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/telemetry_derived/newtab_merino_priors_to_gcs_v1/metadata.yaml	2025-08-13 16:33:48.000000000 +0000
@@ -18,7 +18,7 @@
   - --source-table=newtab_merino_priors_v1
   - --destination-bucket=merino-airflow-data-prodpy
   - --destination-prefix=newtab-merino-exports/priors
-  - --deletion-days-old=3
+  - --deletion-days-old=90
   referenced_tables:
   - - moz-fx-data-shared-prod
     - telemetry_derived
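
For readers unfamiliar with the flag, here is a minimal sketch of what raising --deletion-days-old from 3 to 90 could mean for the exported files. It assumes the flag causes objects under the destination prefix to be deleted once they are older than the given number of days; the export job's actual logic is not shown in this PR, and the function name select_expired and the object names below are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def select_expired(objects, deletion_days_old, now=None):
    """Return names of objects last updated more than `deletion_days_old` days ago.

    `objects` is an iterable of (name, updated_at) pairs. This is an
    illustrative stand-in for a bucket listing, not the real export job.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=deletion_days_old)
    return [name for name, updated_at in objects if updated_at < cutoff]


# Hypothetical export files under the engagement prefix.
now = datetime(2025, 8, 13, tzinfo=timezone.utc)
exports = [
    ("newtab-merino-exports/engagement/a.json", now - timedelta(days=1)),
    ("newtab-merino-exports/engagement/b.json", now - timedelta(days=10)),
    ("newtab-merino-exports/engagement/c.json", now - timedelta(days=120)),
]

# With the old setting, the 10-day-old and 120-day-old files would be removed.
print(select_expired(exports, 3, now))
# With the new setting, only the 120-day-old file would be removed.
print(select_expired(exports, 90, now))
```

Under that assumption, the 10-day-old file survives with the new value, which is the kind of longer history the description asks for.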


@mmiermans mmiermans requested a review from rolf-moz August 13, 2025 17:09
@mmiermans mmiermans added this pull request to the merge queue Aug 13, 2025
Merged via the queue into main with commit 34bb529 Aug 13, 2025
22 checks passed
@mmiermans mmiermans deleted the mathijs/increase-merino-gcs-expiration branch August 13, 2025 17:43