diff --git a/docs/data-product-studio/data-quality/failed-events/monitoring-failed-events/alerts/images/image-1.png b/docs/data-product-studio/data-quality/failed-events/monitoring-failed-events/alerts/classic-alerts/images/image-1.png similarity index 100% rename from docs/data-product-studio/data-quality/failed-events/monitoring-failed-events/alerts/images/image-1.png rename to docs/data-product-studio/data-quality/failed-events/monitoring-failed-events/alerts/classic-alerts/images/image-1.png diff --git a/docs/data-product-studio/data-quality/failed-events/monitoring-failed-events/alerts/images/image.png b/docs/data-product-studio/data-quality/failed-events/monitoring-failed-events/alerts/classic-alerts/images/image.png similarity index 100% rename from docs/data-product-studio/data-quality/failed-events/monitoring-failed-events/alerts/images/image.png rename to docs/data-product-studio/data-quality/failed-events/monitoring-failed-events/alerts/classic-alerts/images/image.png diff --git a/docs/data-product-studio/data-quality/failed-events/monitoring-failed-events/alerts/classic-alerts/index.md b/docs/data-product-studio/data-quality/failed-events/monitoring-failed-events/alerts/classic-alerts/index.md new file mode 100644 index 0000000000..c4788f95fc --- /dev/null +++ b/docs/data-product-studio/data-quality/failed-events/monitoring-failed-events/alerts/classic-alerts/index.md @@ -0,0 +1,31 @@ +--- +title: "Setting up data quality alerts" +date: "2021-01-14" +sidebar_label: "Classic alerts" +sidebar_position: 2 +--- + +## Overview + +Snowplow can send two types of alerts to help you monitor Failed Events: + +- **New failed event:** receive an alert within 10 minutes of a new type of event failure being detected on your pipeline. +- **Failed event digest**: receive a daily digest of all Failed Event activity in the previous 48-hour period. + +## Pre-requisites + +To receive alerts you must have the Failed Events monitoring feature switched on in the Snowplow BDP console. 
+
+## Subscribing to alerts
+
+- Log in to the Snowplow BDP console
+- Locate the pipeline you wish to set up alerts for in the left-hand navigation
+- Click on the `Configuration` tab, then the `Pipeline alerts` section
+
+![](images/image.png)
+
+- Click `Manage` for the alert you wish to subscribe to
+- Add one or more email addresses by typing them into the input and clicking `Add recipient`
+- Once you have added all recipients, click `Save Changes`
+
+![](images/image-1.png)
diff --git a/docs/data-product-studio/data-quality/failed-events/monitoring-failed-events/alerts/data-quality-alerts/creating-alerts/images/dq_connect_slack.png b/docs/data-product-studio/data-quality/failed-events/monitoring-failed-events/alerts/data-quality-alerts/creating-alerts/images/dq_connect_slack.png
new file mode 100644
index 0000000000..1c81e552fc
Binary files /dev/null and b/docs/data-product-studio/data-quality/failed-events/monitoring-failed-events/alerts/data-quality-alerts/creating-alerts/images/dq_connect_slack.png differ
diff --git a/docs/data-product-studio/data-quality/failed-events/monitoring-failed-events/alerts/data-quality-alerts/creating-alerts/images/dq_create_alert.png b/docs/data-product-studio/data-quality/failed-events/monitoring-failed-events/alerts/data-quality-alerts/creating-alerts/images/dq_create_alert.png
new file mode 100644
index 0000000000..07898b1a91
Binary files /dev/null and b/docs/data-product-studio/data-quality/failed-events/monitoring-failed-events/alerts/data-quality-alerts/creating-alerts/images/dq_create_alert.png differ
diff --git a/docs/data-product-studio/data-quality/failed-events/monitoring-failed-events/alerts/data-quality-alerts/creating-alerts/images/dq_create_email_alert.png b/docs/data-product-studio/data-quality/failed-events/monitoring-failed-events/alerts/data-quality-alerts/creating-alerts/images/dq_create_email_alert.png
new file mode 100644
index 0000000000..19949cca95
Binary files /dev/null and b/docs/data-product-studio/data-quality/failed-events/monitoring-failed-events/alerts/data-quality-alerts/creating-alerts/images/dq_create_email_alert.png differ
diff --git a/docs/data-product-studio/data-quality/failed-events/monitoring-failed-events/alerts/data-quality-alerts/creating-alerts/images/dq_create_slack_alert.png b/docs/data-product-studio/data-quality/failed-events/monitoring-failed-events/alerts/data-quality-alerts/creating-alerts/images/dq_create_slack_alert.png
new file mode 100644
index 0000000000..fa504b4a02
Binary files /dev/null and b/docs/data-product-studio/data-quality/failed-events/monitoring-failed-events/alerts/data-quality-alerts/creating-alerts/images/dq_create_slack_alert.png differ
diff --git a/docs/data-product-studio/data-quality/failed-events/monitoring-failed-events/alerts/data-quality-alerts/creating-alerts/images/dq_filters.png b/docs/data-product-studio/data-quality/failed-events/monitoring-failed-events/alerts/data-quality-alerts/creating-alerts/images/dq_filters.png
new file mode 100644
index 0000000000..b4bc8a8c44
Binary files /dev/null and b/docs/data-product-studio/data-quality/failed-events/monitoring-failed-events/alerts/data-quality-alerts/creating-alerts/images/dq_filters.png differ
diff --git a/docs/data-product-studio/data-quality/failed-events/monitoring-failed-events/alerts/data-quality-alerts/creating-alerts/images/dq_slack.png b/docs/data-product-studio/data-quality/failed-events/monitoring-failed-events/alerts/data-quality-alerts/creating-alerts/images/dq_slack.png
new file mode 100644
index 0000000000..a7d8500ffb
Binary files /dev/null and b/docs/data-product-studio/data-quality/failed-events/monitoring-failed-events/alerts/data-quality-alerts/creating-alerts/images/dq_slack.png differ
diff --git a/docs/data-product-studio/data-quality/failed-events/monitoring-failed-events/alerts/data-quality-alerts/creating-alerts/images/dq_slack_confirmation.png b/docs/data-product-studio/data-quality/failed-events/monitoring-failed-events/alerts/data-quality-alerts/creating-alerts/images/dq_slack_confirmation.png
new file mode 100644
index 0000000000..0e0b3b6cea
Binary files /dev/null and b/docs/data-product-studio/data-quality/failed-events/monitoring-failed-events/alerts/data-quality-alerts/creating-alerts/images/dq_slack_confirmation.png differ
diff --git a/docs/data-product-studio/data-quality/failed-events/monitoring-failed-events/alerts/data-quality-alerts/creating-alerts/index.md b/docs/data-product-studio/data-quality/failed-events/monitoring-failed-events/alerts/data-quality-alerts/creating-alerts/index.md
new file mode 100644
index 0000000000..618c3ce7d1
--- /dev/null
+++ b/docs/data-product-studio/data-quality/failed-events/monitoring-failed-events/alerts/data-quality-alerts/creating-alerts/index.md
@@ -0,0 +1,77 @@
+---
+title: "Creating alerts"
+sidebar_position: 1
+---
+
+Set up alerts to receive notifications when failed events occur in your data pipeline.
+
+## Before you start
+
+- Access to the Data Quality Dashboard
+
+## Create an alert
+
+1. Navigate to **Data Quality** in the left sidebar
+2. Click **Manage alerts** in the top-right corner
+3. Click **Create alert**
+
+![Create alert form](images/dq_create_alert.png)
+
+### Configure destination
+
+Choose how you want to receive notifications:
+
+#### Email notifications
+
+1. Select **Email** as the destination
+2. Enter an alert name (e.g., "mobile-app")
+3. Add recipient email addresses
+4. Click **Add filters** to configure triggers
+
+![Email destination configuration](images/dq_create_email_alert.png)
+
+#### Slack notifications
+
+1. Select **Slack** as the destination
+2. Enter an alert name (e.g., "web-app")
+3. Select a Slack channel from the dropdown
+4. Click **Add filters** to configure triggers
+
+![Slack destination configuration](images/dq_create_slack_alert.png)
+
+If there is no active Slack integration, a `Connect with Slack` button will appear instead of the list of channels.
+
+![Connect to Slack](images/dq_connect_slack.png)
+
+A Slack consent screen will appear.
+
+![Slack consent](images/dq_slack.png)
+
+Once a Slack alert is configured, you will see a confirmation notification in the selected Slack channel.
+
+![Slack confirmation](images/dq_slack_confirmation.png)
+
+### Set up filters
+
+Configure when alerts should trigger:
+
+1. **Issue types**: Select ValidationError, ResolutionError, or both
+2. **Data structures**: Choose specific data structures (all versions will apply)
+3. **App IDs**: Filter by application identifiers (see the example query below)
+
+![Filter configuration](images/dq_filters.png)
+
+### Complete setup
+
+1. Review your configuration
+2. Click **Confirm** to create the alert
+3. Your alert will appear in the alerts list
+
+## Alert frequency
+
+Alerts are checked every 10 minutes. You'll receive notifications when new failed events match your filter criteria.
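+
+When deciding which **App IDs** to include in a filter, it can help to check which application identifiers actually appear in your data. A minimal sketch, assuming your events land in an `atomic.events` table in Snowflake (adjust the table name, SQL dialect, and time window to your setup):
+
+```sql
+-- app_id values seen over the last 7 days, busiest first,
+-- to help decide which applications an alert should cover
+SELECT app_id,
+       COUNT(*) AS event_count
+FROM atomic.events
+WHERE collector_tstamp >= DATEADD('day', -7, CURRENT_TIMESTAMP())
+GROUP BY app_id
+ORDER BY event_count DESC;
+```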
+ +## Next steps + +- [Manage existing alerts](/docs/data-product-studio/data-quality/failed-events/monitoring-failed-events/alerts/data-quality-alerts/managing-alerts/index.md) +- [Explore failed events](/docs/data-product-studio/data-quality/failed-events/exploring-failed-events/index.md) diff --git a/docs/data-product-studio/data-quality/failed-events/monitoring-failed-events/alerts/data-quality-alerts/images/dq_manage_alerts_button.png b/docs/data-product-studio/data-quality/failed-events/monitoring-failed-events/alerts/data-quality-alerts/images/dq_manage_alerts_button.png new file mode 100644 index 0000000000..8dd7429657 Binary files /dev/null and b/docs/data-product-studio/data-quality/failed-events/monitoring-failed-events/alerts/data-quality-alerts/images/dq_manage_alerts_button.png differ diff --git a/docs/data-product-studio/data-quality/failed-events/monitoring-failed-events/alerts/data-quality-alerts/index.md b/docs/data-product-studio/data-quality/failed-events/monitoring-failed-events/alerts/data-quality-alerts/index.md new file mode 100644 index 0000000000..69d5ca3182 --- /dev/null +++ b/docs/data-product-studio/data-quality/failed-events/monitoring-failed-events/alerts/data-quality-alerts/index.md @@ -0,0 +1,37 @@ +--- +title: "Data Quality alerts" +sidebar_position: 1 +--- + +Failed events alerts automatically notify you when [failed events](/docs/fundamentals/failed-events/index.md) occur in your data pipeline. Set up alerts to receive notifications via email or Slack when validation errors, resolution errors, or other data quality issues arise. + +## How alerts work + +The alerting system monitors your failed events and sends notifications based on the filters you configure. Alerts are checked every 10 minutes and sent to your specified destinations when matching failed events are detected. + +## Alert destinations + +- **Email**: Send notifications to one or more email addresses +- **Slack**: Send notifications to specific Slack channels + +## What you can filter on + +Configure alerts to trigger only for specific types of failed events: + +- **Issue types**: ValidationError, ResolutionError +- **Data structures**: Filter by specific schemas or event types +- **App IDs**: Filter by application identifiers + +## Getting started + +1. Navigate to the Data Quality Dashboard +2. View your failed events overview +3. Click **Manage alerts** to set up notifications +4. 
Create and configure your first alert + +![Data Quality Dashboard overview](images/dq_manage_alerts_button.png) + +## Next steps + +- [Create your first alert](/docs/data-product-studio/data-quality/failed-events/monitoring-failed-events/alerts/data-quality-alerts/creating-alerts/index.md) +- [Manage existing alerts](/docs/data-product-studio/data-quality/failed-events/monitoring-failed-events/alerts/data-quality-alerts/managing-alerts/index.md) diff --git a/docs/data-product-studio/data-quality/failed-events/monitoring-failed-events/alerts/data-quality-alerts/managing-alerts/images/dq_list_alerts.png b/docs/data-product-studio/data-quality/failed-events/monitoring-failed-events/alerts/data-quality-alerts/managing-alerts/images/dq_list_alerts.png new file mode 100644 index 0000000000..de97de41e8 Binary files /dev/null and b/docs/data-product-studio/data-quality/failed-events/monitoring-failed-events/alerts/data-quality-alerts/managing-alerts/images/dq_list_alerts.png differ diff --git a/docs/data-product-studio/data-quality/failed-events/monitoring-failed-events/alerts/data-quality-alerts/managing-alerts/index.md b/docs/data-product-studio/data-quality/failed-events/monitoring-failed-events/alerts/data-quality-alerts/managing-alerts/index.md new file mode 100644 index 0000000000..c003efeef2 --- /dev/null +++ b/docs/data-product-studio/data-quality/failed-events/monitoring-failed-events/alerts/data-quality-alerts/managing-alerts/index.md @@ -0,0 +1,39 @@ +--- +title: "Managing alerts" +sidebar_position: 2 +--- + +Edit, delete, or review existing failed events alerts. + +## View alerts + +1. Navigate to **Data Quality** in the left sidebar +2. Click **Manage alerts** in the top-right corner +3. View all configured alerts with their destinations + +![Manage alerts interface](images/dq_list_alerts.png) + +## Edit an alert + +1. Click the arrow next to the alert name +2. Modify destination, filters, or recipients +3. Click **Save** to update + +## Delete an alert + +1. Click the arrow next to the alert name +2. Click on the three dots button +3. Click **Delete** +4. Confirm deletion + +### Multiple notifications + +Alerts trigger when new failed events match your filters. If you receive multiple notifications, check if: +- Failed events are occurring frequently +- Filter criteria are too broad +- Multiple alerts have overlapping configurations + +## Next steps + +- [Create additional alerts](/docs/data-product-studio/data-quality/failed-events/monitoring-failed-events/alerts/data-quality-alerts/creating-alerts/index.md) +- [Explore failed events](/docs/data-product-studio/data-quality/failed-events/exploring-failed-events/index.md) diff --git a/docs/data-product-studio/data-quality/failed-events/monitoring-failed-events/alerts/index.md b/docs/data-product-studio/data-quality/failed-events/monitoring-failed-events/alerts/index.md index f661a0c35a..bd2e01fd45 100644 --- a/docs/data-product-studio/data-quality/failed-events/monitoring-failed-events/alerts/index.md +++ b/docs/data-product-studio/data-quality/failed-events/monitoring-failed-events/alerts/index.md @@ -1,31 +1,27 @@ --- title: "Setting up data quality alerts" -date: "2021-01-14" -sidebar_label: "Set up alerts" +date: "2025-01-14" +sidebar_label: "Alerts" sidebar_position: 2500 --- ## Overview -Snowplow can send two types of alerts to help you monitor Failed Events: +Snowplow can alert you when a new failed event has occurred. There are two different implementations available to choose from. 
-- **New failed event:** receive an alert within 10 minutes of a new type of event failure being detected on your pipeline. -- **Failed event digest**: receive a daily digest of all Failed Event activity in the previous 48-hour period. +### [Data quality alerts](/docs/data-product-studio/data-quality/failed-events/monitoring-failed-events/alerts/data-quality-alerts/index.md) +Driven by the [Data Quality dashboard](/docs/data-product-studio/data-quality/failed-events/monitoring-failed-events/index.md#data-quality-dashboard) deployment -## Pre-requisites +### [Classic alerts](/docs/data-product-studio/data-quality/failed-events/monitoring-failed-events/alerts/classic-alerts/index.md) +Driven by the [Snowplow infrastructure](/docs/data-product-studio/data-quality/failed-events/monitoring-failed-events/index.md#default-view), cheaper to run -To receive alerts you must have the Failed Events monitoring feature switched on in the Snowplow BDP console. +### Feature comparison -## Subscribing to alerts - -- Login to Snowplow BDP console -- Locate the pipeline you wish to set up alerts for in the left-hand navigation -- Click on the `Configuration` tab, then the `Pipeline alerts` section - -![](images/image.png) - -- Click `Manage` for the alert you wish to subscribe to -- Add one or more email addresses by typing them into the input and clicking `Add recipient` -- Once you have added all recipients, click `Save Changes` - -![](images/image-1.png) +| Feature | Classic alerts | Data quality alerts | +| :------ | :------------: | :-----------------: | +| Can alert on new failed events | ✅ | ✅ | +| Can send a digest of failed events for a week | ✅ | ✅ | +| Notify via Email | ✅ | ✅ | +| Notify via Slack | ❌ | ✅ | +| Filters | ❌ | ✅ | +| Does not affect pipeline cost | ✅ | ❌ | diff --git a/docs/data-product-studio/data-quality/failed-events/monitoring-failed-events/troubleshooting/index.md b/docs/data-product-studio/data-quality/failed-events/monitoring-failed-events/troubleshooting/index.md new file mode 100644 index 0000000000..9233f279ff --- /dev/null +++ b/docs/data-product-studio/data-quality/failed-events/monitoring-failed-events/troubleshooting/index.md @@ -0,0 +1,198 @@ +--- +title: "Troubleshooting data quality dashboard" +sidebar_position: 4 +sidebar_custom_props: + offerings: + - bdp +sidebar_label: "Troubleshooting" +--- + +This guide helps you troubleshoot common errors when using the data quality dashboard with your warehouse. + +## Missing warehouse permissions + +When deploying a loader with the data quality add-on (API), you may encounter permission errors that prevent the dashboard from querying your warehouse. 
+
+### BigQuery: missing `bigquery.jobs.create` permission {#bigquery-permissions}
+
+#### Error code range
+- `21xxx`
+
+#### Error description
+`Missing permission 'bigquery.jobs.create' on Bigquery...`
+
+#### Root cause
+- The service account lacks the required permission to create BigQuery jobs
+- This permission can be granted via the `roles/bigquery.jobUser` role
+
+#### How to diagnose
+Check whether your service account has the required role:
+
+```bash
+gcloud projects get-iam-policy <PROJECT_ID> \
+  --flatten="bindings[].members" \
+  --filter="bindings.members:<SERVICE_ACCOUNT_EMAIL>" \
+  --format="table(bindings.role)"
+```
+
+#### Fix
+Grant the required role to your service account (recommended):
+
+```bash
+gcloud projects add-iam-policy-binding <PROJECT_ID> \
+  --member="serviceAccount:<SERVICE_ACCOUNT_EMAIL>" \
+  --role="roles/bigquery.jobUser"
+```
+
+Alternatively, if you need more granular control, create a custom role with only the `bigquery.jobs.create` permission:
+
+```bash
+gcloud iam roles create customBigQueryJobCreator \
+  --project=<PROJECT_ID> \
+  --title="BigQuery Job Creator" \
+  --description="Create BigQuery jobs for Data Quality Dashboard" \
+  --permissions="bigquery.jobs.create"
+
+gcloud projects add-iam-policy-binding <PROJECT_ID> \
+  --member="serviceAccount:<SERVICE_ACCOUNT_EMAIL>" \
+  --role="projects/<PROJECT_ID>/roles/customBigQueryJobCreator"
+```
+
+#### Helpful links
+- [BigQuery IAM roles documentation](https://cloud.google.com/bigquery/docs/access-control#bigquery)
+- [Service account permissions](https://cloud.google.com/iam/docs/service-accounts)
+
+### Snowflake: missing `USAGE` privilege {#snowflake-permissions}
+
+#### Error code range
+- `11xxx`
+
+#### Error description
+`Missing required privileges on Snowflake: No active warehouse selected in the current session...`
+
+#### Root cause
+- The role lacks the `USAGE` privilege on the active warehouse
+- Without this privilege, queries cannot be executed
+
+#### How to diagnose
+Verify current warehouse privileges for your role:
+
+```sql
+SHOW GRANTS ON WAREHOUSE <WAREHOUSE_NAME>;
+SHOW GRANTS TO ROLE <ROLE_NAME>;
+```
+
+#### Fix
+Grant the `USAGE` privilege on the warehouse:
+
+```sql
+GRANT USAGE ON WAREHOUSE <WAREHOUSE_NAME> TO ROLE <ROLE_NAME>;
+```
+
+Ensure the grant is properly applied:
+
+```sql
+-- Verify the grant
+SHOW GRANTS ON WAREHOUSE <WAREHOUSE_NAME>;
+```
+
+#### Helpful links
+- [Snowflake warehouse privileges](https://docs.snowflake.com/en/user-guide/security-access-control-privileges#warehouse-privileges)
+- [Snowflake access control](https://docs.snowflake.com/en/user-guide/security-access-control-overview)
+
+## Query timeouts
+
+Long-running queries or resource pool exhaustion can cause the data quality dashboard to time out when fetching failed events.
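+
+A quick way to tell warehouse saturation apart from a single slow query is to look at recent warehouse load before digging into individual queries. A minimal sketch for Snowflake (the warehouse name is a placeholder; access to `WAREHOUSE_LOAD_HISTORY` typically requires the `MONITOR` privilege on the warehouse):
+
+```sql
+-- Average running vs. queued query load over the last 24 hours;
+-- sustained AVG_QUEUED_LOAD suggests dashboard queries are waiting behind other workloads
+SELECT start_time,
+       avg_running,
+       avg_queued_load
+FROM TABLE(information_schema.warehouse_load_history(
+  date_range_start => DATEADD('hour', -24, CURRENT_TIMESTAMP()),
+  warehouse_name   => '<WAREHOUSE_NAME>'))
+ORDER BY start_time DESC;
+```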
+
+### BigQuery and Snowflake query timeouts {#query-timeouts}
+
+#### Error code range
+- `12xxx`
+- `22xxx`
+
+#### Error description
+`Query exceeded timeout` or `Query execution time limit exceeded`
+
+#### Root cause
+- Large volume of failed events requiring extensive scanning
+- Warehouse resource pool exhaustion
+- Concurrent query limits reached
+
+#### How to diagnose
+
+**BigQuery:**
+```sql
+-- Check recent query performance
+SELECT
+  job_id,
+  user_email,
+  total_slot_ms,
+  total_bytes_processed,
+  TIMESTAMP_DIFF(end_time, start_time, SECOND) AS duration_seconds
+FROM `<PROJECT_ID>.region-us.INFORMATION_SCHEMA.JOBS_BY_PROJECT`
+WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
+  AND state = 'DONE'
+  AND statement_type = 'SELECT'
+ORDER BY total_slot_ms DESC
+LIMIT 10;
+```
+
+**Snowflake:**
+```sql
+-- Check query history
+SELECT
+  query_id,
+  query_text,
+  warehouse_name,
+  execution_time,
+  queued_overload_time,
+  bytes_scanned
+FROM table(information_schema.query_history())
+WHERE start_time >= DATEADD('day', -1, CURRENT_TIMESTAMP())
+  AND execution_status = 'SUCCESS'
+ORDER BY execution_time DESC
+LIMIT 10;
+```
+
+#### Fix
+
+**Reduce query scope:**
+- If using the API, use smaller time windows (e.g., "Last hour" or "Last day" instead of "Last 30 days")
+- If using the Console, you can switch to legacy failed events based on telemetry data
+- Query specific error types or schemas when investigating issues
+
+**Optimize warehouse performance:**
+- Review your warehouse configuration and query patterns
+- Consider implementing partitioning, clustering, or other optimization strategies
+- Monitor resource usage, and adjust warehouse size as needed
+
+For detailed optimization guidance, refer to your warehouse documentation:
+- [BigQuery query optimization best practices](https://cloud.google.com/bigquery/docs/best-practices-performance-overview)
+- [Snowflake query performance optimization](https://docs.snowflake.com/en/user-guide/performance-query-optimization)
+
+#### Helpful links
+- [BigQuery query optimization](https://cloud.google.com/bigquery/docs/best-practices-performance-overview)
+- [Snowflake query performance](https://docs.snowflake.com/en/user-guide/performance-query-optimization)
+- [BigQuery clustering and partitioning](https://cloud.google.com/bigquery/docs/clustered-tables)
+- [Snowflake clustering keys](https://docs.snowflake.com/en/user-guide/tables-clustering-keys)
+
+## Additional considerations
+
+### API behavior
+
+- **Missing permissions**: Returns HTTP 400 with remediation instructions displayed in the UI
+
+### Prevention tips
+
+1. **Regular maintenance**:
+   - Monitor table sizes and query performance
+   - Review and optimize clustering/partitioning strategies (see the sketch below)
+
+2. **Proactive monitoring**:
+   - Monitor query execution times
+   - Track failed events volume trends
+
+3. **Access control**:
+   - Document required permissions for all team members
+   - Use least-privilege principles
+   - Regularly audit access permissions
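+
+As a starting point for the clustering and partitioning review mentioned above, the sketch below shows one way to inspect and adjust clustering in Snowflake. The table and column names are assumptions (a typical Snowplow events table keyed on its collector timestamp); adapt them to your deployment and validate the cost and impact before applying a clustering key:
+
+```sql
+-- Inspect how well the events table is clustered on the timestamp column
+-- that time-windowed queries filter on
+SELECT SYSTEM$CLUSTERING_INFORMATION('atomic.events', '(collector_tstamp)');
+
+-- If clustering is poor and scans are slow, consider adding a clustering key
+ALTER TABLE atomic.events CLUSTER BY (collector_tstamp);
+```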