-configure your first dataset to enable observability.
13
+
In this quickstart, you will:
14
+
- Create a Soda Cloud account
15
+
- Connect a data source
16
+
- Configure your first dataset to enable observability
17
17
18
18
## Step 1: Create a Soda Cloud Account
19
-
1. Go to <a href="https://cloud.soda.io/signup?utm_source=docs" target="_blank">cloud.soda.io</a> and create a Soda Cloud account.
20
-
If you already have a Soda account, log in.
21
-
2. By default, Soda prepares a Soda-hosted agent for all newly-created accounts. However, if you are an Admin in an existing Soda Cloud account and wish to use a Soda-hosted agent, navigate to **your avatar** > **Organization Settings**. In the **Organization** tab, click the checkbox to **Enable Soda-hosted Agent**.
22
-
3. Navigate to **your avatar** > **Data Sources**, then access the **Agents** tab. Notice your out-of-the-box Soda-hosted agent that is up and running. <br />
19
+
1. Go to <a href="https://cloud.soda.io/signup?utm_source=docs" target="_blank">cloud.soda.io</a> and sign up for a Soda Cloud account. If you already have an account, log in.
20
+
2. By default, Soda creates a Soda-hosted Agent for all new accounts. You can think of an Agent as the bridge between your data sources and Soda Cloud. A Soda-hosted Agent runs in Soda's cloud and securely connects to your data sources to scan for data quality issues.
21
+
3. If you are an admin and prefer to deploy your own agent, you can configure a self-hosted agent:
22
+
23
+
- In Soda Cloud, go to **your avatar** > **Agents**
24
+
- Click **New Soda Agent** and follow the setup instructions
> 1. **Soda-hosted Agent:** This is an out-of-the-box, ready-to-use agent that Soda provides and manages for you. It is the quickest way to get started with Soda because it requires no installation or deployment. It supports connections to specific data sources: BigQuery, Databricks SQL, MS SQL Server, MySQL, PostgreSQL, Redshift, and Snowflake. [Soda-hosted agent (missing)](#)
32
+
> 2. **Self-hosted Agent:** This is a version of the agent that you deploy in your own Kubernetes cluster within your cloud environment (such as AWS, Azure, or Google Cloud). It gives you more control and supports a wider range of data sources. [Self-hosted agent (missing)](#)
33
+
>
34
+
> A Soda Agent is essentially Soda Library (the core scanning technology) packaged as a containerized application that runs in Kubernetes. It acts as the bridge between your data sources and Soda Cloud, allowing users to:
35
+
> - Connect to data sources securely
36
+
> - Run scans to check data quality
37
+
> - Create and manage no-code checks directly in the Soda Cloud interface
38
+
>
39
+
> The agent sends only metadata (not your actual data) to Soda Cloud, keeping your data secure within your environment. See [Soda Agent basic concepts (missing)](#)
40
+
25
41
## Step 2: Add a Data Source
26
-
1. In your Soda Cloud account, navigate to **your avatar** > **Data Sources**.
27
-
2. Click **New Data Source**, then follow the guided steps to create a new data source (e.g., PostgreSQL, BigQuery).
28
-
Enter the required connection details (host, port, database name, credentials).
29
-
Refer to the **Attributes** section below for the values to enter in the fields and editing panels of the guided steps.
42
+
1. In Soda Cloud, go to **your avatar** > **Data Sources**.
43
+
2. Click **New Data Source**, then follow the guided steps to create the connection.
44
+
Use the table below to understand what each field means and how to complete it:
30
45
31
46
#### Attributes
32
47
@@ -39,12 +54,10 @@ In this Quickstart, you'll:
39
54
| Custom Cron Expression | (Optional) Write your own <a href="https://en.wikipedia.org/wiki/Cron" target="_blank">cron expression</a> to define the schedule Soda Cloud uses to run scans. |
40
55
| Anomaly Dashboard Scan Schedule | Provide the scan frequency details Soda Cloud uses to execute a daily scan to automatically detect anomalies for the anomaly dashboard. |
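
To make the **Custom Cron Expression** field more concrete, here is a minimal sketch that splits a cron expression into named fields. It assumes the common five-field layout (minute, hour, day of month, month, day of week); this page does not state which cron dialect Soda Cloud accepts, so treat the layout as an assumption. `describe_cron` is a hypothetical helper for illustration and is not part of Soda.

```python
# Illustrative only: assumes the common five-field cron layout.
# `describe_cron` is a hypothetical helper, not a Soda API.
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def describe_cron(expression: str) -> dict[str, str]:
    """Map each whitespace-separated cron field to its name."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(
            f"expected {len(CRON_FIELDS)} fields, got {len(parts)}: {expression!r}"
        )
    return dict(zip(CRON_FIELDS, parts))


if __name__ == "__main__":
    # "0 6 * * *" is conventionally read as "every day at 06:00".
    for name, value in describe_cron("0 6 * * *").items():
        print(f"{name}: {value}")
```

Under that assumption, a daily schedule fixes only the minute and hour fields and leaves the other three as `*`.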
41
56
57
+
{:start="3"}
58
+
3. Complete the connection configuration. These settings are specific to each data source (PostgreSQL, MySQL, Snowflake, etc.) and usually include connection details such as host, port, credentials, and database name.
42
59
43
-
3. Enter values in the fields to provide the connection configurations Soda Cloud needs to be able to access the data in the data source. Connection configurations are data source-specific and include values for things such as a database's host and access credentials.
44
-
45
-
Soda hosts agents in a secure environment in Amazon AWS. As a SOC 2 Type 2 certified business, Soda responsibly manages Soda-hosted agents to ensure that they remain private, secure, and independent of all other hosted agents. See [Data security and privacy]({% link soda/data-privacy.md %}#using-a-soda-hosted-agent) for details.
46
-
47
-
Use the following data source-specific connection configuration pages to populate the connection fields in Soda Cloud.
60
+
Use the appropriate guide below to complete the connection:
48
61
* [Connect to BigQuery]({% link soda/connect-bigquery.md %})
49
62
* [Connect to Databricks SQL]({% link soda/connect-spark.md %}#connect-to-spark-for-databricks-sql)
50
63
* [Connect to MS SQL Server]({% link soda/connect-mssql.md %})
@@ -53,27 +66,64 @@ Use the following data source-specific connection configuration pages to populat
53
66
* [Connect to Redshift]({% link soda/connect-redshift.md %})
54
67
* [Connect to Snowflake]({% link soda/connect-snowflake.md %})
55
68
56
-
💡 Already have a data source connected to a self-hosted agent? You can [migrate]({% link soda/upgrade.md %}#migrate-a-data-source-from-a-self-hosted-to-a-soda-hosted-agent) a data source to a Soda-hosted agent.
57
69
70
+
## Step 3: Configure Dataset Discovery
71
+
Dataset discovery captures metadata about each dataset, including its schema and the data types of each column.
72
+
73
+
- In Step 3 of the guided workflow, specify the datasets you want to profile. Because dataset discovery can be resource-intensive, only include the datasets you need for observability.
74
+
See [Compute consumption and cost considerations]({% link soda-cl/profile.md %}#compute-consumption-and-cost-considerations) for more detail.
58
75
59
-
## Step 3: Select and Configure a Dataset
76
+
## Step 4: Add Column Profiling
77
+
Column profiling extracts metrics such as the mean, minimum, and maximum values in a column, and the number of missing values.
60
78
61
-
1. In the editing panel of **4. Profile**, use the include and exclude syntax to indicate the datasets for which Soda must profile and prepare an anomaly dashboard. The default syntax in the editing panel instructs Soda to profile every column of every dataset in the data source, and, superfluously, all datasets with names that begin with prod. The `%` is a wildcard character. See [Add column profiling]({% link soda-cl/profile.md %}#add-column-profiling) for more detail on profiling syntax.
79
+
- In Step 4 of the guided workflow, use include/exclude patterns to define which columns Soda should profile. Soda uses this information to power the anomaly dashboard. Learn more about [column profiling syntax]({% link soda-cl/profile.md %}#add-column-profiling).
62
80
63
81
```yaml
64
-
profile columns:
65
-
  columns:
66
-
    - "%.%"  # Includes all your datasets
67
-
    - prod%  # Includes all datasets that begin with 'prod'
82
+
profile columns:
83
+
  columns:
84
+
    - "%.%"  # Includes all columns of all datasets
85
+
    - "prod%.%"  # Includes all columns of all datasets that begin with 'prod'
68
86
```
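
As a rough illustration of how the `%` wildcard in these patterns selects columns, the sketch below translates a `dataset.column` pattern into a regular expression. This is not Soda's implementation: the function names are hypothetical, and the matching rules (case-sensitive, `%` matching any run of characters) are assumptions made for the example.

```python
import re

# Illustrative only: a simplified model of `%` wildcard matching.
# The function names and exact matching rules are assumptions, not Soda's code.


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Turn a pattern such as 'prod%.%' into a compiled regex.

    Every '%' becomes '.*'; all other characters are matched literally.
    """
    literal_parts = [re.escape(part) for part in pattern.split("%")]
    return re.compile(".*".join(literal_parts))


def is_profiled(patterns: list[str], dataset: str, column: str) -> bool:
    """Return True if 'dataset.column' fully matches any pattern."""
    target = f"{dataset}.{column}"
    return any(wildcard_to_regex(p).fullmatch(target) for p in patterns)


if __name__ == "__main__":
    patterns = ["prod%.%"]
    for dataset, column in [("prod_orders", "amount"), ("test_orders", "amount")]:
        print(dataset, column, is_profiled(patterns, dataset, column))
```

Under these assumptions, `"%.%"` matches every `dataset.column` pair, while `"prod%.%"` narrows profiling to datasets whose names begin with `prod`.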
69
87
70
-
2. Continue the remaining steps to add your new data source, then **Test Connection**, if you wish, and **Save** the data source configuration.
88
+
## Step 5: Add Automated Monitoring Checks
89
+
In Step 5 of the guided workflow, define which datasets should have automated checks applied for anomaly scores and schema evolution.
90
+
91
+
> If you are using the early access anomaly dashboard, this step is not required. Soda automatically enables monitoring in the dashboard. See [Anomaly Dashboard]({% link soda-cloud/anomaly-dashboard.md %}) for details.
92
+
93
+
Use include/exclude filters to target specific datasets. Read more about [automated monitoring configuration]({% link soda-cl/automated-monitoring.md %}).
94
+
95
+
```yaml
96
+
automated monitoring:
97
+
  datasets:
98
+
    - include prod%  # Includes all the datasets that begin with 'prod'
99
+
    - exclude test%  # Excludes all the datasets that begin with 'test'
100
+
```
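
The include/exclude filters above can be pictured with a small sketch. It is a simplified model, not Soda's implementation: `select_datasets` is a hypothetical helper, and it assumes that a dataset is selected when it matches at least one `include` rule and no `exclude` rule.

```python
import re

# Illustrative only: a simplified model of include/exclude dataset filters.
# Assumption: an exclude rule always overrides an include rule.


def _matches(pattern: str, name: str) -> bool:
    """Match a dataset name against a pattern where '%' is a wildcard."""
    regex = ".*".join(re.escape(part) for part in pattern.split("%"))
    return re.fullmatch(regex, name) is not None


def select_datasets(rules: list[str], names: list[str]) -> list[str]:
    """Apply 'include <pattern>' and 'exclude <pattern>' rules to names."""
    includes = [r.split(None, 1)[1] for r in rules if r.startswith("include ")]
    excludes = [r.split(None, 1)[1] for r in rules if r.startswith("exclude ")]
    return [
        name
        for name in names
        if any(_matches(p, name) for p in includes)
        and not any(_matches(p, name) for p in excludes)
    ]


if __name__ == "__main__":
    rules = ["include prod%", "exclude test%"]
    print(select_datasets(rules, ["prod_orders", "test_orders", "staging_users"]))
```

With the two rules from the configuration, only names beginning with `prod` are selected; the `exclude test%` rule starts to matter once a broader include pattern such as `%` is used.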
101
+
102
+
## Step 6: Assign a Data Source and Dataset Owner
103
+
In Step 6 of the guided workflow, assign responsibility for maintaining the data source and each dataset.
104
+
105
+
- **Data Source Owner:** Manages the connection settings and scan configurations for the data source.
106
+
- **Dataset Owner:** Becomes the default owner of each dataset for monitoring and collaboration.
107
+
108
+
For more details, see [Roles and rights in Soda Cloud]({% link soda-cloud/roles-global.md %}).
109
+
110
+
## Step 7: Test Connection and Save
111
+
- Click **Test Connection** to verify your configuration.
112
+
- Click **Save** to start profiling the selected datasets.
113
+
114
+
Once saved, Soda runs a first scan using your profiling settings. This initial scan provides baseline measurements that Soda uses to begin learning patterns and identifying anomalies.
115
+
116
+
## Step 8: View Metric Monitor Results
117
+
1. Go to the **Datasets** page in Soda Cloud.
118
+
2. Select a dataset you included in profiling.
119
+
3. Open the **Metric Monitors** tab to view automatically detected issues.
3. Soda begins profiling the datasets according to your **Profile** configuration, and the algorithm uses the first measurements collected from a scan of your data to begin identifying patterns. Navigate to the **Dataset** page for a dataset you included in profiling, then click the **Monitors** tab to view the issues Soda automatically detected.
123
+
### 🎉 Congratulations! You’ve set up your dataset and enabled observability.
73
124
74
-
### Congratulations! You’ve set up your dataset and enabled observability.
125
+
## What's Next?
126
+
Now that your first dataset is configured and observability is active, try:
75
127
76
-
#### What's Next?
77
-
Now that you’ve set up your first dataset and enabled observability, try:
78
-
[Exploring detailed metrics in the dashboard.]({% link observability/anomaly-dashboard.md %})
79
-
[Setting up notifications for anomaly detection.]({% link observability/set-up-alerts.md %})
128
+
- [Explore detailed metrics in the anomaly dashboard]({% link observability/anomaly-dashboard.md %})
129
+
- [Set up alerts for anomaly detection]({% link observability/set-up-alerts.md %})