_includes/agent-practice-datasource.md (+2)
@@ -1,6 +1,7 @@
If you wish to try creating a new data source in Soda Cloud using the agent you deployed, you can use the following command to create a PostgreSQL warehouse containing example data from the <a href="https://data.cityofnewyork.us/Transportation/Bus-Breakdown-and-Delays/ez4e-fazm" target="_blank">NYC Bus Breakdowns and Delay Dataset</a>.

From the command line, copy, paste, and run the following to create the data source as a pod on your new cluster.
+{% include code-header.html %}
```shell
cat <<EOF | kubectl apply -n soda-agent -f -
---
@@ -45,6 +46,7 @@ service/nybusbreakdowns created
<br />
Once the pod of practice data is running, you can use the following configuration details when you add a data source in Soda Cloud, in [step 2]({% link soda-cloud/add-datasource.md %}#2-connect-the-data-source), **Connect the Data Source**.
_includes/expect-one-result.md (+2 -2)
@@ -1,7 +1,7 @@
Be aware that a check that contains one or more alert configurations only ever yields a *single* check result; one check yields one check result. If your check triggers both a `warn` and a `fail`, the check result only displays the more severe, failed check result.

Using the following example, Soda Core, during a scan, discovers that the data in the dataset triggers both alerts, but the check result is still `Only 1 warning`. Nonetheless, the results in the CLI still display both alerts as having triggered a `warn`.
-
+{% include code-header.html %}
```yaml
checks for dim_employee:
  - schema:
@@ -23,7 +23,7 @@ Sending results to Soda Cloud
```

Adding to the example check above, the check in the example below triggers both `warn` alerts and the `fail` alert, but only returns a single check result, the more severe `Oops! 1 failures.`
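
The example itself falls outside this hunk. As a rough sketch only, a `schema` check with two `warn` conditions and one `fail` condition might look like the following; the column names are placeholders chosen for illustration, not taken from the original file.

```yaml
checks for dim_employee:
  - schema:
      warn:
        # Placeholder column names, for illustration only
        when required column missing: [not_so_important_column]
        when forbidden column present: [birth_date]
      fail:
        when required column missing: [very_important_column]
```

Under these assumptions, a scan in which all three conditions are met would still report one result for the check, at the `fail` level.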
_includes/foreach-config.md (+1 -1)
@@ -4,7 +4,7 @@ Add a **for each** section to your checks YAML file to specify a list of checks
2. Nested under the section header, add two nested keys, one for `datasets` and one for `checks`.
3. Nested under `datasets`, add a list of datasets against which to run the checks. Refer to the example below that illustrates how to use `include` and `exclude` configurations and wildcard characters {% raw %} (%) {% endraw %}.
4. Nested under `checks`, write the checks you wish to execute against all the datasets listed under `datasets`.
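
The steps above can be sketched as follows. This is a minimal illustration, not the example from the original file; the dataset names are placeholders, and the `for each dataset T:` header is assumed to be the section header that step 1 (not shown in this hunk) introduces.

```yaml
for each dataset T:
  datasets:
    # Include a single dataset by name (placeholder names throughout)
    - dim_customers
    # Include every dataset whose name matches the wildcard expression
    - dim_products%
    # Optionally spell out "include" for readability
    - include dim_employee
    # Exclude a single dataset, or any dataset matching a wildcard
    - exclude fact_survey_response
    - exclude prospective_%
  checks:
    # Runs once against each dataset selected above
    - row_count > 0
```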
_includes/in-check-filters.md (+3 -3)
@@ -1,7 +1,7 @@
Add a filter to a check to apply conditions that specify a portion of the data against which Soda executes the check. For example, you may wish to use an in-check filter to support a use case in which "Column X must be filled in for all rows that have value Y in column Z".

Add a filter as a nested key:value pair, as in the following example which filters the check results to display only those rows with a value of 81 or greater and which contain `11` in the `sales_territory_key` column. You cannot use a variable to specify an in-check filter.
-
+{% include code-header.html %}
```yaml
checks for dim_employee:
  - max(vacation_hours) < 80:
@@ -10,7 +10,7 @@ checks for dim_employee:
```

You can use `AND` or `OR` to add multiple filter conditions to a filter key:value pair to further refine your results, as in the following example.
-
+{% include code-header.html %}
```yaml
checks for dim_employee:
  - max(vacation_hours) < 80:
@@ -19,7 +19,7 @@ checks for dim_employee:
```

To improve the readability of multiple filters in a check, consider adding filters as separate line items, as per the following example.
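
The example is cut off in this hunk. A minimal sketch of the idea, assuming the filter value may be written as a multi-line plain YAML scalar, might look like this; the check name and the `sick_leave_hours` and `pay_frequency` columns are placeholders for illustration.

```yaml
checks for dim_employee:
  - max(vacation_hours) < 80:
      # Placeholder check name and filter columns
      name: Too many vacation hours for US Sales
      # YAML folds the continuation lines into one filter expression
      filter: sales_territory_key = 11 AND
              sick_leave_hours > 0 OR
              pay_frequency > 1
```

Because the lines fold into a single expression, how mixed `AND` and `OR` conditions combine presumably follows the data source's own SQL rules.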
To confirm that you have correctly configured the connection details for the data source(s) in your configuration YAML file, use the `test-connection` command. If you wish, add a `-V` option to the command to return results in verbose mode in the CLI.
-
+{% include code-header.html %}
```shell
soda test-connection -d my_datasource -c configuration.yml -V
Use an anomaly score check to automatically discover anomalies in your time-series data. <br>
*Requires Soda Cloud and Soda Core Scientific.*<br />
-
+{% include code-header.html %}
```yaml
checks for dim_customer:
  - anomaly score for row_count < default
@@ -54,7 +54,7 @@ Refer to [Troubleshoot Soda Core Scientific installation](#troubleshoot-soda-cor
## Define an anomaly score check

The following example demonstrates how to use the anomaly score for the `row_count` metric in a check. You can use any [numeric]({% link soda-cl/numeric-metrics.md %}), [missing]({% link soda-cl/missing-metrics.md %}), or [validity]({% link soda-cl/validity-metrics.md %}) metric in lieu of `row_count`.
-
+{% include code-header.html %}
```yaml
checks for dim_customer:
  - anomaly score for row_count < default
@@ -66,13 +66,14 @@ checks for dim_customer:
<br />
You can use any [numeric]({% link soda-cl/numeric-metrics.md %}), [missing]({% link soda-cl/missing-metrics.md %}), or [validity]({% link soda-cl/validity-metrics.md %}) metric in anomaly score checks. The following example detects anomalies for the average of `order_price` in an `orders` dataset.

+{% include code-header.html %}
```yaml
checks for orders:
  - anomaly score for avg(order_price) < default
```

The following example detects anomalies for the count of missing values in the `id` column.
-
+{% include code-header.html %}
```yaml
checks for orders:
  - anomaly score for missing_count(id) < default:
@@ -121,14 +122,14 @@ Consider using the Soda Core Python library to set up a [programmatic scan]({% l
soda-cl/automated-monitoring.md (+4 -4)
@@ -11,7 +11,7 @@ parent: SodaCL
Use automated monitoring checks to instruct Soda to automatically check for row count anomalies and schema changes in a dataset.<br />
*Requires Soda Cloud*
-
+{% include code-header.html %}
```yaml
automated monitoring:
  datasets:
@@ -100,7 +100,7 @@ Need help? Ask the team in the <a href="https://community.soda.io/slack" target=
In the context of [SodaCL check types]({% link soda-cl/metrics-and-checks.md %}#check-types), automated monitoring checks are unique. This check employs the `anomaly score` and `schema` checks, but is limited in its syntax variation, with only a couple of mutable parts to specify which datasets to automatically apply the anomaly and schema checks.

The example check below uses a wildcard character (`%`) to specify that Soda Core executes automated monitoring checks against all datasets with names that begin with `prod`, and *not* to execute the checks against any dataset with a name that begins with `test`.
-
+{% include code-header.html %}
```yaml
automated monitoring:
  datasets:
@@ -111,7 +111,7 @@ automated monitoring:
<br />

You can also specify individual datasets to include or exclude, as in the following example.
-
+{% include code-header.html %}
```yaml
automated monitoring:
  datasets:
@@ -137,7 +137,7 @@ To review the checks results for automated monitoring checks in Soda Cloud, navi
soda-cl/check-attributes.md (+4 -4)
@@ -9,7 +9,7 @@ parent: SodaCL
*Last modified on {% last_modified_at %}*

As a Soda Cloud Admin user, you can define **check attributes** that your team can apply to checks when they write them in an agreement or in a checks YAML file for Soda Core.
-
+{% include code-header.html %}
```yaml
checks for dim_product:
  - missing_count(discount) < 10:
@@ -73,7 +73,7 @@ OR <br />
* writing or editing checks in a checks YAML file for Soda Core.

Apply attributes to checks using key:value pairs, as in the following example which applies five Soda Cloud-created attributes to a new `row_count` check.
-
+{% include code-header.html %}
```yaml
checks for dim_product:
  - row_count = 10:
@@ -106,7 +106,7 @@ Note that users must use the attribute's **NAME** as the attribute's key in a ch
## Optional check attribute SodaCL configurations

Using SodaCL, you can use variables to populate either the key or value of an existing attribute, as in the following example. Refer to [Configure variables in SodaCL]({% link soda-cl/filters.md %}#configure-variables-in-sodacl) for further details.
-
+{% include code-header.html %}
```yaml
checks for dim_product:
  - row_count = 10:
@@ -116,7 +116,7 @@ checks for dim_product:
```

You can use attributes in checks that Soda executes as part of a for each configuration, as in the following example. Refer to [Optional check configuration]({% link soda-cl/optional-config.md %}#apply-checks-to-multiple-datasets) for further details on for each.
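
The example is not visible in this hunk. A minimal sketch of attributes inside a for each configuration might look like the following; the dataset name and the `department` and `priority` attributes are hypothetical, and real attribute names must match those an Admin defined in Soda Cloud.

```yaml
for each dataset T:
  datasets:
    # Placeholder dataset name
    - dim_customers
  checks:
    - row_count > 0:
        attributes:
          # Hypothetical attribute names and values
          department: Marketing
          priority: 1
```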
soda-cl/compare.md (+7 -5)
@@ -24,22 +24,22 @@ Have you got an idea or example of how to compare data that we haven't documente
Use a [cross check]({% link soda-cl/cross-row-checks.md %}) to conduct a row count comparison between datasets in the same data source. <br />
If you wish to compare datasets in different data sources, or datasets in the same data source but with different schemas, see [Compare data in different data sources or schemas](#compare-data-in-different-data-sources-or-schemas).
-
+{% include code-header.html %}
```yaml
checks for dim_employee:
  - row_count same as dim_department_group
```

Use a [reference check]({% link soda-cl/reference.md %}) to conduct a row-by-row comparison of values in two datasets _in the same data source_ and return a result that indicates the volume and samples of mismatched rows, as in the following example which ensures that the values in each of the two names columns are identical.<br />
If you wish to compare datasets in the same data source but with different _schemas_, see [Compare data in different data sources or schemas](#compare-data-in-different-data-sources-or-schemas).
-
+{% include code-header.html %}
```yaml
checks for dim_customers_dev:
  - values in (last_name, first_name) must exist in dim_customers_prod (last_name, first_name)
```

Alternatively, you can use a [failed rows check]({% link soda-cl/failed-rows-checks.md %}) to customize a SQL query that compares the values of datasets.
-
+{% include code-header.html %}
```yaml
- failed rows:
    name: Validate that the data is the same as retail customers
@@ -79,7 +79,7 @@ Alternatively, you can use a [failed rows check]({% link soda-cl/failed-rows-che
Use a [cross check]({% link soda-cl/cross-row-checks.md %}) to conduct a simple row count comparison of two datasets in different data sources, as in the following example. <br />
Note that each data source involved in this check must already be connected, either in the `configuration.yml` file with Soda Core or in the **Add Data Source** workflow in Soda Cloud.
-
+{% include code-header.html %}
```yaml
checks for dim_customer:
  - row_count same as dim_customer in aws_postgres_retail
@@ -88,6 +88,7 @@ checks for dim_customer:
You can use a [reference check]({% link soda-cl/reference.md %}) to compare the values of different datasets in the _same_ data source (same data source, same schema), but if the datasets are in different schemas, as might happen when you have different environments like production, staging, development, etc., then Soda considers those datasets as _different data sources_. Where that is the case, you have a couple of options.

You can use a cross check to compare the row count of datasets in the same data source, but with different schemas. First, you must add dataset + schema as a separate data source connection in your `configuration.yml`, as in the following example that uses the same connection details but provides different schemas:
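
The configuration itself is elided from this diff view. As a sketch under stated assumptions, two PostgreSQL connections that differ only in `schema` might be declared as below; the data source names, host, database, schema names, and environment variable names are all placeholders.

```yaml
# Placeholder names and connection details, for illustration only
data_source retail_customers_stage:
  type: postgres
  host: localhost
  port: 5432
  username: ${POSTGRES_USER}
  password: ${POSTGRES_PASSWORD}
  database: postgres
  schema: staging

data_source retail_customers_prod:
  type: postgres
  host: localhost
  port: 5432
  username: ${POSTGRES_USER}
  password: ${POSTGRES_PASSWORD}
  database: postgres
  schema: production
```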
Then, you can define a cross check that compares values across these data sources.
+{% include code-header.html %}
```yaml
checks for dim_customer:
  # Check row count between datasets in different data sources
@@ -117,7 +119,7 @@ checks for dim_customer:
Alternatively, depending on the type of data source you are using, you can use a [failed rows check]({% link soda-cl/failed-rows-checks.md %}) to write a custom SQL query that compares contents of datasets that you define by adding the schema before the dataset name, such as `prod.retail_customers` and `staging.retail_customers`.

The following example accesses a single Snowflake data source and compares values between the same datasets but in different databases and schemas: `prod.staging.dmds_scores` and `prod.measurement.post_scores`.
Use a cross check to compare row counts between datasets within the same, or different, data sources.

See also: [Compare data using SodaCL]({% link soda-cl/compare.md %})
-
+{% include code-header.html %}
```yaml
checks for dim_customer:
  # Check row count between datasets in one data source
@@ -33,7 +33,7 @@ checks for dim_customer:
In the context of [SodaCL check types]({% link soda-cl/metrics-and-checks.md %}#check-types), cross checks are unique. This check employs the `row_count` metric and is limited in its syntax variation, with only a few mutable parts to specify dataset and data source names.

The example check below compares the volume of rows in two datasets in the same data source. If the row count in the `dim_department_group` is not the same as in `dim_customer`, the check fails.
-
+{% include code-header.html %}
```yaml
checks for dim_customer:
  - row_count same as dim_department_group
@@ -44,7 +44,7 @@ checks for dim_customer:
You can use cross checks to compare row counts between datasets in different data sources, as in the example below.

In the example, `retail_customers` is the name of the other dataset, and `aws_postgres_retail` is the name of the data source in which `retail_customers` exists.
-
+{% include code-header.html %}
```yaml
checks for dim_customer:
  - row_count same as retail_customers in aws_postgres_retail
@@ -67,15 +67,15 @@ checks for dim_customer:
| | Apply a dataset filter to partition data during a scan; see [example](#example-with-dataset-filter). | - |

#### Example with check name
-
+{% include code-header.html %}
```yaml
checks for dim_customer:
  - row_count same as retail_customers in aws_postgres_retail: