* You can save Soda Library scan results anywhere in your system; the `scan_result` object contains all the scan result information. To import Soda Library in Python so that you can use the `Scan()` object, [install a Soda Library package]({% link soda-library/programmatic.md %}), then use `from soda.scan import Scan`.
* If you provide a scan definition name to distinguish the inline checks in one programmatic scan from those in a different programmatic scan or pipeline, be sure to set a unique scan definition name for each programmatic scan. Using the same scan definition name in multiple programmatic scans produces confusing check results in Soda Cloud.
* If you wish to collect samples of failed rows when a check fails, you can employ a custom sampler; see [Configure a failed row sampler]({% link soda-cl/failed-rows-checks.md %}#configure-a-failed-row-sampler).
* Be sure to include any variables in your programmatic scan *before* the check YAML files. Soda requires the variable input for any variables defined in the check YAML files.
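The naming and ordering guidance above can be sketched as a small helper. This is a minimal sketch, not Soda's own code: `unique_scan_definition_name` and `run_programmatic_scan` are hypothetical names, the naming scheme is only one possible convention, and the helper assumes the object it receives exposes the `Scan` methods `set_scan_definition_name`, `set_data_source_name`, `add_configuration_yaml_file`, `add_variables`, `add_sodacl_yaml_file`, `execute`, and `get_scan_results`. Confirm those against the Soda Library version you have installed.

```python
from typing import Any, Dict


def unique_scan_definition_name(pipeline: str, task: str) -> str:
    """Build a name that differs per pipeline and task (hypothetical scheme)."""
    return f"{pipeline}__{task}"


def run_programmatic_scan(scan: Any, data_source: str, config_file: str,
                          checks_file: str, variables: Dict[str, str],
                          definition_name: str) -> Any:
    """Configure and execute a scan, adding variables before the checks file."""
    scan.set_scan_definition_name(definition_name)
    scan.set_data_source_name(data_source)
    scan.add_configuration_yaml_file(file_path=config_file)
    # Variables are added first so the check YAML file can resolve them.
    scan.add_variables(variables)
    scan.add_sodacl_yaml_file(checks_file)
    scan.execute()
    # The returned results can be stored wherever suits your system.
    return scan.get_scan_results()
```

With Soda Library installed, you would pass a `Scan()` instance (from `from soda.scan import Scan`) as the first argument, and give each pipeline or task its own definition name.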
**Problem:** While attempting to connect Soda to a Snowflake data source using proxy parameters, you encounter an error that reads something similar to `Could not connect to data source "name_db": 250001 (08001): Failed to connect to DB: mydb.eu-west-1.snowflakecomputing.com:443. Incoming request with IP/Token xx.xxx.xx.xxx is not allowed to access Snowflake.`

```yaml
data_source my_data_source:
  type: snowflake
  ...
  session_param:
    QUERY_TAG: soda-test
    QUOTED_IDENTIFIERS_IGNORE_CASE: false
  proxy_http: http://a-proxy-o-dd-dddd-net:8000
  proxy_https: https://a-proxy-o-dd-dddd-net:8000
```

**Solution:** When connecting to a Snowflake data source by proxy, be sure to set the new proxy environment variables from the command line using export statements, as in the following example.
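
If you run the scan programmatically rather than from a shell, one way to get a similar effect is to set the variables inside the Python process before the scan starts. This is a sketch under assumptions: it uses the conventional `HTTP_PROXY` and `HTTPS_PROXY` variable names, which this section does not confirm, it reuses the placeholder proxy addresses from the configuration above, and `set_proxy_environment` is a hypothetical helper.

```python
import os


def set_proxy_environment(http_proxy: str, https_proxy: str) -> None:
    """Expose proxy settings to the current process and its children."""
    # Assumed variable names; confirm them for your connector version.
    os.environ["HTTP_PROXY"] = http_proxy
    os.environ["HTTPS_PROXY"] = https_proxy


# Placeholder addresses copied from the configuration example.
set_proxy_environment(
    "http://a-proxy-o-dd-dddd-net:8000",
    "https://a-proxy-o-dd-dddd-net:8000",
)
```

Call the helper before creating the scan so that the connection picks up the values.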
soda-cl/reference.md (+5 -2)
@@ -105,7 +105,7 @@ To review the failed rows in Soda Cloud, navigate to the **Checks** dashboard, t
105
105
| ✓ | Use quotes when identifying dataset or column names; see [example](#example-with-quotes). <br />Note that the type of quotes you use must match that which your data source uses. For example, BigQuery uses a backtick ({% raw %}`{% endraw %}) as a quotation mark. | [Use quotes in a check]({% link soda-cl/optional-config.md %}#use-quotes-in-a-check) |
106
106
| | Use wildcard characters ({% raw %} % {% endraw %} or {% raw %} * {% endraw %}) in values in the check. | - |
107
107
| | Use for each to apply reference checks to multiple datasets in one scan. | - |
108
-
| ✓ | Apply a dataset filter to partition data during a scan; see [example](#example-with-dataset-filter). | [Scan a portion of your dataset]({% link soda-cl/optional-config.md %}#scan-a-portion-of-your-dataset) |
108
+
| ✓ | Apply a dataset filter to partition data during a scan; see [example](#example-with-dataset-filter). If you encounter difficulties, see [Filter not passed with reference check]({% link soda-cl/troubleshoot.md %}#filter-not-passed-with-reference-check). | [Scan a portion of your dataset]({% link soda-cl/optional-config.md %}#scan-a-portion-of-your-dataset) |
109
109
110
110
#### Example with check name
111
111
{% include code-header.html %}
@@ -123,6 +123,9 @@ checks for dim_department_group:
123
123
```
124
124
125
125
#### Example with dataset filter
126
+
127
+
Refer to [Troubleshoot SodaCL]({% link soda-cl/troubleshoot.md %}#filter-not-passed-with-reference-check) to address challenges specific to reference checks with dataset filters.
128
+
126
129
{% include code-header.html %}
127
130
```yaml
128
131
filter customers_c8d90f60 [daily]:
@@ -132,7 +135,7 @@ checks for customers_c8d90f60 [daily]:
132
135
- values in (cat) must exist in customers_europe (cat2)
133
136
```
134
137
135
-
Refer to [Troubleshoot SodaCL]({% link soda-cl/troubleshoot.md %}#filter-not-passed-with-reference-check) to address challenges specific to reference checks with dataset filters.
soda-cloud/sso.md (+5 -3)
@@ -34,6 +34,8 @@ When an organization's IT Admin revokes a user's access to Soda Cloud through th
34
34
35
35
Once your organization enables SSO for all Soda Cloud users, Soda Cloud blocks all non-SSO login attempts and password changes via <a href="https://cloud.soda.io/login" target="_blank">cloud.soda.io/login</a>. If an employee attempts a non-SSO login or attempts to change a password using "Forgot password?" on <a href="https://cloud.soda.io/login" target="_blank">cloud.soda.io/login</a>, Soda Cloud presents a message that explains that they must log in or change their password using their SSO provider.
36
36
37
+
Soda Cloud supports both Identity Provider Initiated (IdP-initiated) and Service Provider Initiated (SP-initiated) single sign-on integrations. Be sure to indicate which type of SSO your organization uses when setting it up with the Soda Support team.
38
+
37
39
38
40
## Add Soda Cloud to Azure AD
39
41
@@ -63,7 +65,7 @@ Once your organization enables SSO for all Soda Cloud users, Soda Cloud blocks a
63
65
* **Azure AD Identifier** (Section 4 in Azure). This is the IdP entity, ID, or Identity Provider Issuer that Soda needs.
* **Login URL** (Section 4 in Azure). This is the IdP SSO service URL, or Identity Provider Single Sign-On URL that Soda needs.
* **X.509 Certificate**. Click the **Download** link next to **Certificate (Base64)**.
66
-
12. Email the copied and downloaded values to <a href="mailto:[email protected]">[email protected]</a>. With those values, Soda completes the SSO configuration for your organization in cloud.soda.io and notifies you of completion.
68
+
12. Email the copied and downloaded values to <a href="mailto:[email protected]">[email protected]</a>. With those values, Soda completes the SSO configuration for your organization in cloud.soda.io and notifies you of completion. Soda Cloud supports both Identity Provider Initiated (IdP-initiated) and Service Provider Initiated (SP-initiated) single sign-on integrations; be sure to indicate which type of SSO your organization uses.
67
69
13. Test the integration by assigning the Soda application in Azure AD to a single user, then requesting that they log in.
68
70
14. After a successful single-user test of the sign in, assign access to the Soda Azure AD app to users and/or user groups in your organization.
69
71
@@ -89,7 +91,7 @@ The values for these fields are unique to your organization and are provided to
89
91
* **Identity Provider Single Sign-On URL**
* **Identity Provider Issuer**
* **X.509 Certificate**
92
-
11. Email the copied and downloaded values to <a href="mailto:[email protected]">[email protected]</a>. With those values, Soda completes the SSO configuration for your organization in cloud.soda.io and notifies you of completion.
94
+
11. Email the copied and downloaded values to <a href="mailto:[email protected]">[email protected]</a>. With those values, Soda completes the SSO configuration for your organization in cloud.soda.io and notifies you of completion. Soda Cloud supports both Identity Provider Initiated (IdP-initiated) and Service Provider Initiated (SP-initiated) single sign-on integrations; be sure to indicate which type of SSO your organization uses.
93
95
12. Test the integration by assigning the Soda application in Okta to a single user, then requesting that they log in.
94
96
13. After a successful single-user test of the sign in, assign access to the Soda Okta app to users and/or user groups in your organization.
95
97
@@ -107,7 +109,7 @@ The values for these fields are unique to your organization and are provided to
107
109
5. On the **SAML Attribute mapping** page, add two Google directory attributes and map as follows:
108
110
* Last Name → User.FamilyName
109
111
* First Name → User.GivenName
110
-
6. Email the copied and downloaded values to <a href="mailto:[email protected]">[email protected]</a>. With those values, Soda completes the SSO configuration for your organization in cloud.soda.io and notifies you of completion.
112
+
6. Email the copied and downloaded values to <a href="mailto:[email protected]">[email protected]</a>. With those values, Soda completes the SSO configuration for your organization in cloud.soda.io and notifies you of completion. Soda Cloud supports both Identity Provider Initiated (IdP-initiated) and Service Provider Initiated (SP-initiated) single sign-on integrations; be sure to indicate which type of SSO your organization uses.
111
113
7. In the Google Workspace admin portal, use Google's instructions to <a href="https://support.google.com/a/answer/6087519?hl=en&ref_topic=7559288" target="_blank">Turn on your SAML app</a> and verify that SSO works with the new custom app for Soda.
soda-library/run-a-scan.md (+3 -2)
@@ -282,9 +282,8 @@ Because Soda Library pushes scan results to Soda Cloud, you may not want to chan
282
282
283
283
**Problem:** In a Windows environment, you see an error that reads `[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:997)`.
284
284
285
-
**Short-term solution:** Use `pip install pip-system-certs` to temporarily resolve the issue. This install works to resolve the issue only on Windows machines where the Ops team installs all the certificates needed through Group Policy Objects, or similar. However, the fix is short-term because when you try to run this in a pipeline on another machine, the error will reappear.
285
+
**Solution:** Use `pip install pip-system-certs` to potentially resolve the issue. This install works to resolve the issue only on Windows machines where the Ops team installs all the certificates needed through Group Policy Objects, or similar.
286
286
287
-
**Short-term solution:** Contact your Operations or System Admin team to obtain the proxy certificate.
288
287
289
288
</div>
290
289
<div class="panel" id="three-panel" markdown="1">
@@ -359,6 +358,8 @@ scan.execute()
359
358
scan.set_verbose(True)
360
359
361
360
# Set scan definition name, equivalent to CLI -s option
361
+
# The scan definition name MUST be unique to this scan, and
**Problem:** You encounter an SSL certificate error while attempting to connect Soda to a data source.
16
+
17
+
**Solution:** Use `pip install pip-system-certs` to potentially resolve the issue. This install works to resolve the issue only on Windows machines where the Ops team installs all the certificates needed through Group Policy Objects, or similar.
18
+
19
+
## Snowflake proxy connection error
20
+
21
+
{% include snowflake-proxy.md %}
22
+
23
+
## Go further
24
+
25
+
* Access [Troubleshoot SodaCL]({% link soda-cl/troubleshoot.md %}) for help resolving issues running scans with SodaCL.
26
+
* Need help? Join the <a href="https://community.soda.io/slack" target="_blank">Soda community on Slack</a>.
Soda data contracts is a Python library that verifies data quality standards as early and often as possible in a data pipeline so as to prevent negative downstream impact. Learn more [About Soda data contracts]({% link soda/data-contracts.md %}#about-data-contracts).
13
13
14
14
<small>✖️ Requires Soda Core Scientific</small><br />
15
-
<small>✔️ Supported in Soda Core 3.3.3 or greater</small><br />
15
+
<small>✔️ Experimentally supported in Soda Core 3.3.3 or greater for PostgreSQL, Spark, and Snowflake</small><br />
16
16
<small>✖️ Supported in Soda Library + Soda Cloud</small><br />
17
17
<small>✖️ Supported in Soda Cloud Agreements + Soda Agent</small><br />
18
18
<small>✖️ Supported by SodaGPT</small><br />
@@ -46,6 +46,8 @@ Note that data contracts checks do not follow SodaCL syntax.
46
46
```yaml
47
47
dataset: dim_employee
48
48
49
+
...
50
+
49
51
columns:
50
52
- name: id
51
53
checks:
@@ -78,6 +80,8 @@ This check compares the maximum value in the column to the time the scan runs; t
78
80
```yaml
79
81
dataset: dim_customer
80
82
83
+
...
84
+
81
85
columns:
82
86
- name: date_first_purchase
83
87
checks:
@@ -101,6 +105,8 @@ See also: [Combine missing and validity](#combine-missing-and-validity)
101
105
```yaml
102
106
dataset: dim_customer
103
107
108
+
...
109
+
104
110
columns:
105
111
- name: title
106
112
checks:
@@ -136,6 +142,8 @@ columns:
136
142
```yaml
137
143
dataset: dim_customer
138
144
145
+
...
146
+
139
147
columns:
140
148
- name: first_name
141
149
checks:
@@ -158,6 +166,8 @@ checks:
158
166
```yaml
159
167
dataset: dim_customer
160
168
169
+
...
170
+
161
171
columns:
162
172
- name: yearly_income
163
173
checks:
@@ -190,6 +200,8 @@ Relative to a [SQL metric query](#sql-metric-query) check, a SQL metric expressi
190
200
{% include code-header.html %}
191
201
```yaml
192
202
dataset: CUSTOMERS
203
+
...
204
+
193
205
columns:
194
206
- name: id
195
207
# SQL metric expression check for a column
@@ -205,6 +217,8 @@ columns:
205
217
{% include code-header.html %}
206
218
```yaml
207
219
dataset: CUSTOMERS
220
+
...
221
+
208
222
columns:
209
223
- name: id
210
224
- name: country
@@ -231,6 +245,8 @@ You can apply a SQL metric check to one or more columns or to an entire dataset.
231
245
{% include code-header.html %}
232
246
```yaml
233
247
dataset: CUSTOMERS
248
+
...
249
+
234
250
columns:
235
251
# SQL metric query check for a column
236
252
- name: id
@@ -248,6 +264,8 @@ columns:
248
264
{% include code-header.html %}
249
265
```yaml
250
266
dataset: CUSTOMERS
267
+
...
268
+
251
269
columns:
252
270
- name: id
253
271
checks:
@@ -274,6 +292,8 @@ checks:
274
292
```yaml
275
293
dataset: dim_customer
276
294
295
+
...
296
+
277
297
columns:
278
298
- name: first_name
279
299
data_type: character varying
@@ -324,6 +344,8 @@ The referential dataset must exist in the same warehouse as the dataset identifi
324
344
```yaml
325
345
dataset: dim_employee
326
346
347
+
...
348
+
327
349
columns:
328
350
- name: country
329
351
checks:
@@ -343,6 +365,8 @@ You can combine column configuration keys to include both missing and validity p
343
365
```yaml
344
366
dataset: dim_product
345
367
368
+
...
369
+
346
370
columns:
347
371
- name: size
348
372
checks:
@@ -360,6 +384,8 @@ In the example below, Soda considers any row that failed the `no_missing_values`
360
384
```yaml
361
385
dataset: dim_product
362
386
387
+
...
388
+
363
389
columns:
364
390
- name: size
365
391
checks:
@@ -386,6 +412,8 @@ The example below verifies that the only valid value for the column `currency` i
To verify a **Soda data contract** is to scan the data in a warehouse to execute the data contract checks you defined in a contracts YAML file. Available as a Python library, you run the scan programmatically, invoking Soda data contracts in a CI/CD workflow when you create a new pull request, or in a data pipeline after importing or transforming new data.
13
13
14
14
When deciding when to verify a data contract, consider that contract verification works best on new data as soon as it is produced so as to limit its exposure to other systems or users who might access it. The earlier in a pipeline or workflow, the better! Further, best practice suggests that you store batches of new data in a temporary table, verify a contract on the batches, then append the data to a larger table.
15
15
16
16
<small>✖️ Requires Soda Core Scientific</small><br />
17
-
<small>✔️ Supported in Soda Core 3.3.3 or greater</small><br />
17
+
<small>✔️ Experimentally supported in Soda Core 3.3.3 or greater for PostgreSQL, Spark, and Snowflake</small><br />
18
18
<small>✖️ Supported in Soda Library + Soda Cloud</small><br />
19
19
<small>✖️ Supported in Soda Cloud Agreements + Soda Agent</small><br />
20
20
<small>✖️ Supported by SodaGPT</small><br />
@@ -63,7 +63,7 @@ When deciding when to verify a data contract, consider that contract verificatio
63
63
.execute()
64
64
)
65
65
66
-
logging.debug(str(contract_verification_result))
66
+
print(str(contract_verification_result))
67
67
```
68
68
4. At runtime, Soda connects with your warehouse and verifies the contract by executing the data contract checks in your file. Use `${SCHEMA}` syntax to provide any environment variable values in a contract YAML file. Soda returns the results of the verification as pass or fail check results, or indicates errors if any exist; see below.
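
To illustrate the `${SCHEMA}` syntax, the sketch below resolves such placeholders from environment variables using Python's `string.Template`. It only demonstrates the substitution semantics and is not how Soda implements resolution; `resolve_env_placeholders`, the `schema` key, and the sample contract text are hypothetical.

```python
import os
from string import Template


def resolve_env_placeholders(contract_yaml: str) -> str:
    """Replace ${NAME} placeholders with values from the environment."""
    # substitute() raises KeyError if a referenced variable is not set.
    return Template(contract_yaml).substitute(os.environ)


os.environ["SCHEMA"] = "analytics"  # example value only
print(resolve_env_placeholders("dataset: dim_customer\nschema: ${SCHEMA}\n"))
```

Failing loudly on an unset variable, as `substitute()` does here, surfaces a missing value before any check runs against the wrong schema.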