Commit a19fb3d

Changes for SSL cert errors, SF proxy params, data contracts, SSO SP-initiated, reference check error (#804)
* Troubleshoot SSL certificate error.
* Reference and Snowflake connection troubleshooting
* Clarified scan def names in programmatic scans
* Data contract clarifications and adjustments
1 parent cdbd5dc commit a19fb3d

16 files changed

+139
-32
lines changed

_data/nav.yml

+2
Original file line number | Diff line number | Diff line change
@@ -226,6 +226,8 @@
226226
page: soda/connect-trino.md
227227
- subtitle: Connect to Vertica
228228
page: soda/connect-vertica.md
229+
- subtitle: Troubleshoot connections
230+
page: soda/connect-troubleshoot.md
229231

230232
- title: Soda Cloud API
231233
page: api-docs/public-cloud-api-v1.md

_includes/custom-sampler.md

+1
@@ -1,3 +1,4 @@
11
* You can save Soda Library scan results anywhere in your system; the `scan_result` object contains all the scan result information. To import Soda Library in Python so you can utilize the `Scan()` object, [install a Soda Library package]({% link soda-library/programmatic.md %}), then use `from soda.scan import Scan`.
2+
* If you name the scan definition so that Soda Cloud treats the inline checks in one programmatic scan as independent of the inline checks in another programmatic scan or pipeline, be sure to set a unique scan definition name for each programmatic scan. Reusing the same scan definition name across multiple programmatic scans muddles the check results in Soda Cloud.
23
* If you wish to collect samples of failed rows when a check fails, you can employ a custom sampler; see [Configure a failed row sampler]({% link soda-cl/failed-rows-checks.md %}#configure-a-failed-row-sampler).
34
* Be sure to include any variables in your programmatic scan *before* the check YAML files. Soda requires the variable input for any variables defined in the check YAML files.
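One way to keep scan definition names unique is to derive each name from the pipeline and the step that runs the scan. The following is a minimal sketch under that assumption; `scan_definition_name`, `orders_pipeline`, `ingest`, and `transform` are hypothetical names used only for illustration, and the commented line shows where the result would be passed to `set_scan_definition_name()`.

```python
import re


def scan_definition_name(pipeline: str, step: str) -> str:
    """Build a scan definition name from a pipeline name and a step name.

    Hypothetical helper: the name stays the same between runs of the same
    scan, but differs between scans that run in different pipelines or steps.
    """
    raw = f"{pipeline}_{step}".lower()
    # Replace anything other than letters, digits, and underscores.
    return re.sub(r"[^a-z0-9_]+", "_", raw).strip("_")


# Two programmatic scans in the same (hypothetical) pipeline get distinct names.
ingest_name = scan_definition_name("orders_pipeline", "ingest")
transform_name = scan_definition_name("orders_pipeline", "transform")

# In a programmatic scan, pass the name to the Scan object, for example:
# scan.set_scan_definition_name(ingest_name)
print(ingest_name)
print(transform_name)
```

Deriving the name from fixed inputs, rather than from a timestamp or random value, is a design choice in this sketch: it assumes you want each scan to report under one consistent name over time.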

_includes/snowflake-proxy.md

+18
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,18 @@
1+
**Problem:** While attempting to connect Soda to a Snowflake data source using proxy parameters, you encounter an error that reads something similar to `Could not connect to data source "name_db": 250001 (08001): Failed to connect to DB: mydb.eu-west-1.snowflakecomputing.com:443. Incoming request with IP/Token xx.xxx.xx.xxx is not allowed to access Snowflake.`
2+
3+
```yaml
4+
data_source: my_data_source
5+
type: snowflake
6+
...
7+
session_param:
8+
QUERY_TAG: soda-test
9+
QUOTED_IDENTIFIERS_IGNORE_CASE: false
10+
proxy_http: http://a-proxy-o-dd-dddd-net:8000
11+
proxy_https: https://a-proxy-o-dd-dddd-net:8000
12+
```
13+
14+
**Solution:** When connecting to a Snowflake data source by proxy, be sure to set the new proxy environment variables from the command line using export statements, as in the following example.
15+
```shell
16+
export HTTP_PROXY=http://a-proxy-o-dd-dddd-net:8000
17+
export HTTPS_PROXY=https://a-proxy-o-dd-dddd-net:8000
18+
```
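For programmatic scans, the same variables can also be set from Python before the scan runs. The following is a minimal sketch, assuming the Snowflake connection is opened in the same Python process and reads `HTTP_PROXY` and `HTTPS_PROXY` from the process environment; `set_proxy_env` is a hypothetical helper, not part of Soda.

```python
import os


def set_proxy_env(http_proxy: str, https_proxy: str) -> None:
    """Set the proxy environment variables for the current process.

    Hypothetical helper: mirrors the export statements above so that
    code running later in this process sees the same values.
    """
    os.environ["HTTP_PROXY"] = http_proxy
    os.environ["HTTPS_PROXY"] = https_proxy


# Same placeholder proxy addresses as in the shell example.
set_proxy_env(
    "http://a-proxy-o-dd-dddd-net:8000",
    "https://a-proxy-o-dd-dddd-net:8000",
)

# Create and execute the Soda scan only after this point, so that the
# connection is opened with the proxy variables already in place.
```

Variables set this way apply only to the current process and any child processes it starts; they do not persist in the shell after the script exits.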

assets/images/experimental.png

475 Bytes

soda-cl/reference.md

+5-2
@@ -105,7 +105,7 @@ To review the failed rows in Soda Cloud, navigate to the **Checks** dashboard, t
105105
| ✓ | Use quotes when identifying dataset or column names; see [example](#example-with-quotes). <br />Note that the type of quotes you use must match that which your data source uses. For example, BigQuery uses a backtick ({% raw %}`{% endraw %}) as a quotation mark. | [Use quotes in a check]({% link soda-cl/optional-config.md %}#use-quotes-in-a-check) |
106106
| | Use wildcard characters ({% raw %} % {% endraw %} or {% raw %} * {% endraw %}) in values in the check. | - |
107107
| | Use for each to apply reference checks to multiple datasets in one scan. | - |
108-
| ✓ | Apply a dataset filter to partition data during a scan; see [example](#example-with-dataset-filter). | [Scan a portion of your dataset]({% link soda-cl/optional-config.md %}#scan-a-portion-of-your-dataset) |
108+
| ✓ | Apply a dataset filter to partition data during a scan; see [example](#example-with-dataset-filter). If you encounter difficulties, see [Filter not passed with reference check]({% link soda-cl/troubleshoot.md %}#filter-not-passed-with-reference-check). | [Scan a portion of your dataset]({% link soda-cl/optional-config.md %}#scan-a-portion-of-your-dataset) |
109109

110110
#### Example with check name
111111
{% include code-header.html %}
@@ -123,6 +123,9 @@ checks for dim_department_group:
123123
```
124124

125125
#### Example with dataset filter
126+
127+
Refer to [Troubleshoot SodaCL]({% link soda-cl/troubleshoot.md %}#filter-not-passed-with-reference-check) to address challenges specific to reference checks with dataset filters.
128+
126129
{% include code-header.html %}
127130
```yaml
128131
filter customers_c8d90f60 [daily]:
@@ -132,7 +135,7 @@ checks for customers_c8d90f60 [daily]:
132135
- values in (cat) must exist in customers_europe (cat2)
133136
```
134137

135-
Refer to [Troubleshoot SodaCL]({% link soda-cl/troubleshoot.md %}#filter-not-passed-with-reference-check) to address challenges specific to reference checks with dataset filters.
138+
136139

137140
<br />
138141

soda-cloud/sso.md

+5-3
@@ -34,6 +34,8 @@ When an organization's IT Admin revokes a user's access to Soda Cloud through th
3434

3535
Once your organization enables SSO for all Soda Cloud users, Soda Cloud blocks all non-SSO login attempts and password changes via <a href="https://cloud.soda.io/login" target="_blank">cloud.soda.io/login</a>. If an employee attempts a non-SSO login or attempts to change a password using "Forgot password?" on <a href="https://cloud.soda.io/login" target="_blank">cloud.soda.io/login</a>, Soda Cloud presents a message that explains that they must log in or change their password using their SSO provider.
3636

37+
Soda Cloud supports both Identity Provider Initiated (IdP-initiated) and Service Provider Initiated (SP-initiated) single sign-on integrations. Be sure to indicate which type of SSO your organization uses when setting it up with the Soda Support team.
38+
3739

3840
## Add Soda Cloud to Azure AD
3941

@@ -63,7 +65,7 @@ Once your organization enables SSO for all Soda Cloud users, Soda Cloud blocks a
6365
* **Azure AD Identifier** (Section 4 in Azure). This is the IdP entity, ID, or Identity Provider Issuer that Soda needs
6466
* **Login URL** (Section 4 in Azure). This is the IdP SSO service URL, or Identity Provider Single Sign-On URL that Soda needs.
6567
* **X.509 Certificate**. Click the **Download** link next to **Certificate (Base64)**.
66-
12. Email the copied and downloaded values to <a href="mailto:[email protected]">[email protected]</a>. With those values, Soda completes the SSO configuration for your organization in cloud.soda.io and notifies you of completion.
68+
12. Email the copied and downloaded values to <a href="mailto:[email protected]">[email protected]</a>. With those values, Soda completes the SSO configuration for your organization in cloud.soda.io and notifies you of completion. Soda Cloud supports both Identity Provider Initiated (IdP-initiated) and Service Provider Initiated (SP-initiated) single sign-on integrations; be sure to indicate which type of SSO your organization uses.
6769
13. Test the integration by assigning the Soda application in Azure AD to a single user, then requesting that they log in.
6870
14. After a successful single-user test of the sign in, assign access to the Soda Azure AD app to users and/or user groups in your organization.
6971

@@ -89,7 +91,7 @@ The values for these fields are unique to your organization and are provided to
8991
* **Identity Provider Single Sign-On URL**
9092
* **Identity Provider Issuer**
9193
* **X.509 Certificate**
92-
11. Email the copied and downloaded values to <a href="mailto:[email protected]">[email protected]</a>. With those values, Soda completes the SSO configuration for your organization in cloud.soda.io and notifies you of completion.
94+
11. Email the copied and downloaded values to <a href="mailto:[email protected]">[email protected]</a>. With those values, Soda completes the SSO configuration for your organization in cloud.soda.io and notifies you of completion. Soda Cloud supports both Identity Provider Initiated (IdP-initiated) and Service Provider Initiated (SP-initiated) single sign-on integrations; be sure to indicate which type of SSO your organization uses.
9395
12. Test the integration by assigning the Soda application in Okta to a single user, then requesting that they log in.
9496
13. After a successful single-user test of the sign in, assign access to the Soda Okta app to users and/or user groups in your organization.
9597

@@ -107,7 +109,7 @@ The values for these fields are unique to your organization and are provided to
107109
5. On the **SAML Attribute mapping** page, add two Google directory attributes and map as follows:
108110
* Last Name → User.FamilyName
109111
* First Name → User.GivenName
110-
6. Email the copied and downloaded values to <a href="mailto:[email protected]">[email protected]</a>. With those values, Soda completes the SSO configuration for your organization in cloud.soda.io and notifies you of completion.
112+
6. Email the copied and downloaded values to <a href="mailto:[email protected]">[email protected]</a>. With those values, Soda completes the SSO configuration for your organization in cloud.soda.io and notifies you of completion. Soda Cloud supports both Identity Provider Initiated (IdP-initiated) and Service Provider Initiated (SP-initiated) single sign-on integrations; be sure to indicate which type of SSO your organization uses.
111113
7. In the Google Workspace admin portal, use Google's instructions to <a href="https://support.google.com/a/answer/6087519?hl=en&ref_topic=7559288" target="_blank">Turn on your SAML app</a> and verify that SSO works with the new custom app for Soda.
112114

113115

soda-library/programmatic.md

+2
@@ -140,6 +140,8 @@ scan.execute()
140140
scan.set_verbose(True)
141141

142142
# Set scan definition name, equivalent to CLI -s option
143+
# The scan definition name MUST be unique to this scan, and
144+
# not duplicated in any other programmatic scan
143145
##################
144146
scan.set_scan_definition_name("YOUR_SCHEDULE_NAME")
145147

soda-library/run-a-scan.md

+3-2
@@ -282,9 +282,8 @@ Because Soda Library pushes scan results to Soda Cloud, you may not want to chan
282282

283283
**Problem:** In a Windows environment, you see an error that reads `[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:997)`.
284284

285-
**Short-term solution:** Use `pip install pip-system-certs` to temporarily resolve the issue. This install works to resolve the issue only on Windows machines where the Ops team installs all the certificates needed through Group Policy Objects, or similar. However, the fix is short-term because when you try to run this in a pipeline on another machine, the error will reappear.
285+
**Solution:** Use `pip install pip-system-certs` to potentially resolve the issue. This install works to resolve the issue only on Windows machines where the Ops team installs all the certificates needed through Group Policy Objects, or similar.
286286

287-
**Short-term solution:** Contact your Operations or System Admin team to obtain the proxy certificate.
288287

289288
</div>
290289
<div class="panel" id="three-panel" markdown="1">
@@ -359,6 +358,8 @@ scan.execute()
359358
scan.set_verbose(True)
360359
361360
# Set scan definition name, equivalent to CLI -s option
361+
# The scan definition name MUST be unique to this scan, and
362+
# not duplicated in any other programmatic scan
362363
##################
363364
scan.set_scan_definition_name("YOUR_SCHEDULE_NAME")
364365

soda/connect-snowflake.md

+4
@@ -171,6 +171,10 @@ checks for VOLUME:
171171
name: Trader row count
172172
```
173173

174+
<br />
175+
176+
{% include snowflake-proxy.md %}
177+
174178
<br />
175179
<br />
176180

soda/connect-troubleshoot.md

+39
@@ -0,0 +1,39 @@
1+
---
2+
layout: default
3+
title: Troubleshoot data source connections
4+
description:
5+
parent:
6+
---
7+
8+
# Troubleshoot data source connections
9+
Last modified on {% last_modified_at %}
10+
11+
12+
13+
## SSL certificate error
14+
15+
**Problem:** You encounter an SSL certificate error while attempting to connect Soda to a data source.
16+
17+
**Solution:** Use `pip install pip-system-certs` to potentially resolve the issue. This install works to resolve the issue only on Windows machines where the Ops team installs all the certificates needed through Group Policy Objects, or similar.
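Before or after installing the package, it can help to see where the Python interpreter looks for trusted certificates by default. The following is a diagnostic sketch only, using the standard library `ssl` module; `describe_ca_locations` is a hypothetical helper name, and the output varies by machine and Python installation.

```python
import ssl


def describe_ca_locations() -> dict:
    """Return the default CA certificate locations this Python build uses.

    A value of None means no file or directory was found at that location.
    """
    paths = ssl.get_default_verify_paths()
    return {
        "cafile": paths.cafile,
        "capath": paths.capath,
        "openssl_cafile": paths.openssl_cafile,
        "openssl_capath": paths.openssl_capath,
    }


# Print each location so you can share it with your Ops or System Admin team.
for location, value in describe_ca_locations().items():
    print(f"{location}: {value}")
```

If the proxy or corporate root certificate is not present in any of the reported locations, that may explain why certificate verification fails.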
18+
19+
## Snowflake proxy connection error
20+
21+
{% include snowflake-proxy.md %}
22+
23+
## Go further
24+
25+
* Access [Troubleshoot SodaCL]({% link soda-cl/troubleshoot.md %}) for help resolving issues running scans with SodaCL.
26+
* Need help? Join the <a href="https://community.soda.io/slack" target="_blank"> Soda community on Slack</a>.
27+
28+
<br />
29+
30+
---
31+
32+
Was this documentation helpful?
33+
34+
<!-- LikeBtn.com BEGIN -->
35+
<span class="likebtn-wrapper" data-theme="tick" data-i18n_like="Yes" data-ef_voting="grow" data-show_dislike_label="true" data-counter_zero_show="true" data-i18n_dislike="No"></span>
36+
<script>(function(d,e,s){if(d.getElementById("likebtn_wjs"))return;a=d.createElement(e);m=d.getElementsByTagName(e)[0];a.async=1;a.id="likebtn_wjs";a.src=s;m.parentNode.insertBefore(a, m)})(document,"script","//w.likebtn.com/js/w/widget.js");</script>
37+
<!-- LikeBtn.com END -->
38+
39+
{% include docs-footer.md %}

soda/data-contracts-checks.md

+31-3
@@ -5,14 +5,14 @@ description: Soda data contract checks enable you to verify data quality early i
55
parent: Create a data contract
66
---
77

8-
# Data contract check reference
9-
<br />![experimental](/assets/images/experimental.png){:height="150px" width="150px"} <br />
8+
# Data contract check reference <br />
9+
![experimental](/assets/images/experimental.png){:height="300px" width="300px"} <br />
1010
*Last modified on {% last_modified_at %}*
1111

1212
Soda data contracts is a Python library that verifies data quality standards as early and often as possible in a data pipeline so as to prevent negative downstream impact. Learn more [About Soda data contracts]({% link soda/data-contracts.md %}#about-data-contracts).
1313

1414
<small>✖️ &nbsp;&nbsp; Requires Soda Core Scientific</small><br />
15-
<small>✔️ &nbsp;&nbsp; Supported in Soda Core 3.3.3 or greater</small><br />
15+
<small>✔️ &nbsp;&nbsp; Experimentally supported in Soda Core 3.3.3 or greater for PostgreSQL, Spark, and Snowflake</small><br />
1616
<small>✖️ &nbsp;&nbsp; Supported in Soda Library + Soda Cloud</small><br />
1717
<small>✖️ &nbsp;&nbsp; Supported in Soda Cloud Agreements + Soda Agent</small><br />
1818
<small>✖️ &nbsp;&nbsp; Supported by SodaGPT</small><br />
@@ -46,6 +46,8 @@ Note that data contracts checks do not follow SodaCL syntax.
4646
```yaml
4747
dataset: dim_employee
4848

49+
...
50+
4951
columns:
5052
- name: id
5153
checks:
@@ -78,6 +80,8 @@ This check compares the maximum value in the column to the time the scan runs; t
7880
```yaml
7981
dataset: dim_customer
8082
83+
...
84+
8185
columns:
8286
- name: date_first_purchase
8387
checks:
@@ -101,6 +105,8 @@ See also: [Combine missing and validity](#combine-missing-and-validity)
101105
```yaml
102106
dataset: dim_customer
103107
108+
...
109+
104110
columns:
105111
- name: title
106112
checks:
@@ -136,6 +142,8 @@ columns:
136142
```yaml
137143
dataset: dim_customer
138144
145+
...
146+
139147
columns:
140148
- name: first_name
141149
checks:
@@ -158,6 +166,8 @@ checks:
158166
```yaml
159167
dataset: dim_customer
160168
169+
...
170+
161171
columns:
162172
- name: yearly_income
163173
checks:
@@ -190,6 +200,8 @@ Relative to a [SQL metric query](#sql-metric-query) check, a SQL metric expressi
190200
{% include code-header.html %}
191201
```yaml
192202
dataset: CUSTOMERS
203+
...
204+
193205
columns:
194206
- name: id
195207
# SQL metric expression check for a column
@@ -205,6 +217,8 @@ columns:
205217
{% include code-header.html %}
206218
```yaml
207219
dataset: CUSTOMERS
220+
...
221+
208222
columns:
209223
- name: id
210224
- name: country
@@ -231,6 +245,8 @@ You can apply a SQL metric check to one or more columns or to an entire dataset.
231245
{% include code-header.html %}
232246
```yaml
233247
dataset: CUSTOMERS
248+
...
249+
234250
columns:
235251
# SQL metric query check for a column
236252
- name: id
@@ -248,6 +264,8 @@ columns:
248264
{% include code-header.html %}
249265
```yaml
250266
dataset: CUSTOMERS
267+
...
268+
251269
columns:
252270
- name: id
253271
checks:
@@ -274,6 +292,8 @@ checks:
274292
```yaml
275293
dataset: dim_customer
276294
295+
...
296+
277297
columns:
278298
- name: first_name
279299
data_type: character varying
@@ -324,6 +344,8 @@ The referential dataset must exist in the same warehouse as the dataset identifi
324344
```yaml
325345
dataset: dim_employee
326346
347+
...
348+
327349
columns:
328350
- name: country
329351
checks:
@@ -343,6 +365,8 @@ You can combine column configuration keys to include both missing and validity p
343365
```yaml
344366
dataset: dim_product
345367
368+
...
369+
346370
columns:
347371
- name: size
348372
checks:
@@ -360,6 +384,8 @@ In the example below, Soda considers any row that failed the `no_missing_values`
360384
```yaml
361385
dataset: dim_product
362386
387+
...
388+
363389
columns:
364390
- name: size
365391
checks:
@@ -386,6 +412,8 @@ The example below verifies that the only valid value for the column `currency` i
386412
```yaml
387413
dataset: dim_product
388414
415+
...
416+
389417
columns:
390418
- name: country
391419
- name: currency

soda/data-contracts-verify.md

+5-5
@@ -5,16 +5,16 @@ description: Use a Python API to verify data contract checks programmatically wi
55
parent: Create a data contract
66
---
77

8-
# Verify a data contract
9-
<br />![experimental](/assets/images/experimental.png){:height="150px" width="150px"} <br />
8+
# Verify a data contract <br />
9+
![experimental](/assets/images/experimental.png){:height="300px" width="300px"} <br />
1010
*Last modified on {% last_modified_at %}*
1111

1212
To verify a **Soda data contract** is to scan the data in a warehouse to execute the data contract checks you defined in a contracts YAML file. Available as a Python library, you run the scan programmatically, invoking Soda data contracts in a CI/CD workflow when you create a new pull request, or in a data pipeline after importing or transforming new data.
1313

1414
When deciding when to verify a data contract, consider that contract verification works best on new data as soon as it is produced so as to limit its exposure to other systems or users who might access it. The earlier in a pipeline or workflow, the better! Further, best practice suggests that you store batches of new data in a temporary table, verify a contract on the batches, then append the data to a larger table.
1515

1616
<small>✖️ &nbsp;&nbsp; Requires Soda Core Scientific</small><br />
17-
<small>✔️ &nbsp;&nbsp; Supported in Soda Core 3.3.3 or greater</small><br />
17+
<small>✔️ &nbsp;&nbsp; Experimentally supported in Soda Core 3.3.3 or greater for PostgreSQL, Spark, and Snowflake</small><br />
1818
<small>✖️ &nbsp;&nbsp; Supported in Soda Library + Soda Cloud</small><br />
1919
<small>✖️ &nbsp;&nbsp; Supported in Soda Cloud Agreements + Soda Agent</small><br />
2020
<small>✖️ &nbsp;&nbsp; Supported by SodaGPT</small><br />
@@ -63,7 +63,7 @@ When deciding when to verify a data contract, consider that contract verificatio
6363
.execute()
6464
)
6565
66-
logging.debug(str(contract_verification_result))
66+
print(str(contract_verification_result))
6767
```
6868
4. At runtime, Soda connects with your warehouse and verifies the contract by executing the data contract checks in your file. Use `${SCHEMA}` syntax to provide any environment variable values in a contract YAML file. Soda returns the results of the verification as pass or fail check results, or indicates errors if any exist; see below.
6969

@@ -117,7 +117,7 @@ contract_verification: ContractVerification = (
117117
)
118118
119119
if contract_verification.logs.has_errors():
120-
logging.error(f"The contract has syntax or semantic errors: \n{contract_verification.logs}")
120+
print(f"The contract has syntax or semantic errors: \n{contract_verification.logs}")
121121
```
122122

123123
## Add a check identity
