Commit 58a3e90

Document soda-managed agent (#702)
* Document soda-managed agent
* adjusted toc
* Remaining updates and changes to include soda-hosted agent
* Update image
* adjusted typo
* correct link
* Refined security language
* Added details for role parameter in Snowflake connections
* Adjusted details, added command to see logs
* Add snowflake troubleshooting
* Added agent migration instructions
* added release notes
* Clarified migration guide
1 parent 0f8f737 commit 58a3e90

42 files changed

+505 -120 lines changed

_data/nav.yml

+2
Original file line number | Diff line number | Diff line change
@@ -13,6 +13,8 @@
1313
page: soda/setup-guide.md
1414
- subtitle: Install Soda Library
1515
page: soda-library/install.md
16+
- subtitle: Set up a Soda-hosted agent
17+
page: soda-agent/managed-agent.md
1618
- subtitle: Deploy a Soda Agent
1719
page: soda-agent/deploy.md
1820
- subtitle: Soda Agent extras

_includes/about-soda.md

+1-1
@@ -14,7 +14,7 @@ Soda works by taking the data quality checks that you prepare and using them to
1414

1515
To test your data quality, you choose a flavor of Soda (choose a deployment model) which enables you to configure connections with your data sources and define data quality checks, then run scans that execute your data quality checks.
1616

17-
* **Connect to your data source.** <br />Connect Soda to a data source such as Snowflake, Amazon Athena, or Big Query by providing access details for your data source such as host, port, and data source login credentials.
17+
* **Connect to your data source.** <br />Connect Soda to a data source such as Snowflake, Amazon Athena, or BigQuery by providing access details for your data source such as host, port, and data source login credentials.
1818
* **Define checks to surface bad-quality data.** <br />Define data quality checks using Soda Checks Language (SodaCL), a domain-specific language for data quality testing. A Soda Check is a test that Soda performs when it scans a dataset in your data source.
1919
* **Run a scan to execute your data quality checks.** <br />During a scan, Soda does not ingest your data, it only scans it for quality metrics, then uses the metadata to prepare scan results<sup>1</sup>. After a scan, each check results in one of three default states:
2020
* pass: the values in the dataset match or fall within the thresholds you specified

_includes/access-managed-agent.md

+5
@@ -0,0 +1,5 @@
1+
1. If you have not already done so, create a Soda Cloud account at <a href="https://cloud.soda.io/signup?utm_source=docs" target="_blank"> cloud.soda.io</a>. If you already have a Soda account, log in.
2+
2. By default, Soda prepares a Soda-hosted agent for all newly-created accounts. However, if you are an Admin in an existing Soda Cloud account and wish to use a Soda-hosted agent, navigate to **your avatar** > **Organization Settings**. In the **Organization** tab, click the checkbox to **Enable Soda-hosted Agent**.
3+
3. Navigate to **your avatar** > **Data Sources**, then access the **Agents** tab. Notice your out-of-the-box Soda-hosted agent that is up and running.
4+
5+
![soda-hosted-agent](/assets/images/soda-hosted-agent.png){:height="700px" width="700px"}

_includes/banner-agreements.md

+4
@@ -0,0 +1,4 @@
1+
<div class="info">
2+
<span class="closebtn" onclick="this.parentElement.style.display='none';">&times;</span>
3+
The <strong>agreement</strong> feature is being deprecated and is only available upon request. Contact <a href="mailto:[email protected]">Soda Support</a> to request access.
4+
</div>

_includes/compatible-cloud-datasources.md

+1-1
@@ -1,6 +1,6 @@
11
<table>
22
<tr>
3-
<td>Amazon Athena<br /> Amazon Redshift<br /> Azure Synapse<br /> ClickHouse <br /> Databricks SQL<br />Denodo <br /> Dremio <br /> DuckDB <br /> GCP Big Query<br /> Google CloudSQL</td>
3+
<td>Amazon Athena<br /> Amazon Redshift<br /> Azure Synapse<br /> ClickHouse <br /> Databricks SQL<br />Denodo <br /> Dremio <br /> DuckDB <br /> GCP BigQuery<br /> Google CloudSQL</td>
44
<td>IBM DB2<br /> MotherDuck <br /> MS SQL Server<sup>1</sup><br /> MySQL<br > OracleDB<br />PostgreSQL<br /> Presto <br /> Snowflake<br /> Trino<br /> Vertica </td>
55
</tr>
66
</table>

_release-notes/soda-hosted-agent.md

+10
@@ -0,0 +1,10 @@
1+
---
2+
name: "Soda-hosted Agent"
3+
date: 2024-02-01
4+
products:
5+
- soda-cloud
6+
---
7+
8+
Introducing a secure, out-of-the-box Soda-hosted Agent to manage access to data sources from within your Soda Cloud account. Quickly configure connections to your data sources in the Soda Cloud user interface, then empower all your colleagues to explore datasets, access check results, customize collections, and create their own no-code checks for data quality.
9+
10+
Learn how to [Set up a Soda-hosted agent]({% link soda-agent/managed-agent.md %}).

assets/images/soda-hosted-agent.png

140 KB

assets/images/soda-hosted-agent1.png

88.2 KB

assets/images/with-library.png

-39.4 KB

assets/images/with-managed-agent.png

135 KB

index.html

+3-3
@@ -36,10 +36,10 @@ <h2>Get started</h2>
3636
<div>
3737
<img src="/assets/images/icons/[email protected]" width="54" height="40">
3838
<h2>What's new?</h2>
39-
<a href="/soda-cl/soda-cl-overview.html#define-sodacl-checks">✨ Create no-code checks ✨</a>
39+
<a href="/soda-agent/managed-agent.html">Set up Soda-hosted Agent</a>
40+
<a href="/soda-cl/soda-cl-overview.html#define-sodacl-checks">Create no-code checks</a>
4041
<a href="/soda-cloud/scan-mgmt.html">Manage scheduled scans</a>
41-
<a href="/soda/data-contracts.html">Data contracts</a>
42-
<a href="/soda/new-documentation.html#october-11-2023">Newly-revised documentation</a>
42+
<a href="/soda/data-contracts.html">Create data contracts</a>
4343
</div>
4444
<div>
4545
<img src="/assets/images/icons/[email protected]" width="54" height="40">

soda-agent/basics.md

+3-3
@@ -10,11 +10,11 @@ redirect_from: /soda-agent/
1010
<!--Linked to UI, access Shlink-->
1111
*Last modified on {% last_modified_at %}*
1212

13-
The **Soda Agent** is a tool that empowers Soda Cloud users to securely access data sources to scan for data quality. Create a Kubernetes cluster in a cloud services provider environment, then use Helm to deploy a Soda Agent in the cluster.
13+
The **Soda Agent** is a tool that empowers Soda Cloud users to securely access data sources to scan for data quality. For a self-hosted agent, create a Kubernetes cluster in a cloud services provider environment, then use Helm to deploy a Soda Agent in the cluster.
1414

15-
This setup enables Soda Cloud users to securely connect to data sources (Snowflake, Amazon Athena, etc.) from within the Soda Cloud web application. Any user in your Soda Cloud account can add a new data source via the agent, then write their own no-code checks and agreements to check for data quality in the new data source.
15+
This setup enables Soda Cloud users to securely connect to data sources (Snowflake, Amazon Athena, etc.) from within the Soda Cloud web application. Any user in your Soda Cloud account can add a new data source via the agent, then write their own no-code checks to check for data quality in the new data source.
1616

17-
What follows is an extremely abridged introduction to a few basic elements involved in the deployment and setup of a Soda Agent.
17+
What follows is an extremely abridged introduction to a few basic elements involved in the deployment and setup of a self-hosted Soda Agent.
1818

1919
![agent-diagram](/assets/images/agent-diagram.png){:height="700px" width="700px"}
2020

soda-agent/deploy.md

+59-10
@@ -19,9 +19,9 @@ redirect_from:
1919
<!--Linked to UI, access Shlink-->
2020
*Last modified on {% last_modified_at %}*
2121

22-
The **Soda Agent** is a tool that empowers Soda Cloud users to securely access data sources to scan for data quality. Create a Kubernetes cluster, then use Helm to deploy a Soda Agent in the cluster.
22+
The **Soda Agent** is a tool that empowers Soda Cloud users to securely access data sources to scan for data quality. Create a Kubernetes cluster, then use Helm to deploy a self-hosted Soda Agent in the cluster.
2323

24-
This setup enables Soda Cloud users to securely connect to data sources (BigQuery, Snowflake, etc.) from within the Soda Cloud web application. Any user in your Soda Cloud account can add a new data source via the agent, then write their own no-code checks and agreements to check for data quality in the new data source.
24+
This setup enables Soda Cloud users to securely connect to data sources (BigQuery, Snowflake, etc.) from within the Soda Cloud web application. Any user in your Soda Cloud account can add a new data source via the agent, then write their own no-code checks and agreements to check for data quality in the new data source. Alternatively, if you use a BigQuery, MySQL, PostgreSQL, or Snowflake data source, you can use a secure, out-of-the-box [Soda-hosted agent]({% link soda-agent/managed-agent.md %}) made available for every Soda Cloud organization.
2525

2626
As a step in the **Get started roadmap**, this guide offers instructions to set up, install, and configure Soda in a [self-hosted agent deployment model]({% link soda/setup-guide.md %}#self-hosted-agent).
2727

@@ -173,7 +173,7 @@ REVISION: 1
173173
```shell
174174
minikube kubectl -- describe pods
175175
```
176-
4. In your Soda Cloud account, navigate to **your avatar** > **Data Sources** > **Agents** tab. Refresh the page to verify that you see the agent you just created in the list of Agents. <br/><br/>Be aware that this may take several minutes to appear in your list of Soda Agents. Use the `describe pods` command in step 3 to check the status of the deployment. When `State: Running` and `Ready: True`, then you can refresh and see the agent in Soda Cloud.
176+
4. In your Soda Cloud account, navigate to **your avatar** > **Data Sources** > **Agents** tab. Refresh the page to verify that you see the agent you just created in the list of Agents. <br/><br/>Be aware that this may take several minutes to appear in your list of Soda Agents. Use the `describe pods` command in step 3 to check the status of the deployment. When `State: Running` and `Ready: True`, then you can refresh and see the agent in Soda Cloud.
177177
```shell
178178
...
179179
Containers:
@@ -190,7 +190,11 @@ Containers:
190190
```
191191
![agent-deployed](/assets/images/agent-deployed.png){:height="600px" width="600px"}
192192

193-
<br />
193+
If you do not see the agent listed in Soda Cloud, use the following command to review status and investigate the logs.
194+
```shell
195+
kubectl logs -l agent.soda.io/component=orchestrator -n soda-agent -f
196+
```
197+
<br/>
194198

195199
#### Deploy using a values YAML file
196200

@@ -246,6 +250,11 @@ Containers:
246250
```
247251
![agent-deployed](/assets/images/agent-deployed.png){:height="700px" width="700px"}
248252

253+
If you do not see the agent listed in Soda Cloud, use the following command to review status and investigate the logs.
254+
```shell
255+
kubectl logs -l agent.soda.io/component=orchestrator -n soda-agent -f
256+
```
257+
249258
If you use private key authentication with a Soda Agent, refer to [Soda Agent extras]({% link soda-agent/secrets.md %}#use-a-values-file-to-store-private-key-authentication-values).
250259

251260
<br />
@@ -258,7 +267,7 @@ If you use private key authentication with a Soda Agent, refer to [Soda Agent ex
258267

259268
1. Uninstall the Soda Agent in the cluster.
260269
```shell
261-
helm delete soda-agent -n soda-agent
270+
helm uninstall soda-agent -n soda-agent
262271
```
263272
2. Delete the cluster.
264273
```shell
@@ -421,6 +430,11 @@ Containers:
421430
```
422431
![agent-deployed](/assets/images/agent-deployed.png){:height="700px" width="700px"}
423432

433+
If you do not see the agent listed in Soda Cloud, use the following command to review status and investigate the logs.
434+
```shell
435+
kubectl logs -l agent.soda.io/component=orchestrator -n soda-agent -f
436+
```
437+
424438
<br />
425439

426440
#### Deploy using a values YAML file
@@ -478,6 +492,11 @@ Containers:
478492
```
479493
![agent-deployed](/assets/images/agent-deployed.png){:height="700px" width="700px"}
480494

495+
If you do not see the agent listed in Soda Cloud, use the following command to review status and investigate the logs.
496+
```shell
497+
kubectl logs -l agent.soda.io/component=orchestrator -n soda-agent -f
498+
```
499+
481500
<br />
482501

483502
### (Optional) Connect via AWS PrivateLink
@@ -494,6 +513,11 @@ kubectl -n soda-agent rollout restart deploy
494513
5. After you have started the agent and validated that it is running, log into your Soda Cloud account, then navigate to **your avatar** > **Data Sources** > **Agents** tab. Refresh the page to verify that you see the agent you just created in the list of Agents.
495514
![agent-deployed](/assets/images/agent-deployed.png){:height="700px" width="700px"}
496515

516+
If you do not see the agent listed in Soda Cloud, use the following command to review status and investigate the logs.
517+
```shell
518+
kubectl logs -l agent.soda.io/component=orchestrator -n soda-agent -f
519+
```
520+
497521

498522
### About the `helm install` command
499523

@@ -504,7 +528,7 @@ kubectl -n soda-agent rollout restart deploy
504528

505529
1. Uninstall the Soda Agent in the cluster.
506530
```shell
507-
helm delete soda-agent -n soda-agent
531+
helm uninstall soda-agent -n soda-agent
508532
```
509533
2. Remove the Fargate profile.
510534
```shell
@@ -823,6 +847,11 @@ soda-agent-orchestrator-ffd74c76-5g7tl 1/1 Running 0 32s
823847
Be aware that this may take several minutes to appear in your list of Soda Agents.
824848
![agent-deployed](/assets/images/agent-deployed.png){:height="700px" width="700px"}
825849

850+
If you do not see the agent listed in Soda Cloud, use the following command to review status and investigate the logs.
851+
```shell
852+
kubectl logs -l agent.soda.io/component=orchestrator -n soda-agent -f
853+
```
854+
826855
<br />
827856

828857
#### Deploy using CLI only - virtual cluster
@@ -877,6 +906,11 @@ soda-agent-orchestrator-ffd74c76-5g7tl 1/1 Running 0 32s
877906
Be aware that this may take several minutes to appear in your list of Soda Agents.
878907
![agent-deployed](/assets/images/agent-deployed.png){:height="700px" width="700px"}
879908

909+
If you do not see the agent listed in Soda Cloud, use the following command to review status and investigate the logs.
910+
```shell
911+
kubectl logs -l agent.soda.io/component=orchestrator -n soda-agent -f
912+
```
913+
880914
<br />
881915

882916
#### Deploy using a values YAML file
@@ -928,6 +962,11 @@ kubectl describe pods -n soda-agent
928962
8. In your Soda Cloud account, navigate to **your avatar** > **Data Sources** > **Agents** tab. Refresh the page to verify that you see the agent you just created in the list of Agents.
929963
![agent-deployed](/assets/images/agent-deployed.png){:height="700px" width="700px"}
930964

965+
If you do not see the agent listed in Soda Cloud, use the following command to review status and investigate the logs.
966+
```shell
967+
kubectl logs -l agent.soda.io/component=orchestrator -n soda-agent -f
968+
```
969+
931970
<br />
932971

933972
## About the `helm install` command
@@ -1178,6 +1217,11 @@ Status: Running
11781217
```
11791218
![agent-deployed](/assets/images/agent-deployed.png){:height="600px" width="600px"}
11801219

1220+
If you do not see the agent listed in Soda Cloud, use the following command to review status and investigate the logs.
1221+
```shell
1222+
kubectl logs -l agent.soda.io/component=orchestrator -n soda-agent -f
1223+
```
1224+
11811225
<br />
11821226

11831227
#### Deploy using a values YAML file
@@ -1234,6 +1278,11 @@ Status: Running
12341278
```
12351279
![agent-deployed](/assets/images/agent-deployed.png){:height="700px" width="700px"}
12361280

1281+
If you do not see the agent listed in Soda Cloud, use the following command to review status and investigate the logs.
1282+
```shell
1283+
kubectl logs -l agent.soda.io/component=orchestrator -n soda-agent -f
1284+
```
1285+
12371286
<br />
12381287

12391288
## About the `helm install` command
@@ -1244,7 +1293,7 @@ Status: Running
12441293

12451294
1. Uninstall the Soda Agent in the cluster.
12461295
```shell
1247-
helm delete soda-agent -n soda-agent
1296+
helm uninstall soda-agent -n soda-agent
12481297
```
12491298
2. Delete the cluster.
12501299
```shell
@@ -1273,7 +1322,7 @@ In your Soda Cloud account, navigate to **your avatar** > **Data Sources**. Clic
12731322
| ----------------------- | ---------- |
12741323
| Data Source Label | Provide a unique identifier for the data source. Soda Cloud uses the label you provide to define the immutable name of the data source against which it runs the Default Scan.|
12751324
| Default Scan Schedule Label | Provide a name for the default scan schedule for this data source. The scan schedule indicates which Soda Agent to use to execute the scan, and when. |
1276-
| Default Scan Schedule Agent | Select the name of a Soda Agent that you have previously set up in your secure environment and connected to a specific data source. This identifies the Soda Agent to which Soda Cloud must connect in order to run its scan. |
1325+
| Default Scan Schedule Agent | Select the name of a Soda Agent that you have previously set up in your secure environment. This identifies the Soda Agent to which Soda Cloud must connect in order to run its scan. |
12771326
| Schedule Definition | Provide the scan frequency details Soda Cloud uses to execute scans according to your needs. If you wish, you can define the schedule as a cron expression. |
12781327
| Starting At | Select the time of day to run the scan. The default value is midnight. |
12791328
| Time Zone | Select a timezone. The default value is UTC. |
@@ -1289,7 +1338,7 @@ To more securely provide sensitive values such as usernames and passwords, use e
12891338

12901339
Access the data source-specific connection configurations listed below to copy+paste the connection syntax into the editing panel, then adjust the values to correspond with your data source's details. Access connection configuration details in [Data source reference]({% link soda/connect-athena.md %}) section of Soda documentation.
12911340

1292-
See also: [Use a file reference for a Big Query data source connection](#use-a-file-reference-for-a-big-query-data-source-connection)
1341+
See also: [Use a file reference for a BigQuery data source connection](#use-a-file-reference-for-a-bigquery-data-source-connection)
12931342

12941343
<br />
12951344

@@ -1352,7 +1401,7 @@ automated monitoring:
13521401

13531402
<br />
13541403

1355-
### Use a file reference for a Big Query data source connection
1404+
### Use a file reference for a BigQuery data source connection
13561405

13571406
If you already store information about your data source in a JSON file in a secure location, you can configure your BigQuery data source connection details in Soda Cloud to refer to the JSON file for service account information. To do so, you must add two elements:
13581407
* `volumes` and `volumeMounts` parameters in the `values.yml` file that your Soda Agent helm chart uses
