Commit b97ba9a

Authored by Jesse
[PECO-205] Add functional examples (#52)
Signed-off-by: Jesse Whitehouse <[email protected]>
1 parent 2a638c4 commit b97ba9a

7 files changed, +247 -0 lines changed

examples/README.md

Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
# `databricks-sql-connector` Example Usage

We provide example scripts so you can see the connector in action for basic usage. You need a Databricks account to run them. The scripts expect to find your Databricks account credentials in these environment variables:

- DATABRICKS_SERVER_HOSTNAME
- DATABRICKS_HTTP_PATH
- DATABRICKS_TOKEN

Follow the quick start in our [README](../README.md) to install `databricks-sql-connector` and see
how to find the hostname, HTTP path, and access token. Note that the OAuth examples below do not
need a personal access token.

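
Before running a script, it can help to confirm that the variables listed above are actually set. The following is a minimal sketch using only the standard library; the helper name `missing_variables` and the `REQUIRED_VARS` tuple are illustrative and not part of the connector. For the OAuth examples, `DATABRICKS_TOKEN` could be left out of the required list.

```python
import os

# Names taken from the list above; DATABRICKS_TOKEN is not needed for the OAuth examples.
REQUIRED_VARS = (
    "DATABRICKS_SERVER_HOSTNAME",
    "DATABRICKS_HTTP_PATH",
    "DATABRICKS_TOKEN",
)


def missing_variables(environ=os.environ, required=REQUIRED_VARS):
    """Return the names of required variables that are unset or empty (hypothetical helper)."""
    return [name for name in required if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_variables()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All Databricks credentials are set.")
```
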
## How to run an example script

To run all of these examples, you can clone the entire repository to your disk. Alternatively, you can use `curl` to fetch an individual script.

### Clone the repo
1. Clone this repository to your local system.
2. Follow the quick start in the [README](../README.md) to install the connector and obtain authentication credentials.
3. `cd examples/`
4. Run any script using the `python` CLI. For example: `python query_execute.py`

### Fetch with `curl`

1. Follow the quick start in the [README](../README.md) to install the connector and obtain authentication credentials.
2. Use the GitHub UI to find the URL to the **Raw** version of one of these examples. For example: `https://raw.githubusercontent.com/databricks/databricks-sql-python/main/examples/query_execute.py`
3. `curl` this URL to your local file system: `curl https://raw.githubusercontent.com/databricks/databricks-sql-python/main/examples/query_execute.py > query_execute.py`
4. Run the script with the `python` CLI: `python query_execute.py`

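
If `curl` is not available, the same download can be sketched with Python's standard library. This is only an illustration: `raw_url` and `fetch_example` are hypothetical helper names, and the base URL is assumed to follow the pattern of the Raw link shown in step 2.

```python
import urllib.request
from typing import Optional

# Assumed base of the Raw URLs, copied from the example link in step 2.
RAW_BASE = "https://raw.githubusercontent.com/databricks/databricks-sql-python/main/examples"


def raw_url(script_name: str) -> str:
    """Build the Raw URL for one example script."""
    return f"{RAW_BASE}/{script_name}"


def fetch_example(script_name: str, destination: Optional[str] = None) -> str:
    """Download one example script and return the local path it was written to."""
    destination = destination or script_name
    with urllib.request.urlopen(raw_url(script_name)) as response:
        content = response.read()
    with open(destination, "wb") as output_file:
        output_file.write(content)
    return destination
```

Calling `fetch_example("query_execute.py")` would write the script to the current directory, after which `python query_execute.py` runs it as in step 4.
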
## Table of Contents

- **`query_execute.py`** connects to the `samples` database of your default catalog, runs a small query, and prints the result to the screen.
- **`insert_data.py`** adds a table called `squares` to your default catalog and inserts one hundred rows of example data. Then it fetches this data and prints it to the screen.
- **`query_cancel.py`** shows how to cancel a query, assuming that you can access the `Cursor` executing that query from a different thread. This is necessary because `databricks-sql-connector` does not yet implement an asynchronous API; calling `.execute()` blocks the current thread until execution completes. Therefore, the connector can't cancel queries from the same thread where they began.
- **`interactive_oauth.py`** shows the simplest example of authenticating by OAuth (no need for a PAT generated in the DBSQL UI) while Bring Your Own IDP is in public preview. When you run the script, it opens a browser window so you can authenticate. Afterward, the script fetches some sample data from Databricks and prints it to the screen. This script does not persist the OAuth token, so you need to authenticate every time you run it.
- **`persistent_oauth.py`** shows a more advanced example of authenticating by OAuth while Bring Your Own IDP is in public preview. It shows how to use a subclass of `OAuthPersistence` to reuse an OAuth token across script executions.
- **`set_user_agent.py`** shows how to customize the user agent header used for Thrift commands. In
this example the string `ExamplePartnerTag` will be added to the user agent on every request.

examples/insert_data.py

Lines changed: 21 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,21 @@
from databricks import sql
import os

with sql.connect(server_hostname=os.getenv("DATABRICKS_SERVER_HOSTNAME"),
                 http_path=os.getenv("DATABRICKS_HTTP_PATH"),
                 access_token=os.getenv("DATABRICKS_TOKEN")) as connection:

    with connection.cursor() as cursor:
        cursor.execute("CREATE TABLE IF NOT EXISTS squares (x int, x_squared int)")

        # Build one multi-row VALUES clause of the form "(0, 0),(1, 1),(2, 4),..."
        squares = [(i, i * i) for i in range(100)]
        values = ",".join([f"({x}, {y})" for (x, y) in squares])

        cursor.execute(f"INSERT INTO squares VALUES {values}")

        cursor.execute("SELECT * FROM squares LIMIT 10")

        result = cursor.fetchall()

        for row in result:
            print(row)
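
To see what the script above sends to the server, the statement can be built on its own without a connection. This is a small sketch with a shortened range; `build_values_clause` is an illustrative helper name, not part of the connector. Interpolating values into SQL like this is reasonable here only because the values are integers generated by the script itself; it should not be used with untrusted input.

```python
def build_values_clause(rows):
    """Join (x, y) pairs into a multi-row VALUES clause, as insert_data.py does."""
    return ",".join(f"({x}, {y})" for (x, y) in rows)


# Four rows instead of one hundred, to keep the output readable.
squares = [(i, i * i) for i in range(4)]
statement = f"INSERT INTO squares VALUES {build_values_clause(squares)}"

print(statement)
# INSERT INTO squares VALUES (0, 0),(1, 1),(2, 4),(3, 9)
```
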

examples/interactive_oauth.py

Lines changed: 41 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,41 @@
from databricks import sql
import os

"""Bring Your Own Identity Provider with fine-grained OAuth scopes is currently in public preview on
Databricks in AWS. databricks-sql-connector supports user-to-machine OAuth login, which means the
end user has to be present to log in through a browser window opened by the Python process. You
must enable OAuth in your Databricks account to run this example. More information on how to enable
OAuth in your Databricks account in AWS can be found here:

https://docs.databricks.com/administration-guide/account-settings-e2/single-sign-on.html

Pre-requisites:
- You have a Databricks account in AWS.
- You have configured OAuth in your Databricks account in AWS using the link above.
- You have installed a browser (Chrome, Firefox, Safari, Internet Explorer, etc.) that will be
  accessible on the machine for performing OAuth login.

This code does not persist the auth token. Hence, after the Python process terminates, the
end user will have to log in again. See examples/persistent_oauth.py to learn about persisting the
token across script executions.

Bring Your Own Identity Provider is in public preview. The API may change prior to becoming GA.
You can monitor these two links to find out when it will become generally available:

  1. https://docs.databricks.com/administration-guide/account-settings-e2/single-sign-on.html
  2. https://docs.databricks.com/dev-tools/python-sql-connector.html
"""

with sql.connect(server_hostname=os.getenv("DATABRICKS_SERVER_HOSTNAME"),
                 http_path=os.getenv("DATABRICKS_HTTP_PATH"),
                 auth_type="databricks-oauth") as connection:

    for x in range(1, 100):
        cursor = connection.cursor()
        cursor.execute('SELECT 1+1')
        result = cursor.fetchall()
        for row in result:
            print(row)
        cursor.close()

    # No explicit connection.close() is needed: the `with` block closes the connection on exit.

examples/persistent_oauth.py

Lines changed: 69 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,69 @@
"""Bring Your Own Identity Provider with fine-grained OAuth scopes is currently in public preview on
Databricks in AWS. databricks-sql-connector supports user-to-machine OAuth login, which means the
end user has to be present to log in through a browser window opened by the Python process. You
must enable OAuth in your Databricks account to run this example. More information on how to enable
OAuth in your Databricks account in AWS can be found here:

https://docs.databricks.com/administration-guide/account-settings-e2/single-sign-on.html

Pre-requisites:
- You have a Databricks account in AWS.
- You have configured OAuth in your Databricks account in AWS using the link above.
- You have installed a browser (Chrome, Firefox, Safari, Internet Explorer, etc.) that will be
  accessible on the machine for performing OAuth login.

For security, databricks-sql-connector does not persist OAuth tokens automatically. Hence, after
the Python process terminates, the end user will have to log in again. We provide APIs to be
implemented by the end user for persisting the OAuth token. The SampleOAuthPersistence reference
shows which methods you may implement.

For this example, the DevOnlyFilePersistence class is provided. Do not use this in production.

Bring Your Own Identity Provider is in public preview. The API may change prior to becoming GA.
You can monitor these two links to find out when it will become generally available:

  1. https://docs.databricks.com/administration-guide/account-settings-e2/single-sign-on.html
  2. https://docs.databricks.com/dev-tools/python-sql-connector.html
"""

import os
from typing import Optional

from databricks import sql
from databricks.sql.experimental.oauth_persistence import OAuthPersistence, OAuthToken, DevOnlyFilePersistence


class SampleOAuthPersistence(OAuthPersistence):
    def persist(self, hostname: str, oauth_token: OAuthToken):
        """To be implemented by the end user to persist in the preferred storage medium.

        OAuthToken has two properties:
            1. OAuthToken.access_token
            2. OAuthToken.refresh_token

        Both should be persisted.
        """
        pass

    def read(self, hostname: str) -> Optional[OAuthToken]:
        """To be implemented by the end user to fetch the token from the preferred storage medium.

        Fetch the access_token and refresh_token for the given hostname.
        Return OAuthToken(access_token, refresh_token)
        """
        pass


with sql.connect(server_hostname=os.getenv("DATABRICKS_SERVER_HOSTNAME"),
                 http_path=os.getenv("DATABRICKS_HTTP_PATH"),
                 auth_type="databricks-oauth",
                 experimental_oauth_persistence=DevOnlyFilePersistence("./sample.json")) as connection:

    for x in range(1, 100):
        cursor = connection.cursor()
        cursor.execute('SELECT 1+1')
        result = cursor.fetchall()
        for row in result:
            print(row)
        cursor.close()

    # No explicit connection.close() is needed: the `with` block closes the connection on exit.
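
The `persist` and `read` stubs above could be filled in along the following lines. This is a sketch under stated assumptions, not the connector's implementation: the `OAuthToken` class here is a local stand-in that mirrors the two properties named in the docstring, and `JsonFilePersistence` is a hypothetical class that stores tokens as plain JSON keyed by hostname. In real use it would subclass `OAuthPersistence` and reuse the connector's `OAuthToken`. Plain-text storage like this is for experimentation only.

```python
import json
import os
import tempfile
from typing import Optional


class OAuthToken:
    """Local stand-in for the connector's OAuthToken (assumed shape: two properties)."""

    def __init__(self, access_token: str, refresh_token: str):
        self.access_token = access_token
        self.refresh_token = refresh_token


class JsonFilePersistence:
    """Hypothetical store that keeps one token pair per hostname in a JSON file."""

    def __init__(self, file_path: str):
        self._file_path = file_path

    def _load(self) -> dict:
        # A missing file simply means nothing has been persisted yet.
        if not os.path.exists(self._file_path):
            return {}
        with open(self._file_path) as token_file:
            return json.load(token_file)

    def persist(self, hostname: str, oauth_token: OAuthToken):
        tokens = self._load()
        tokens[hostname] = {
            "access_token": oauth_token.access_token,
            "refresh_token": oauth_token.refresh_token,
        }
        with open(self._file_path, "w") as token_file:
            json.dump(tokens, token_file)

    def read(self, hostname: str) -> Optional[OAuthToken]:
        entry = self._load().get(hostname)
        if entry is None:
            return None
        return OAuthToken(entry["access_token"], entry["refresh_token"])


# Usage: round-trip a token through a temporary file (hostname and tokens are made up).
with tempfile.TemporaryDirectory() as tmp_dir:
    store = JsonFilePersistence(os.path.join(tmp_dir, "tokens.json"))
    store.persist("example.cloud.databricks.com", OAuthToken("access-abc", "refresh-xyz"))
    restored = store.read("example.cloud.databricks.com")
    print(restored.access_token, restored.refresh_token)
```

Keying the file by hostname follows the `hostname` parameter in both method signatures, so one file can hold tokens for several workspaces.
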

examples/query_cancel.py

Lines changed: 51 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,51 @@
from databricks import sql
import os
import threading
import time

"""
The current operation of a cursor may be cancelled by calling its `.cancel()` method as shown in the example below.
"""

with sql.connect(server_hostname=os.getenv("DATABRICKS_SERVER_HOSTNAME"),
                 http_path=os.getenv("DATABRICKS_HTTP_PATH"),
                 access_token=os.getenv("DATABRICKS_TOKEN")) as connection:

    with connection.cursor() as cursor:
        def execute_really_long_query():
            try:
                cursor.execute("SELECT SUM(A.id - B.id) " +
                               "FROM range(1000000000) A CROSS JOIN range(100000000) B " +
                               "GROUP BY (A.id - B.id)")
            except sql.exc.RequestError:
                print("It looks like this query was cancelled.")

        exec_thread = threading.Thread(target=execute_really_long_query)

        print("\n Beginning to execute long query")
        exec_thread.start()

        # Make sure the query has started before cancelling
        print("\n Waiting 15 seconds before canceling", end="", flush=True)

        seconds_waited = 0
        while seconds_waited < 15:
            seconds_waited += 1
            print(".", end="", flush=True)
            time.sleep(1)

        print("\n Cancelling the cursor's operation. This can take a few seconds.")
        cursor.cancel()

        print("\n Now checking the cursor status:")
        exec_thread.join(5)

        assert not exec_thread.is_alive()
        print("\n The previous command was successfully canceled")

        print("\n Now reusing the cursor to run a separate query.")

        # We can still execute a new command on the cursor
        cursor.execute("SELECT * FROM range(3)")

        print("\n Execution was successful. Results appear below:")

        print(cursor.fetchall())
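
The threading pattern in this script can be tried without a Databricks account. The sketch below replaces the real cursor with a hypothetical `FakeCursor` whose `execute()` blocks until `cancel()` is called from another thread; `FakeCursor`, `OperationCancelled`, and `run_then_cancel` are illustrative names and do not reflect how the connector is implemented.

```python
import threading
import time


class OperationCancelled(Exception):
    """Hypothetical error raised when a running operation is cancelled."""


class FakeCursor:
    """Stand-in cursor: execute() blocks the calling thread until cancel() is called."""

    def __init__(self):
        self._cancel_requested = threading.Event()

    def execute(self, query, max_seconds=10.0):
        # Pretend to run a slow query: wait until cancelled or until time runs out.
        if self._cancel_requested.wait(timeout=max_seconds):
            self._cancel_requested.clear()  # leave the cursor reusable
            raise OperationCancelled("operation cancelled")
        return "finished"

    def cancel(self):
        self._cancel_requested.set()


def run_then_cancel(cursor, delay_seconds=0.2):
    """Start execute() on a worker thread, then cancel it from the calling thread."""
    outcome = {}

    def worker():
        try:
            outcome["result"] = cursor.execute("SELECT ...")
        except OperationCancelled as exc:
            outcome["error"] = str(exc)

    exec_thread = threading.Thread(target=worker)
    exec_thread.start()
    time.sleep(delay_seconds)  # give the worker time to start executing
    cursor.cancel()
    exec_thread.join(5)
    outcome["thread_alive"] = exec_thread.is_alive()
    return outcome


print(run_then_cancel(FakeCursor()))
```

As in the real example, the cancel call has to come from a thread other than the one blocked in `execute()`, which is why the query runs on a worker thread.
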

examples/query_execute.py

Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,13 @@
from databricks import sql
import os

with sql.connect(server_hostname=os.getenv("DATABRICKS_SERVER_HOSTNAME"),
                 http_path=os.getenv("DATABRICKS_HTTP_PATH"),
                 access_token=os.getenv("DATABRICKS_TOKEN")) as connection:

    with connection.cursor() as cursor:
        cursor.execute("SELECT * FROM default.diamonds LIMIT 2")
        result = cursor.fetchall()

        for row in result:
            print(row)

examples/set_user_agent.py

Lines changed: 14 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,14 @@
from databricks import sql
import os

with sql.connect(server_hostname=os.getenv("DATABRICKS_SERVER_HOSTNAME"),
                 http_path=os.getenv("DATABRICKS_HTTP_PATH"),
                 access_token=os.getenv("DATABRICKS_TOKEN"),
                 _user_agent_entry="ExamplePartnerTag") as connection:

    with connection.cursor() as cursor:
        cursor.execute("SELECT * FROM default.diamonds LIMIT 2")
        result = cursor.fetchall()

        for row in result:
            print(row)
