[PECO-205] Add functional examples (#52)

Jesse · web-flow · commit b97ba9a951ac · 2022-09-30T16:51:32.000-05:00
Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>
diff --git a/examples/README.md b/examples/README.md
@@ -0,0 +1,38 @@
+# `databricks-sql-connector` Example Usage
+
+We provide example scripts so you can see the connector in action for basic usage. You need a Databricks account to run them. The scripts expect to find your Databricks account credentials in these environment variables:
+
+- `DATABRICKS_SERVER_HOSTNAME`
+- `DATABRICKS_HTTP_PATH`
+- `DATABRICKS_TOKEN`
+
+Follow the quick start in our [README](../README.md) to install `databricks-sql-connector` and see
+how to find the hostname, HTTP path, and access token. The OAuth examples below do not need a
+personal access token.
+
+
+## How to run an example script
+
+To run all of these examples, clone the entire repository to your disk. Alternatively, use `curl` to fetch an individual script.
+
+### Clone the repo
+1. Clone this repository to your local system.
+2. Follow the quick start in the [README](../README.md) to install the connector and obtain authentication credentials.
+3. `cd examples/`
+4. Run any script with the `python` CLI, for example `python query_execute.py`.
+
+### Fetch with `curl`
+
+1. Follow the quick start in the [README](../README.md) to install the connector and obtain authentication credentials.
+2. Use the GitHub UI to find the URL to the **Raw** version of one of these examples. For example: `https://raw.githubusercontent.com/databricks/databricks-sql-python/main/examples/query_execute.py`
+3. Download the file to your local file system with `curl`: `curl https://raw.githubusercontent.com/databricks/databricks-sql-python/main/examples/query_execute.py > query_execute.py`
+4. Run the script with the `python` CLI: `python query_execute.py`
+## Table of Contents
+
+- **`query_execute.py`** connects to Databricks, runs a small query against the `default.diamonds` table (two rows), and prints the result to the screen.
+- **`insert_data.py`** creates a table called `squares` in your default catalog (if it does not already exist) and inserts one hundred rows of example data. Then it fetches ten of these rows and prints them to the screen.
+- **`query_cancel.py`** shows how to cancel a query, assuming you can access the `Cursor` executing that query from a different thread. This is necessary because `databricks-sql-connector` does not yet implement an asynchronous API: calling `.execute()` blocks the current thread until execution completes. Therefore, the connector can't cancel queries from the same thread where they began.
+- **`interactive_oauth.py`** shows the simplest example of authenticating with OAuth (no personal access token generated in the DBSQL UI is needed) while Bring Your Own IDP is in public preview. When you run the script, it opens a browser window so you can authenticate. Afterward, the script repeatedly runs a trivial query (`SELECT 1+1`) and prints the results to the screen. This script does not persist the OAuth token, so you need to authenticate every time you run it.
+- **`persistent_oauth.py`** shows a more advanced example of authenticating with OAuth while Bring Your Own IDP is in public preview. In this case, it shows how to use a subclass of `OAuthPersistence` to reuse an OAuth token across script executions.
+- **`set_user_agent.py`** shows how to customize the user agent header used for Thrift commands. In
+this example, the string `ExamplePartnerTag` is added to the user agent on every request.
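Each script passes `os.getenv(...)` straight to `sql.connect`, and `os.getenv` returns `None` for a variable that is not set, so a missing credential only surfaces once the connection attempt fails. The sketch below is a hypothetical pre-flight check (the `require_env` helper and `REQUIRED_VARS` tuple are illustrative names, not part of the connector) that reports every missing variable up front:

```python
import os
from typing import Dict, Iterable

# The variables the example scripts read; the OAuth examples use only the first two.
REQUIRED_VARS = (
    "DATABRICKS_SERVER_HOSTNAME",
    "DATABRICKS_HTTP_PATH",
    "DATABRICKS_TOKEN",
)


def require_env(names: Iterable[str]) -> Dict[str, str]:
    """Hypothetical helper: return the named variables or raise, listing every unset or empty one."""
    values = {name: os.environ.get(name, "") for name in names}
    missing = [name for name, value in values.items() if not value]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return values
```

You could call `creds = require_env(REQUIRED_VARS)` at the top of a script and pass `creds["DATABRICKS_SERVER_HOSTNAME"]` and friends to `sql.connect`; for the OAuth scripts, pass only `REQUIRED_VARS[:2]`.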
diff --git a/examples/insert_data.py b/examples/insert_data.py
@@ -0,0 +1,21 @@
+from databricks import sql
+import os
+
+with sql.connect(server_hostname = os.getenv("DATABRICKS_SERVER_HOSTNAME"),
+                 http_path       = os.getenv("DATABRICKS_HTTP_PATH"),
+                 access_token    = os.getenv("DATABRICKS_TOKEN")) as connection:
+
+  with connection.cursor() as cursor:
+    cursor.execute("CREATE TABLE IF NOT EXISTS squares (x int, x_squared int)")
+
+    squares = [(i, i * i) for i in range(100)]
+    values = ",".join([f"({x}, {y})" for (x, y) in squares])
+
+    cursor.execute(f"INSERT INTO squares VALUES {values}")
+
+    cursor.execute("SELECT * FROM squares LIMIT 10")
+
+    result = cursor.fetchall()
+
+    for row in result:
+      print(row)
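`insert_data.py` builds one multi-row `INSERT` statement by formatting each tuple as `(x, y)` and joining the tuples with commas. The sketch below isolates that string-building step so you can inspect the generated SQL without a connection. `build_insert_sql` is a hypothetical name; like the script, it interpolates values directly into the SQL text, which is only reasonable for trusted numeric example data such as this.

```python
from typing import Iterable, Tuple


def build_insert_sql(table: str, rows: Iterable[Tuple[int, int]]) -> str:
    """Hypothetical helper mirroring insert_data.py: one INSERT with a VALUES tuple per row."""
    values = ",".join(f"({x}, {y})" for (x, y) in rows)
    return f"INSERT INTO {table} VALUES {values}"


squares = [(i, i * i) for i in range(100)]
statement = build_insert_sql("squares", squares)

# Print the statement for the first three rows only, to keep the output short.
print(build_insert_sql("squares", squares[:3]))
# INSERT INTO squares VALUES (0, 0),(1, 1),(2, 4)
```

Because the script uses `CREATE TABLE IF NOT EXISTS`, running it a second time appends another hundred rows to the existing `squares` table.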
diff --git a/examples/interactive_oauth.py b/examples/interactive_oauth.py
@@ -0,0 +1,41 @@
+"""Bring Your Own Identity Provider with fine-grained OAuth scopes is currently in public preview on
+Databricks in AWS. databricks-sql-connector supports user-to-machine OAuth login, which means the
+end user must be present to log in through a browser window that the Python process opens. You
+must enable OAuth in your Databricks account to run this example. More information on how to enable
+OAuth in your Databricks account in AWS can be found here:
+
+https://docs.databricks.com/administration-guide/account-settings-e2/single-sign-on.html
+
+Prerequisites:
+- You have a Databricks account in AWS.
+- You have configured OAuth in your Databricks account in AWS using the link above.
+- You have installed a browser (Chrome, Firefox, Safari, Internet Explorer, etc.) that is
+  accessible on the machine performing the OAuth login.
+
+This code does not persist the auth token. Hence, after the Python process terminates, the
+end user will have to log in again. See examples/persistent_oauth.py to learn about persisting the
+token across script executions.
+
+Bring Your Own Identity Provider is in public preview. The API may change prior to becoming GA.
+You can monitor these two links to find out when it will become generally available:
+
+  1. https://docs.databricks.com/administration-guide/account-settings-e2/single-sign-on.html
+  2. https://docs.databricks.com/dev-tools/python-sql-connector.html
+"""
+
+from databricks import sql
+import os
+
+with sql.connect(server_hostname = os.getenv("DATABRICKS_SERVER_HOSTNAME"),
+                 http_path       = os.getenv("DATABRICKS_HTTP_PATH"),
+                 auth_type="databricks-oauth") as connection:
+
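+    for _ in range(1, 100):
+        # Each iteration opens a cursor, runs a trivial query, and closes the cursor on exit.
+        with connection.cursor() as cursor:
+            cursor.execute('SELECT 1+1')
+            result = cursor.fetchall()
+            for row in result:
+                print(row)
+
+    # Leaving the `with` block closes the connection, so no explicit close() call is needed.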
diff --git a/examples/persistent_oauth.py b/examples/persistent_oauth.py
@@ -0,0 +1,69 @@
+"""Bring Your Own Identity Provider with fine-grained OAuth scopes is currently in public preview on
+Databricks in AWS. databricks-sql-connector supports user-to-machine OAuth login, which means the
+end user must be present to log in through a browser window that the Python process opens. You
+must enable OAuth in your Databricks account to run this example. More information on how to enable
+OAuth in your Databricks account in AWS can be found here:
+
+https://docs.databricks.com/administration-guide/account-settings-e2/single-sign-on.html
+
+Prerequisites:
+- You have a Databricks account in AWS.
+- You have configured OAuth in your Databricks account in AWS using the link above.
+- You have installed a browser (Chrome, Firefox, Safari, Internet Explorer, etc.) that is
+  accessible on the machine performing the OAuth login.
+
+For security, databricks-sql-connector does not persist OAuth tokens automatically. Hence, after
+the Python process terminates, the end user will have to log in again. We provide APIs to be
+implemented by the end user for persisting the OAuth token. The SampleOAuthPersistence reference
+shows which methods you may implement.
+
+For this example, the DevOnlyFilePersistence class is provided. Do not use it in production.
+
+Bring Your Own Identity Provider is in public preview. The API may change prior to becoming GA.
+You can monitor these two links to find out when it will become generally available:
+
+  1. https://docs.databricks.com/administration-guide/account-settings-e2/single-sign-on.html
+  2. https://docs.databricks.com/dev-tools/python-sql-connector.html
+"""
+
+import os
+from typing import Optional
+
+from databricks import sql
+from databricks.sql.experimental.oauth_persistence import OAuthPersistence, OAuthToken, DevOnlyFilePersistence
+
+
+class SampleOAuthPersistence(OAuthPersistence):
+  def persist(self, hostname: str, oauth_token: OAuthToken):
+    """To be implemented by the end user to persist in the preferred storage medium.
+    
+    OAuthToken has two properties:
+        1. OAuthToken.access_token
+        2. OAuthToken.refresh_token 
+
+    Both should be persisted.
+    """
+    pass
+
+  def read(self, hostname: str) -> Optional[OAuthToken]:
+    """To be implemented by the end user to fetch token from the preferred storage
+
+    Fetch the access_token and refresh_token for the given hostname.
+    Return OAuthToken(access_token, refresh_token)
+    """
+    pass
+
+with sql.connect(server_hostname = os.getenv("DATABRICKS_SERVER_HOSTNAME"),
+                 http_path       = os.getenv("DATABRICKS_HTTP_PATH"),
+                 auth_type="databricks-oauth",
+                 experimental_oauth_persistence=DevOnlyFilePersistence("./sample.json")) as connection:
+
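+    for _ in range(1, 100):
+        # Each iteration opens a cursor, runs a trivial query, and closes the cursor on exit.
+        with connection.cursor() as cursor:
+            cursor.execute('SELECT 1+1')
+            result = cursor.fetchall()
+            for row in result:
+                print(row)
+
+    # Leaving the `with` block closes the connection, so no explicit close() call is needed.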
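`SampleOAuthPersistence` leaves `persist` and `read` unimplemented. The sketch below shows one way to fill them in: a JSON file keyed by hostname that stores both `access_token` and `refresh_token`, as the `persist` docstring asks. To stay runnable without the connector installed, it uses a local stand-in `OAuthToken` dataclass (an assumption, not the connector's class), and `JsonFilePersistence` is a hypothetical name. In a real script you would subclass `OAuthPersistence` and return the connector's `OAuthToken(access_token, refresh_token)` instead. Writing tokens to an unencrypted file is suitable for development only.

```python
import json
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class OAuthToken:
    """Stand-in for the connector's OAuthToken, assumed to expose these two fields."""
    access_token: str
    refresh_token: str


class JsonFilePersistence:
    """Hypothetical persistence: one JSON file holding a token pair per hostname."""

    def __init__(self, path: str):
        self.path = path

    def _load(self) -> dict:
        # A missing file simply means nothing has been persisted yet.
        if not os.path.exists(self.path):
            return {}
        with open(self.path, "r", encoding="utf-8") as f:
            return json.load(f)

    def persist(self, hostname: str, oauth_token: OAuthToken) -> None:
        # Keep tokens stored for other hostnames and overwrite only this one.
        data = self._load()
        data[hostname] = {
            "access_token": oauth_token.access_token,
            "refresh_token": oauth_token.refresh_token,
        }
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump(data, f)

    def read(self, hostname: str) -> Optional[OAuthToken]:
        # Return None when no token pair has been stored for this hostname.
        entry = self._load().get(hostname)
        if entry is None:
            return None
        return OAuthToken(entry["access_token"], entry["refresh_token"])
```

Keying by hostname matches the `hostname` parameter that both `persist` and `read` receive, so one file can serve several workspaces.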
diff --git a/examples/query_cancel.py b/examples/query_cancel.py
@@ -0,0 +1,51 @@
+"""The current operation of a cursor may be cancelled by calling its `.cancel()` method, as shown below."""
+
+import os
+import threading
+import time
+from databricks import sql
+
+with sql.connect(server_hostname = os.getenv("DATABRICKS_SERVER_HOSTNAME"),
+                 http_path       = os.getenv("DATABRICKS_HTTP_PATH"),
+                 access_token    = os.getenv("DATABRICKS_TOKEN")) as connection:
+
+  with connection.cursor() as cursor:
+    def execute_really_long_query():
+        try:
+            cursor.execute("SELECT SUM(A.id - B.id) " +
+                           "FROM range(1000000000) A CROSS JOIN range(100000000) B " +
+                           "GROUP BY (A.id - B.id)")
+        except sql.exc.RequestError:
+            print("It looks like this query was cancelled.")
+
+    exec_thread = threading.Thread(target=execute_really_long_query)
+    
+    print("\n Beginning to execute long query")
+    exec_thread.start()
+    
+    # Make sure the query has started before cancelling
+    print("\n Waiting 15 seconds before cancelling", end="", flush=True)
+
+    seconds_waited = 0
+    while seconds_waited < 15:
+      seconds_waited += 1
+      print(".", end="", flush=True)
+      time.sleep(1)
+
+    print("\n Cancelling the cursor's operation. This can take a few seconds.")
+    cursor.cancel()
+
+    print("\n Waiting for the query thread to exit:")
+    exec_thread.join(5)
+
+    assert not exec_thread.is_alive()
+    print("\n The previous command was successfully cancelled")
+
+    print("\n Now reusing the cursor to run a separate query.")
+    
+    # We can still execute a new command on the cursor
+    cursor.execute("SELECT * FROM range(3)")
+
+    print("\n Execution was successful. Results appear below:")
+
+    print(cursor.fetchall())
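`query_cancel.py` relies on one thread blocking in `cursor.execute()` while another thread calls `cursor.cancel()`. The sketch below reproduces that control flow with the standard library only, so you can experiment without a warehouse. `FakeCursor` and `QueryCancelled` are hypothetical stand-ins: `execute` waits on a `threading.Event` and raises when `cancel` sets it, playing the role of the `sql.exc.RequestError` the example catches.

```python
import threading
from typing import List


class QueryCancelled(Exception):
    """Stand-in for the connector error the example catches after cancellation."""


class FakeCursor:
    """Hypothetical cursor: execute() blocks until it finishes or another thread cancels it."""

    def __init__(self) -> None:
        self._cancelled = threading.Event()

    def execute(self, seconds: float) -> str:
        # Event.wait returns True as soon as cancel() sets the flag, False on timeout.
        if self._cancelled.wait(timeout=seconds):
            self._cancelled.clear()  # leave the cursor reusable for the next query
            raise QueryCancelled()
        return "finished"

    def cancel(self) -> None:
        self._cancelled.set()


cursor = FakeCursor()
outcome: List[str] = []


def run_long_query() -> None:
    try:
        outcome.append(cursor.execute(seconds=60))
    except QueryCancelled:
        outcome.append("cancelled")


worker = threading.Thread(target=run_long_query)
worker.start()
cursor.cancel()  # called from the main thread, as in query_cancel.py
worker.join(timeout=5)
print(outcome)  # ['cancelled']
print(cursor.execute(seconds=0))  # finished
```

Unlike the real script, this simulation does not need the 15-second wait: if `cancel()` runs before the worker reaches `execute`, the flag is already set and the next `wait` returns immediately.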
diff --git a/examples/query_execute.py b/examples/query_execute.py
@@ -0,0 +1,13 @@
+from databricks import sql
+import os
+
+with sql.connect(server_hostname = os.getenv("DATABRICKS_SERVER_HOSTNAME"),
+                 http_path       = os.getenv("DATABRICKS_HTTP_PATH"),
+                 access_token    = os.getenv("DATABRICKS_TOKEN")) as connection:
+
+  with connection.cursor() as cursor:
+    cursor.execute("SELECT * FROM default.diamonds LIMIT 2")
+    result = cursor.fetchall()
+
+    for row in result:
+      print(row)
diff --git a/examples/set_user_agent.py b/examples/set_user_agent.py
@@ -0,0 +1,14 @@
+from databricks import sql
+import os
+
+with sql.connect(server_hostname = os.getenv("DATABRICKS_SERVER_HOSTNAME"),
+                 http_path       = os.getenv("DATABRICKS_HTTP_PATH"),
+                 access_token    = os.getenv("DATABRICKS_TOKEN"),
+                 _user_agent_entry="ExamplePartnerTag") as connection:
+
+  with connection.cursor() as cursor:
+    cursor.execute("SELECT * FROM default.diamonds LIMIT 2")
+    result = cursor.fetchall()
+
+    for row in result:
+      print(row)