| 1 | +--- |
| 2 | +title: "AI_TO_SQL" |
| 3 | +--- |
| 4 | + |
| 5 | +Converts natural language instructions into SQL queries using the OpenAI model `text-davinci-003`. |
| 6 | + |
| 7 | +Databend offers an efficient solution for constructing SQL queries by incorporating OLAP and AI. Through this function, instructions written in a natural language can be converted into SQL query statements that align with the table schema. For example, the function can be provided with a sentence like "Get all items that cost 10 dollars or less" as an input and generate the corresponding SQL query `SELECT * FROM items WHERE price <= 10` as output. |
| 8 | + |
| 9 | +The main code implementation can be found [here](https://github.com/databendlabs/databend/blob/1e93c5b562bd159ecb0f336bb88fd1b7f9dc4a62/src/query/service/src/table_functions/openai/ai_to_sql.rs). |
| 10 | + |
| 11 | +:::note |
| 12 | +The SQL query statements generated adhere to the PostgreSQL standards, so they might require manual revisions to align with the syntax of Databend. |
| 13 | +::: |
| 14 | + |
| 15 | +:::info |
| 16 | +Starting from Databend v1.1.47, Databend supports the [Azure OpenAI service](https://azure.microsoft.com/en-au/products/cognitive-services/openai-service). |
| 17 | + |
| 18 | +This integration offers improved data privacy. |
| 19 | + |
| 20 | +To use Azure OpenAI, add the following configurations to the `[query]` section: |
| 21 | + |
| 22 | +```toml |
| 23 | +# Azure OpenAI |
| 24 | +openai_api_chat_base_url = "https://<name>.openai.azure.com/openai/deployments/<name>/" |
| 25 | +openai_api_embedding_base_url = "https://<name>.openai.azure.com/openai/deployments/<name>/" |
| 26 | +openai_api_version = "2023-03-15-preview" |
| 27 | +``` |
| 28 | + |
| 29 | +::: |
| 30 | + |
| 31 | +:::caution |
| 32 | +Databend relies on (Azure) OpenAI for `AI_TO_SQL` but only sends the table schema to (Azure) OpenAI, not the data. |
| 33 | + |
| 34 | +This function works only when the Databend configuration includes the `openai_api_key` setting; otherwise, it is inactive. |
| 35 | + |
| 36 | +This function is available by default on [Databend Cloud](https://databend.com) using our Azure OpenAI key. If you use it, you acknowledge that your table schema will be sent to Azure OpenAI by us. |
| 37 | +::: |
| 38 | + |
| 39 | +## Syntax |
| 40 | + |
| 41 | +```sql |
| 42 | +USE <your-database>; |
| 43 | +SELECT * FROM ai_to_sql('<natural-language-instruction>'); |
| 44 | +``` |
| 45 | + |
| 46 | +:::tip Obtain and Configure an OpenAI API Key |
| 47 | + |
| 48 | +- To obtain your OpenAI API key, visit https://platform.openai.com/account/api-keys and generate a new key. |
| 49 | +- Add the `openai_api_key` setting to the **databend-query.toml** file. |
| 50 | + |
| 51 | +```toml |
| 52 | +[query] |
| 53 | +... ... |
| 54 | +openai_api_key = "<your-key>" |
| 55 | +``` |
| 56 | + |
| 57 | +::: |
| 58 | + |
| 59 | +## Examples |
| 60 | + |
| 61 | +In this example, an SQL query statement is generated from an instruction with the AI_TO_SQL function, and the resulting statement is executed to obtain the query results. |
| 62 | + |
| 63 | +1. Prepare data. |
| 64 | + |
| 65 | +```sql |
| 66 | +CREATE DATABASE IF NOT EXISTS openai; |
| 67 | +USE openai; |
| 68 | + |
| 69 | +CREATE TABLE users( |
| 70 | + id INT, |
| 71 | + name VARCHAR, |
| 72 | + age INT, |
| 73 | + country VARCHAR |
| 74 | +); |
| 75 | + |
| 76 | +CREATE TABLE orders( |
| 77 | + order_id INT, |
| 78 | + user_id INT, |
| 79 | + product_name VARCHAR, |
| 80 | + price DECIMAL(10,2), |
| 81 | + order_date DATE |
| 82 | +); |
| 83 | + |
| 84 | +-- Insert sample data into the users table |
| 85 | +INSERT INTO users VALUES (1, 'Alice', 31, 'USA'), |
| 86 | + (2, 'Bob', 32, 'USA'), |
| 87 | + (3, 'Charlie', 45, 'USA'), |
| 88 | + (4, 'Diana', 29, 'USA'), |
| 89 | + (5, 'Eva', 35, 'Canada'); |
| 90 | + |
| 91 | +-- Insert sample data into the orders table |
| 92 | +INSERT INTO orders VALUES (1, 1, 'iPhone', 1000.00, '2022-03-05'), |
| 93 | + (2, 1, 'OpenAI Plus', 20.00, '2022-03-06'), |
| 94 | + (3, 2, 'OpenAI Plus', 20.00, '2022-03-07'), |
| 95 | + (4, 2, 'MacBook Pro', 2000.00, '2022-03-10'), |
| 96 | + (5, 3, 'iPad', 500.00, '2022-03-12'), |
| 97 | + (6, 3, 'AirPods', 200.00, '2022-03-14'); |
| 98 | +``` |
| 99 | + |
| 100 | +2. Run the AI_TO_SQL function with an instruction written in English as the input. |
| 101 | + |
| 102 | +```sql |
| 103 | +SELECT * FROM ai_to_sql( |
| 104 | + 'List the total amount spent by users from the USA who are older than 30 years, grouped by their names, along with the number of orders they made in 2022'); |
| 105 | +``` |
| 106 | + |
| 107 | +The function returns the generated SQL statement as its output: |
| 108 | + |
| 109 | +```sql |
| 110 | +*************************** 1. row *************************** |
| 111 | + database: openai |
| 112 | +generated_sql: SELECT name, SUM(price) AS total_spent, COUNT(order_id) AS total_orders |
| 113 | + FROM users |
| 114 | + JOIN orders ON users.id = orders.user_id |
| 115 | + WHERE country = 'USA' AND age > 30 AND order_date BETWEEN '2022-01-01' AND '2022-12-31' |
| 116 | + GROUP BY name; |
| 117 | +``` |
| 118 | + |
| 119 | +3. Run the generated SQL statement to get the query results. |
| 120 | + |
| 121 | +```sql |
| 122 | ++---------+-------------+--------------+ |
| 123 | +| name    | total_spent | total_orders | |
| 124 | ++---------+-------------+--------------+ |
| 125 | +| Bob     |     2020.00 |            2 | |
| 126 | +| Alice   |     1020.00 |            2 | |
| 127 | +| Charlie |      700.00 |            2 | |
| 128 | ++---------+-------------+--------------+ |
| 129 | +``` |
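
Step 3 of the example shows only the result set, not the statement that was run. The following is a minimal sketch of that statement, assuming the `generated_sql` output from step 2 runs on Databend without manual changes (the note near the top of the page warns that PostgreSQL-style output may need edits). The `ORDER BY` clause is an addition for illustration and is not part of the generated statement; it only makes the row order deterministic.

```sql
USE openai;

-- Statement copied from the generated_sql column returned in step 2.
SELECT name, SUM(price) AS total_spent, COUNT(order_id) AS total_orders
FROM users
JOIN orders ON users.id = orders.user_id
WHERE country = 'USA' AND age > 30
  AND order_date BETWEEN '2022-01-01' AND '2022-12-31'
GROUP BY name
-- Assumption: ORDER BY is added here and was not produced by AI_TO_SQL.
ORDER BY total_spent DESC;
```

With the sample data from step 1, Alice, Bob, and Charlie each have two orders in 2022, totaling 1020.00, 2020.00, and 700.00 respectively. Diana and Eva have no orders and are also excluded by the age and country filters, so neither appears in the result.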