Azure SQL can be used to build intelligent applications. To see this solution in action, navigate to this website.
Combine the power of Azure SQL and OpenAI, using familiar tools to create enhanced, intelligent database solutions.
In this code, a Python Jupyter Notebook ingests a large number of documents from a storage account (or from a SharePoint site; that code is included) and saves the chunked information into an Azure SQL database using a stored procedure.
The stored procedure saves the data to the documents table, saves the embeddings into a similarity vector table, and saves key phrases into a graph table for searching.
You can then use the AskDocumentQuestion stored procedure, which takes a system message and a question as parameters, to answer questions about your data.
Using the vanilla AdventureWorks database, you can also ask insightful questions about your data, right inside your SQL server.
To implement this solution, the following components were used:
- Azure SQL Database features
- Azure AI Services
This repository contains the following assets and code:
- Azure SQL Database bacpac file
- Requirements.txt
- SQLGraphRag Jupyter Notebook with the code needed to ingest documents into your database
- Sample documents
Listed below are the services needed for this solution. If you don't have an Azure subscription, you can create a free one. If you already have a subscription, please make sure that your administrator has granted you access to the services below:
- Azure Subscription
- Azure SQL Serverless
- Azure OpenAI Services
- Azure Document Intelligence
- Azure AI Language
Programming Tools needed:
This project should take about 1 hour to complete
Important
Before you begin, clone this repository to your local drive
- Azure account - login or create one
- Create a resource group
- Create a Storage Account
- Create the Azure SQL Database
- Create OpenAI Account and Deploy Models
- Create Azure Document Intelligence Service
- Create Azure AI Language Service
- Upload documents to storage account
- Configure Stored Procedure
First, you will need an Azure account. If you don't already have one, you can start a free trial of Azure here.
Log into the Azure Portal using your credentials
If you are new to Azure, a resource group is a container that holds related resources for an Azure solution. The resource group can include all the resources for the solution, or only those resources that you want to manage as a group. Click here to learn how to create one.
Write the name of your resource group in a text file; we will need it later.
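If you prefer to script this step instead of using the portal, here is a minimal sketch using the Azure SDK for Python. The subscription ID, group name, and region below are placeholders, and the sketch assumes the azure-identity and azure-mgmt-resource packages are installed and that you are already signed in (for example, via az login):

```python
# Minimal sketch: create a resource group with the Azure SDK for Python.
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient

subscription_id = "YOUR-SUBSCRIPTION-ID"  # placeholder
client = ResourceManagementClient(DefaultAzureCredential(), subscription_id)

# The group name and region are placeholders; pick values that suit you.
rg = client.resource_groups.create_or_update(
    "sql-graphrag-rg",
    {"location": "eastus"},
)
print(rg.name, rg.location)
```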
If you don't have a storage account yet, please create one. To learn more about creating a storage account, click here.
Create a container; you can use the name nasa-documents or choose your own.
Note the storage account name, access key, and container name in your text file.
Upload the file nasa-documents.bacpac, located under the folder data-files, to a storage account in your subscription.
Import the database package to a serverless database; for more information on how to do this, click here.
[!IMPORTANT] Configure your database as a Hyperscale - Serverless database.
If you do not have an Azure OpenAI account, create one. Once you have your Azure OpenAI service, make sure you have deployed (or deploy) two models:
1. Completions model: we used `gpt-4o` for this demo; if you can, please use this model.
2. Embeddings model: use text-embedding-ada-002 for this demo.
If the models are already deployed, use those; if not, for more information on how to deploy this service and models, click here.
Note the Azure OpenAI service endpoint, API key, and the model names in your text file.
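Before moving on, you can optionally sanity-check both deployments from Python. This is a minimal sketch, assuming the openai package (v1+) is installed and that your deployment names match the model names above:

```python
# Sketch: verify that both Azure OpenAI deployments respond.
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://YOUR-OPEN-AI-RESOURCE-NAME.openai.azure.com/",
    api_key="YOUR-OPEN-AI-API-KEY",
    api_version="2024-02-01",
)

# Embeddings deployment
emb = client.embeddings.create(model="text-embedding-ada-002", input="hello")
print(len(emb.data[0].embedding))  # 1536 dimensions for ada-002

# Completions (chat) deployment -- use your own deployment name here
chat = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Say hi"}],
)
print(chat.choices[0].message.content)
```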
Document Intelligence will be used to chunk documents, using its document analysis models to read your files.
If you do not have a Document Intelligence service account, create one; for more information, click here.
Do not use the free SKU.
Note the Document Intelligence service endpoint and API key in your text file.
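Optionally, you can verify the service from Python before running the notebook. A minimal sketch, assuming the azure-ai-formrecognizer package is installed and a local sample.pdf exists (any PDF will do):

```python
# Sketch: confirm the Document Intelligence service can analyze a document.
from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.core.credentials import AzureKeyCredential

client = DocumentAnalysisClient(
    endpoint="https://YOUR-DOCUMENT-INTELLIGENCE-SERVICE-NAME.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("YOUR-DOCUMENT-INTELLIGENCE-API-KEY"),
)

with open("sample.pdf", "rb") as f:  # any local PDF, for testing only
    poller = client.begin_analyze_document("prebuilt-layout", document=f)
result = poller.result()
print(f"Pages analyzed: {len(result.pages)}")
```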
Azure AI Language Service will be used to extract key phrases from each document chunk; you can also use Azure AI Language Service for sentiment analysis, entity extraction, and more.
If you do not have an Azure AI Language Service, create one:
- In the Azure Portal, search for “Azure AI services” and select “Create” under Language Service.
- Fill in the required details such as the resource name, subscription, resource group, and location. Choose the pricing tier that suits your needs (you can start with the Free tier).
Note the Azure AI Language service endpoint and API key in your text file.
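Optionally, you can try key-phrase extraction from Python; this is the same kind of call the notebook makes for each chunk. A minimal sketch, assuming the azure-ai-textanalytics package is installed:

```python
# Sketch: extract key phrases with Azure AI Language.
from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential

client = TextAnalyticsClient(
    endpoint="https://YOUR-AZURE-LANGUAGE-SERVICE-NAME.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("YOUR-AZURE-LANGUAGE-SERVICE-KEY"),
)

docs = ["NASA's Science Mission Directorate funds research in heliophysics."]
for doc in client.extract_key_phrases(docs):
    if not doc.is_error:
        print(doc.key_phrases)
```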
Download the file nasa-documents.zip to your local computer, extract the files, and upload them to the container you created in step 3.
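If you prefer to upload the files programmatically instead of through the portal, here is a minimal sketch using the azure-storage-blob package. The local folder name is an assumption; point it at wherever you extracted the zip:

```python
# Sketch: upload the extracted NASA documents to your blob container.
import os
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient(
    account_url="https://YOUR-STORAGE-ACCOUNT-NAME.blob.core.windows.net",
    credential="YOUR-STORAGE-ACCOUNT-KEY",
)
container = service.get_container_client("nasa-documents")
if not container.exists():
    container.create_container()

local_folder = "nasa-documents"  # folder with the extracted files (assumption)
for name in os.listdir(local_folder):
    with open(os.path.join(local_folder, name), "rb") as f:
        container.upload_blob(name=name, data=f, overwrite=True)
        print(f"Uploaded {name}")
```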
Log into the Azure Portal, navigate to your SQL database, and open the query editor for the nasa-documents database (or you can use SQL Server Management Studio).
Once logged in, expand the stored procedures section, click on the ellipsis, and select View Definition.
Scroll down to line 33; you will need to update your OpenAI configuration there.
Once you make the changes, click on Run.
Using VS Code, open the project folder.
Provide the settings for OpenAI and the database. You can either create a file named secrets.env in the root of this folder, or enter the settings through VS Code's UI later; the secrets.env file is the easier option.
AFR_ENDPOINT=https://YOUR-DOCUMENT-INTELLIGENCE-SERVICE-NAME.cognitiveservices.azure.com/
AFR_API_KEY=YOUR-DOCUMENT-INTELLIGENCE-API-KEY
AZURE_ACC_NAME=YOUR-STORAGE-ACCOUNT-NAME
AZURE_PRIMARY_KEY=YOUR-STORAGE-ACCOUNT-KEY
STORAGE_ACCOUNT_CONTAINER=nasa-documents
SQL_SERVER=YOUR-AZURE-SQL-SERVER.database.windows.net
SQL_DB=nasa-documents
SQL_USERNAME=YOUR-SQL-USER-NAME
SQL_SECRET=YOUR-SQL-USER-PWD
OPENAI_ENDPOINT=https://YOUR-OPEN-AI-RESOURCE-NAME.openai.azure.com/
OPENAI_API_KEY=YOUR-OPEN-AI-API-KEY
OPENAI_EMBEDDING_MODEL=text-embedding-ada-002
TEXT_ANALYTICS_ENDPOINT=https://YOUR-AZURE-LANGUAGE-SERVICE-NAME.cognitiveservices.azure.com/
TEXT_ANALYTICS_KEY=YOUR-AZURE-LANGUAGE-SERVICE-KEY
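Before running the notebook, you can optionally confirm that secrets.env loads and the database is reachable. A minimal sketch, assuming python-dotenv and pyodbc are installed along with the ODBC driver:

```python
# Sketch: load secrets.env and verify the SQL connection.
import os
import pyodbc
from dotenv import load_dotenv

load_dotenv("secrets.env")

conn_str = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    f"SERVER={os.environ['SQL_SERVER']};"
    f"DATABASE={os.environ['SQL_DB']};"
    f"UID={os.environ['SQL_USERNAME']};"
    f"PWD={os.environ['SQL_SECRET']}"
)
with pyodbc.connect(conn_str) as conn:
    print(conn.execute("SELECT @@VERSION").fetchone()[0])
```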
Important
If you are a Mac user, please follow this guide to install the ODBC driver for pyodbc.
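After installing the driver, a quick way to confirm pyodbc can see it:

```python
# Lists the ODBC drivers pyodbc can find; look for "ODBC Driver 17
# for SQL Server" (or 18) in the output.
import pyodbc
print(pyodbc.drivers())
```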
Open the SQLGraphRag Notebook and run it!
Go back to the query editor, create a new query and run the following tests:
declare @systemMessage varchar(max)
declare @text varchar(max)
set @systemMessage = 'Summarize the document content'
set @text = 'Give me a summary in layman''s terms, only search for the document with name silkroads.pdf'
execute [dbo].[AskDocumentQuestion] @text,@systemMessage,0
GO
declare @systemMessage varchar(max)
declare @text varchar(max)
set @systemMessage = 'you are a helpful assistant that helps people find information'
set @text = 'What are the main innovations of Nasa Science Mission Directorate?'
execute [dbo].[AskDocumentQuestion] @text,@systemMessage,0
GO
declare @systemMessage varchar(max)
declare @text varchar(max)
set @systemMessage = 'Summarize the document content'
set @text = 'what are the main topics of this database content?'
execute [dbo].[AskDocumentQuestion] @text,@systemMessage,0
Navigate to the Azure Portal and deploy the Adventure Works sample database. For more information on how to do this, click here.
Important
Make sure to configure your database as serverless to save money
Once the database has been deployed, navigate to the query editor, copy and paste the following T-SQL script.
Make sure you update the OpenAI parameters in the T-SQL script.
Important
Make sure to update your OpenAI information before running this script; otherwise, you will need to run an ALTER PROCEDURE afterwards.
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE PROCEDURE [dbo].[SQLNLP] (@question varchar(max), @schema varchar(max))
AS
BEGIN
declare @text nvarchar(max) = @question,
@schema_name nvarchar(max) = @schema;
declare @systemmessage nvarchar(max)
declare @payload2 nvarchar(max)
declare @top int = 20
declare @min_similarity decimal(19,16) = 0.75
declare @retval int, @response nvarchar(max);
declare @payload nvarchar(max);
set @payload = json_object('input': @text);
declare @urlEmbeddings nvarchar(250);
declare @theHeadings nvarchar(250)
declare @urlGPT4 nvarchar(250)
DECLARE @content VARCHAR(MAX);
DECLARE @document_name VARCHAR(255);
DECLARE @chunk_id INT;
declare @previous_summary varchar(max)
set @previous_summary = ''
-- Update these two values with your Azure OpenAI deployment URL and API key before running
set @urlGPT4 = 'https://YOUR-OPEN-AI-SERVICE-NAME.openai.azure.com/openai/deployments/YOUR-COMPLETIONS-DEPLOYMENT-NAME/chat/completions?api-version=2023-07-01-preview'
set @theHeadings = '{"Content-Type":"application/json","api-key":"YOUR-OPEN-AI-API-KEY"}'
--======================= Fetch the database schema
DECLARE @cols AS NVARCHAR(MAX),
@table_name as nvarchar(max),
@table_columns varchar(max),
@query AS NVARCHAR(MAX);
-- Declare the cursor
DECLARE TableCursor CURSOR FOR
SELECT distinct
C.TABLE_NAME
FROM
INFORMATION_SCHEMA.COLUMNS C
JOIN
INFORMATION_SCHEMA.TABLES T
ON
C.TABLE_NAME = T.TABLE_NAME
AND C.TABLE_SCHEMA = T.TABLE_SCHEMA
WHERE
T.TABLE_TYPE = 'BASE TABLE'
AND T.TABLE_SCHEMA = @schema_name;
drop table if exists #tables;
create table #tables ( theTable nvarchar(max));
-- Open the cursor
OPEN TableCursor;
-- Fetch the first row
FETCH NEXT FROM TableCursor INTO @table_name;
-- Loop through the rows
WHILE @@FETCH_STATUS = 0
BEGIN
-- Process each row
--============================================================================================
-- Generate the column list with data types
SET @cols = STUFF((SELECT DISTINCT ', ' + QUOTENAME(COLUMN_NAME) + ' - ' + DATA_TYPE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = @schema_name AND TABLE_NAME = @table_name
FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)'), 1, 2, '');
set @table_columns = 'TableName: ' + QUOTENAME(@schema_name) + '.' + QUOTENAME(@table_name) + ' Columns: ' + @cols
--select @table_columns
SET @query = 'insert into #tables (theTable) values (''' + @table_columns + ''')'
--select @query
-- Execute the query
EXEC sp_executesql @query;
--====================================================================================================
-- Fetch the next row
FETCH NEXT FROM TableCursor INTO @table_name;
END
-- Close the cursor
CLOSE TableCursor;
-- Deallocate the cursor
DEALLOCATE TableCursor;
declare @finalSchema varchar(max)
SELECT @finalSchema = STRING_AGG(theTable, ', ') from #tables
--select @finalSchema
--============ Now let's pass that for a question
set @systemMessage = ''' You are an agent designed to return SQL statements with schema detail in <<data_sources>>.
Given an input question, create a syntactically correct ODBC Driver 17 for SQL Server query to run.
You can order the results by a relevant column to return the most interesting examples in the database.
Never query for all the columns from a specific table; only ask for the few relevant columns given the question.
You MUST double-check your query. Use a step-by-step thought process.
DO NOT make any DML statements (INSERT, UPDATE, DELETE, DROP etc.) to the database.
Remember to format the SQL query as ODBC Driver 17 for SQL Server in your response.
Remove any invalid characters and double check that the query will perform correctly; just return SQL statements, skip pleasantries.
Return a syntactically correct ODBC Driver 17 for SQL Server query ready to run; all fields or aggregations that use fields with money type, return them as money type.
Return SQL statements only, do not include the thought process, do not query sys objects.
<<data_sources>>
''' + @finalSchema + ' ## End <<data_sources>> ##'
set @payload2 =
json_object(
'messages': json_array(
json_object(
'role':'system',
'content':'
' + @systemMessage + '
'
),
json_object(
'role':'user',
'content': @text
)
),
'max_tokens': 4096,
'temperature': 0.5,
'frequency_penalty': 0,
'presence_penalty': 0,
'top_p': 0.95,
'stop': null
);
--select @payload2
exec @retval = sp_invoke_external_rest_endpoint
@url = @urlGPT4,
@headers = @theHeadings,
@method = 'POST',
@timeout = 120,
@payload = @payload2,
@response = @response output;
--select @response
drop table if exists #j;
select * into #j from openjson(@response, '$.result.choices') c;
--select * from #j
declare @value varchar(max)
select @value = [value] from #j
select @query = [value] from openjson(@value, '$.message') where [key] = 'content'
-- Strip the markdown code-fence characters and the 'sql' language tag from the model's reply
SELECT @query = REPLACE(@query, '`', '')
SELECT @query = REPLACE(@query, 'sql', '')
-- select @query
EXEC sp_executesql @query;
END
GO
Create a new query and run the following tests:
declare @text nvarchar(max)
declare @schema nvarchar(250) = 'SalesLT'
set @text = 'Is it true that the top 20% of customers generate 80% of revenue? What is their percentage of revenue contribution?'
execute [dbo].[SQLNLP] @text, @schema
GO
declare @text nvarchar(max)
declare @schema nvarchar(250) = 'SalesLT'
set @text = 'Is it true that the top 20% of customers generate 80% of revenue? Please list those customers and give me details on their orders, including the items they purchased'
execute [dbo].[SQLNLP] @text, @schema
GO
declare @text nvarchar(max)
declare @schema nvarchar(250) = 'SalesLT'
set @text = 'Which products have the most seasonality in sales quantity? Add the month they are purchased the least'
execute [dbo].[SQLNLP] @text, @schema