
MarchingBug/SQL-RAG


Intelligent Applications with Azure SQL

Azure SQL can be used to build intelligent applications. To see this solution in action, navigate to this website:

SQL Intelligent applications

Overview - Asking different kinds of questions

Combine the power of Azure SQL and Azure OpenAI, using familiar tools, to create enhanced intelligent database solutions.

[Diagram: beyond RDBMS]

Part I - Ask questions on your documents

In this code, a Python Jupyter Notebook ingests a large number of documents from a storage account (or from a SharePoint site; that code is included) and saves the chunked information into an Azure SQL database using a stored procedure.
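The notebook's actual chunking is driven by Azure Document Intelligence, but the general shape of overlap chunking can be sketched as follows (a minimal illustration; `chunk_text` and its sizes are hypothetical, not the notebook's actual parameters):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks that overlap, so content crossing a
    chunk boundary remains searchable from at least one chunk."""
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap means the tail of each chunk is repeated at the head of the next, which costs a little storage but avoids losing sentences that straddle a boundary.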

[Diagram: data ingestion]

The stored procedure saves data to the documents table, stores the embeddings in a similarity vector table, and saves key phrases into a graph table for searching.

[Screenshot: documents table]

You can use the AskDocumentQuestion stored procedure, which takes a system message and a question as parameters, to answer questions about your data.

[Screenshot: asking a document question]


Part II - SQL + NLP - get data insights

Using the vanilla AdventureWorks database, you can ask insightful questions about your data right inside your SQL server.

[Screenshot: natural-language query results]

Key Concepts

To implement this solution, the following components were used:

Assets

This repository contains the following assets and code:

  • Azure SQL Database bacpac file
  • Requirements.txt
  • SQLGraphRag Jupyter Notebook with the code needed to ingest documents into your database
  • Sample documents

Services used for this solution

Listed below are the services needed for this solution. If you don't have an Azure subscription, you can create a free one. If you already have a subscription, please make sure that your administrator has granted access to the services below:

Programming Tools needed:

Expected time to completion

This project should take about one hour to complete.

Part I - Ask questions on documents

Part I - Setup Steps

Important

Before you begin, clone this repository to your local drive

  1. Azure account - login or create one
  2. Create a resource group
  3. Create a Storage Account
  4. Create the Azure SQL Database
  5. Create OpenAI Account and Deploy Models
  6. Create Azure Document Intelligence Service
  7. Create Azure AI Language Service
  8. Upload documents to storage account
  9. Configure Stored Procedure

Ingestion and SQL Configuration

  1. Set up environment variables
  2. Run Notebook to ingest
  3. Ask Question

Task 1 - Azure Account

First, you will need an Azure account. If you don't already have one, you can start a free trial of Azure here.

Log into the Azure Portal using your credentials

Task 2 - Create a resource group

If you are new to Azure: a resource group is a container that holds related resources for an Azure solution. The resource group can include all the resources for the solution, or only those resources that you want to manage as a group. Click here to learn how to create a resource group.

Write the name of your resource group in a text file; we will need it later.

Task 3 - Create a Storage Account

If you don't have a storage account yet, please create one. To learn more about creating a storage account, click here.

Create a container; you can use the name nasa-documents or choose your own.

Note the storage account name, access key, and container name in your text file.

Task 4 - Create the SQL Server Database

Upload the file nasa-documents.bacpac located under the folder data-files to a storage account in your subscription.

Import the database package into a serverless database. For more information on how to do this, click here.

[!IMPORTANT] Configure your database as a Hyperscale - Serverless database

Task 5 - Create OpenAI Account and Deploy Models

If you do not have an Azure OpenAI account, create one. Once you have your Azure OpenAI service, make sure you have (or deploy) two models:

1. A completions model; we used `gpt-4o` for this demo. If you can, please use this model.
2. An embeddings model; use `text-embedding-ada-002` for this demo.

If the models are already deployed, use those; if not, click here for more information on how to deploy this service and models.

Note the Azure OpenAI service endpoint, API key, and model names in your text file.
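The embeddings model turns each document chunk (and, later, each question) into a numeric vector, and the stored procedure ranks chunks by how close their vectors are to the question's. The usual closeness metric is cosine similarity, sketched here in plain Python for intuition (the repository itself computes similarity inside SQL):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```

Vectors pointing the same way score near 1.0 and unrelated ones score near 0.0, which is why semantically similar text chunks rank highest.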

Task 6 - Create Azure Document Intelligence Service

Azure Document Intelligence will be used to read your documents and split them into chunks.

If you do not have a Document Intelligence service account, create one. For more information, click here.

Do not use the free SKU.

Note the Document Intelligence service endpoint and API key in your text file.

Task 7 - Create Azure AI Language Service

Azure AI Language Service will be used to extract key phrases from each document chunk. You can also use it for sentiment analysis, entity extraction, and more.

If you do not have an Azure AI Language Service, create one:

  1. In the Azure Portal, search for "Azure AI services" and select "Create" under Language Service.
  2. Fill in the required details such as the resource name, subscription, resource group, and location. Choose the pricing tier that suits your needs (you can start with the Free tier).

Note the Azure AI Language service endpoint and API key in your text file.
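The real extraction is a call to the Azure AI Language key-phrase API, which does the linguistic work server-side. Purely to illustrate the shape of the output that ends up in the graph table (a short list of salient terms per chunk), here is a toy frequency-based stand-in; it is not the Azure API and is deliberately naive:

```python
import re
from collections import Counter

# Tiny stopword list for the toy example; the real service handles this for you.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "for", "on"}

def naive_key_phrases(text: str, top_n: int = 5) -> list[str]:
    """Toy stand-in for key-phrase extraction: most frequent non-stopword tokens."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    return [w for w, _ in Counter(words).most_common(top_n)]
```

The service returns multi-word phrases with far better quality than this sketch; the point is only that each chunk maps to a handful of searchable terms.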

Task 8 - Upload documents to storage account

Download the file nasa-documents.zip to your local computer, extract the files, and upload them to the container you created in Task 3.

Task 9 - Configure Stored Procedure

Log into the Azure Portal, navigate to your SQL database, and open the query editor for the nasa-documents database (or you can use SQL Server Management Studio).

[Screenshot: Query Editor]

Once logged in, expand the stored procedures section, click the ellipsis, and select View Definition.

[Screenshot: Query Editor - View Definition]

Scroll down to line 33; you will need to update your OpenAI configuration there.

[Screenshot: Query Editor - OpenAI configuration]

Once you make the changes, click Run.


Data Ingestion

Task 1 - Set up environment variables

Using VS Code, open the project folder.

Provide settings for OpenAI and the database. You can either create a file named secrets.env in the root of this folder (shown below), or supply the values through VS Code's UI later (easier).

    AFR_ENDPOINT=https://YOUR-DOCUMENT-INTELLIGENCE-SERVICE-NAME.cognitiveservices.azure.com/
    AFR_API_KEY=YOUR-DOCUMENT-INTELLIGENCE-API-KEY
    AZURE_ACC_NAME=YOUR-STORAGE-ACCOUNT-NAME
    AZURE_PRIMARY_KEY=YOUR-STORAGE-ACCOUNT-KEY
    STORAGE_ACCOUNT_CONTAINER=nasa-files
    SQL_SERVER=YOUR-AZURE-SQL-SERVER.database.windows.net
    SQL_DB=nasa-documents
    SQL_USERNAME=YOUR-SQL-USER-NAME
    SQL_SECRET=YOUR-SQL-USER-PWD
    OPENAI_ENDPOINT=https://YOUR-OPEN-AI-RESOURCE-NAME.openai.azure.com/
    OPENAI_API_KEY=YOUR-OPEN-AI-API-KEY
    OPENAI_EMBEDDING_MODEL=text-embedding-ada-002
    TEXT_ANALYTICS_ENDPOINT=https://YOUR-AZURE-LANGUAGE-SERVICE-NAME.cognitiveservices.azure.com/
    TEXT_ANALYTICS_KEY=YOUR-AZURE-LANGUAGE-SERVICE-KEY

Important

If you are a Mac user, please follow this guide to install the ODBC driver for pyodbc.

Task 2 - Run notebook to ingest

Open the SQLGraphRag Notebook and run it!

Task 3 - Ask a question

Go back to the query editor, create a new query and run the following tests:

Test 1

declare @systemMessage varchar(max)
declare @text varchar(max)

set @systemMessage = 'Summarize the document content'
set @text  = 'Give me a summary in layman''s terms, only search for the document with name silkroads.pdf'
execute [dbo].[AskDocumentQuestion] @text,@systemMessage,0

Test 2

declare @systemMessage varchar(max)
declare @text varchar(max)

set @systemMessage = 'you are a helpful assistant that helps people find information'
set @text  = 'What are the main innovations of the NASA Science Mission Directorate?'
execute [dbo].[AskDocumentQuestion] @text,@systemMessage,0

Test 3

declare @systemMessage varchar(max)
declare @text varchar(max)

set @systemMessage = 'Summarize the document content'
set @text  = 'what are the main topics of this database content?'
execute [dbo].[AskDocumentQuestion] @text,@systemMessage,0

Part II - Get SQL Insights with Natural Language

Part II - Setup Steps

  1. Deploy Adventure Works Database
  2. Add Stored Procedure
  3. Ask Questions NLP

Task 1 - Deploy Adventure Works Database

Navigate to the Azure Portal and deploy the AdventureWorks sample database. For more information on how to do this, click here.

Important

Make sure to configure your database as serverless to save money

Task 2 - Add Stored Procedure

Once the database has been deployed, navigate to the query editor, then copy and paste the following T-SQL script.

Make sure you update the OpenAI parameters in the T-SQL script.

Important

Make sure to update your OpenAI information before running this; otherwise you will need to run an ALTER PROCEDURE afterwards.

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
-- =============================================
-- Author:      <Author, , Name>
-- Create Date: <Create Date, , >
-- Description: <Description, , >
-- =============================================
CREATE PROCEDURE [dbo].[SQLNLP] (@question varchar(max), @schema varchar(max))
AS
BEGIN

   declare @text nvarchar(max) =  @question,
           @schema_name nvarchar(max) = @schema;
   		
   

   declare @systemmessage nvarchar(max)
   
   declare @payload2  nvarchar(max)
   declare @top int = 20
   declare @min_similarity decimal(19,16) = 0.75

   declare @retval int, @response nvarchar(max);
   declare @payload nvarchar(max);
   set @payload = json_object('input': @text);

   declare @urlEmbeddings nvarchar(250);
   declare @theHeadings nvarchar(250)
   declare @urlGPT4 nvarchar(250)
   DECLARE @content VARCHAR(MAX);
   DECLARE @document_name VARCHAR(255);
   DECLARE @chunk_id INT;
   declare @previous_summary varchar(max)

   set @previous_summary = ''


   set @urlGPT4 = 'https://YOUR-OPEN-AI-SERVICE-NAME.openai.azure.com/openai/deployments/YOUR-COMPLETIONS-DEPLOYMENT-NAME/chat/completions?api-version=2023-07-01-preview'
   set @theHeadings  = '{"Content-Type":"application/json","api-key":"YOUR-OPEN-AI-API-KEY"}'


    --=======================  Fetch the database schema

    DECLARE @cols AS NVARCHAR(MAX),
   		 @table_name as nvarchar(max),
   		 @table_columns varchar(max),			
   		 @query AS NVARCHAR(MAX);

   

   -- Declare the cursor
   DECLARE TableCursor CURSOR FOR
   SELECT distinct 
   	C.TABLE_NAME       
   FROM 
   	INFORMATION_SCHEMA.COLUMNS C  
   JOIN 
   	INFORMATION_SCHEMA.TABLES T 
   ON 
   	C.TABLE_NAME = T.TABLE_NAME 
   	AND C.TABLE_SCHEMA = T.TABLE_SCHEMA  
   WHERE 
   	T.TABLE_TYPE = 'BASE TABLE' 
   	AND T.TABLE_SCHEMA = @schema_name;


   drop table if exists #tables;
   create table #tables ( theTable nvarchar(max));

   -- Open the cursor
   OPEN TableCursor;

   -- Fetch the first row
   FETCH NEXT FROM TableCursor INTO @table_name;

   -- Loop through the rows
   WHILE @@FETCH_STATUS = 0
   BEGIN
   	-- Process each row
   	--============================================================================================
   
   	-- Generate the column list with data types
   	SET @cols = STUFF((SELECT DISTINCT ', ' + QUOTENAME(COLUMN_NAME) + ' - ' + DATA_TYPE
   				   FROM INFORMATION_SCHEMA.COLUMNS
   				   WHERE TABLE_SCHEMA = @schema_name AND TABLE_NAME = @table_name
   				   FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)'), 1, 2, '');




   	set @table_columns = 'TableName: ' + QUOTENAME(@schema_name) + '.' + QUOTENAME(@table_name) + ' Columns: ' + @cols

   	--select @table_columns

   	SET @query = 'insert into #tables (theTable) values (''' + @table_columns + ''')'       
   	--select @query
   	-- Execute the query
   	EXEC sp_executesql @query;
   

   	--====================================================================================================

   	-- Fetch the next row
   	FETCH NEXT FROM TableCursor INTO @table_name;
   END

   -- Close the cursor
   CLOSE TableCursor;

   -- Deallocate the cursor
   DEALLOCATE TableCursor;

   declare @finalSchema varchar(max)
   SELECT @finalSchema = STRING_AGG(theTable, ', ')  from #tables
   --select @finalSchema

   --============ Now let's pass that for a question

   set @systemMessage = ''' You are an agent designed to return SQL statements with schema detail in <<data_sources>>.
   		Given an input question, create a syntactically correct ODBC Driver 17 for SQL Server query to run.
   		You can order the results by a relevant column to return the most interesting examples in the database.
   		Never query for all the columns from a specific table, only ask for the few relevant columns given the question.
   		You MUST double check your query. Use a step by step thought process.
   		DO NOT make any DML statements (INSERT, UPDATE, DELETE, DROP etc.) to the database.
   		Remember to format the SQL query for ODBC Driver 17 for SQL Server in your response.
   		Remove any invalid characters and double check that the query will perform correctly; just return SQL statements, skip pleasantries.
   		Return a syntactically correct ODBC Driver 17 for SQL Server query ready to run; for all fields or aggregations that use fields with money type, return them as money type.
   		Return SQL statements only, do not include thought process, do not query sys objects.
   	
   		<<data_sources>>	
   		''' +  @finalSchema + ' ## End <<data_sources>> ##'


   set @payload2 = 
   					json_object(
   						'messages': json_array(
   								json_object(
   									'role':'system',
   									'content':'
   										' + @systemMessage + '
   									'
   								),								
   								json_object(
   									'role':'user',
   									'content': @text
   								)
   						),
   						'max_tokens': 4096,
   						'temperature': 0.5,
   						'frequency_penalty': 0,
   						'presence_penalty': 0,
   						'top_p': 0.95,
   						'stop': null
   					);


   --select @payload2

   exec @retval = sp_invoke_external_rest_endpoint
   	@url =  @urlGPT4,
   	@headers = @theHeadings,
   	@method = 'POST',   
   	@timeout = 120,
   	@payload = @payload2,
   	@response = @response output;

   --select @response

   drop table if exists #j;
   select * into #j from openjson(@response, '$.result.choices') c;
   --select * from #j

   declare @value varchar(max)

   select @value = [value] from #j


   select @query = [value] from openjson(@value, '$.message') where [key] = 'content'


   -- Strip the markdown code fence the model may wrap the query in;
   -- targeting the fence avoids clobbering any literal 'sql' inside the query
   SELECT @query = REPLACE(@query, '```sql', '')
   SELECT @query = REPLACE(@query, '`', '')


   -- select @query

   EXEC sp_executesql @query;
END
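The REPLACE cleanup near the end of the procedure strips the markdown code fence that chat models often wrap generated SQL in. For readers adapting the pattern outside T-SQL, an equivalent cleanup can be sketched in Python (a hypothetical helper, not part of the repository):

```python
import re

def strip_sql_fence(reply: str) -> str:
    """Remove a markdown code fence (optionally tagged 'sql') from a model reply."""
    fence = r"`{3}(?:sql)?\s*(.*?)`{3}"   # `{3} matches three backticks
    match = re.search(fence, reply, re.DOTALL | re.IGNORECASE)
    return (match.group(1) if match else reply).strip()
```

Targeting the fence pattern, rather than deleting every backtick and every 'sql' substring, avoids corrupting queries that legitimately contain those characters.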

Task 3 - Ask Questions NLP

Create a new query and test

declare @text nvarchar(max)
declare @schema nvarchar(250) = 'SalesLT'
set @text = 'Is it true that the top 20% of customers generate 80% of revenue? What is their percentage of revenue contribution?'
execute [dbo].[SQLNLP] @text, @schema
GO
declare @text nvarchar(max)
declare @schema nvarchar(250) = 'SalesLT'
set @text = 'Is it true that the top 20% of customers generate 80% of revenue? Please list those customers and give me details on their orders, including the items they purchased'
execute [dbo].[SQLNLP] @text, @schema
GO
declare @text nvarchar(max)
declare @schema nvarchar(250) = 'SalesLT'
set @text = 'Which products have the most seasonality in sales quantity? Add the month they are purchased the least'
execute [dbo].[SQLNLP] @text, @schema
