At the start of this year, we released a new set of T-SQL AI features for embedding your relational data for AI applications. Today, we have brought those features to Azure SQL and SQL database in Microsoft Fabric.
This post will help you get started using the new AI functions of Azure SQL.
Prerequisites
- An Azure SQL database or SQL database in Microsoft Fabric created with the adventureworks sample data
- SSMS or VS Code with the MSSQL Extension
- An Azure OpenAI embeddings endpoint with the following 2 key pieces of information:
- The endpoint URL
- The auth key for the endpoint
Set up your environment
The following section guides you through setting up the environment and installing the necessary software and utilities.
Set up the database
The following section guides you through using the embeddings model to create vector arrays on relation data and use the new vector similarity search functionality in Azure SQL and SQL database in Microsoft Fabric.
Create database scoped credentials
Use the following sample code to create a set of database scoped credentials for calling our Azure OpenAI Endpoint and providing the key in the header:
Note: Your Endpoint URLs and Key will be different that these in the blog post
-- Create a master key for the database if not exists(select * from sys.symmetric_keys where [name] = '##MS_DatabaseMasterKey##') begin create master key encryption by password = N'V3RYStr0NGP@ssw0rd!'; end go -- Create the database scoped credential for Azure AI Content Understanding if not exists(select * from sys.database_scoped_credentials where [name] = 'https://azure.cognitiveservices.azure.com/') begin create database scoped credential [https://azure.cognitiveservices.azure.com/] with identity = 'HTTPEndpointHeaders', secret = '{"api-key":"YOUR_AZURE_OPEN_AI_KEY"}'; end go
Create the EXTERNAL MODEL in the database
1. Using SSMS or VS Code, login to the database.
2. Open a new query sheet
3. Next, run the following SQL to create an EXTERNAL MODEL that points to an Azure OpenAI embedding model (here ill be using text-embedding-3-small):
Note: Your Endpoint URLs will be different that these in the blog post
CREATE EXTERNAL MODEL text3small WITH ( LOCATION = 'https://azure.cognitiveservices.azure.com/openai/deployments/text-embedding-3-small/embeddings?api-version=2023-05-15', API_FORMAT = 'Azure OpenAI', MODEL_TYPE = EMBEDDINGS, MODEL = 'text-embedding-3-small', CREDENTIAL = [https://azure.cognitiveservices.azure.com/] );
Test the EXTERNAL MODEL
To test the embeddings endpoint, run the following SQL:
select AI_GENERATE_EMBEDDINGS(N'test text' USE MODEL text3small);
You should see a JSON vector array returned similar to the following:
[0.1529204398393631,0.4368368685245514,-3.6136839389801025,-0.7697131633758545…
Embed Product Data
This next section of the tutorial will alter the Adventure Works product table to add a new vector data type column.
1. Run the following SQL to add the columns to the Product table:
ALTER TABLE [SalesLT].[Product] ADD embeddings VECTOR (768), chunk NVARCHAR (2000);
2. Next, we are going to use the EXTERNAL MODEL and AI_GENERATE_EMBEDDINGS to create embeddings for text we supply as an input.
Run the following code to create the embeddings:
-- create the embeddings SET NOCOUNT ON; DROP TABLE IF EXISTS #MYTEMP; DECLARE @ProductID int DECLARE @text NVARCHAR (MAX); SELECT * INTO #MYTEMP FROM [SalesLT].Product WHERE embeddings IS NULL; SELECT @ProductID = ProductID FROM #MYTEMP; SELECT TOP(1) @ProductID = ProductID FROM #MYTEMP; WHILE @@ROWCOUNT <> 0 BEGIN SET @text = ( SELECT p.Name + ' ' + ISNULL(p.Color, 'No Color') + ' ' + c.Name + ' ' + m.Name + ' ' + ISNULL(d.Description, '') FROM [SalesLT].[ProductCategory] c, [SalesLT].[ProductModel] m, [SalesLT].[Product] p LEFT OUTER JOIN [SalesLT].[vProductAndDescription] d ON p.ProductID = d.ProductID AND d.Culture = 'en' WHERE p.ProductCategoryID = c.ProductCategoryID AND p.ProductModelID = m.ProductModelID AND p.ProductID = @ProductID ); UPDATE [SalesLT].[Product] SET [embeddings] = AI_GENERATE_EMBEDDINGS(@text USE MODEL text3small), [chunk] = @text WHERE ProductID = @ProductID; DELETE FROM #MYTEMP WHERE ProductID = @ProductID; SELECT TOP(1) @ProductID = ProductID FROM #MYTEMP; END
2. Use the following query to see if any embeddings were missed:
SELECT * FROM SalesLT.Product WHERE embeddings IS NULL;
3. And use this query to see a sample of the new columns and the data within:
SELECT TOP 10 chunk, embeddings FROM SalesLT.Product;
Use VECTOR_DISTANCE
Vector similarity searching is a technique used to find and retrieve data points that are similar to a given query, based on their vector representations. The similarity between two vectors is measured using a distance metric, such as cosine similarity or Euclidean distance. These metrics quantify the similarity between two vectors by calculating the angle between them or the distance between their coordinates in the vector space.
Vector similarity searching has numerous applications, such as recommendation systems, search engines, image and video retrieval, and natural language processing tasks. It allows for efficient and accurate retrieval of similar items, enabling users to find relevant information or discover related items quickly and effectively.
This section of the tutorial will be using the new function VECTOR_DISTANCE.
VECTOR_DISTANCE
Uses K-Nearest Neighbors or KNN
Use the following SQL to run similarity searches using VECTOR_DISTANCE.
declare @search_text nvarchar(max) = 'I am looking for a red bike and I dont want to spend a lot'
declare @search_vector vector(768) = AI_GENERATE_EMBEDDINGS(@search_text USE MODEL text3small);
SELECT TOP(4)
p.ProductID, p.Name , p.chunk,
vector_distance('cosine', @search_vector, p.embeddings) AS distance
FROM [SalesLT].[Product] p
ORDER BY distance;
declare @search_text nvarchar(max) = 'I am looking for a safe helmet that does not weigh much'
declare @search_vector vector(768) = AI_GENERATE_EMBEDDINGS(@search_text USE MODEL text3small);
SELECT TOP(4)
p.ProductID, p.Name , p.chunk,
vector_distance('cosine', @search_vector, p.embeddings) AS distance
FROM [SalesLT].[Product] p
ORDER BY distance;
declare @search_text nvarchar(max) = 'Do you sell any padded seats that are good on trails?'
declare @search_vector vector(768) = AI_GENERATE_EMBEDDINGS(@search_text USE MODEL text3small);
SELECT TOP(4)
p.ProductID, p.Name , p.chunk,
vector_distance('cosine', @search_vector, p.embeddings) AS distance
FROM [SalesLT].[Product] p
ORDER BY distance;
Chunk with embeddings
This section uses the `AI_GENERATE_CHUNKS` function with `AI_GENERATE_EMBEDDINGS` to simulate breaking a large section of text into smaller set sized chunks to be embedded.
1. First, create a table to hold the text:
CREATE TABLE textchunk ( text_id INT IDENTITY (1, 1) PRIMARY KEY, text_to_chunk NVARCHAR (MAX) ); GO
2. Next, insert the text into the table:
INSERT INTO textchunk (text_to_chunk)
VALUES ('All day long we seemed to dawdle through a country which was full of beauty of every kind. Sometimes we saw little towns or castles on the top of steep hills such as we see in old missals; sometimes we ran by rivers and streams which seemed from the wide stony margin on each side of them to be subject to great floods.'),
('My Friend, Welcome to the Carpathians. I am anxiously expecting you. Sleep well to-night. At three to-morrow the diligence will start for Bukovina; a place on it is kept for you. At the Borgo Pass my carriage will await you and will bring you to me. I trust that your journey from London has been a happy one, and that you will enjoy your stay in my beautiful land. Your friend, DRACULA');
GO
3. Finally, create chunks of text to be embedded using both functions:
SELECT c.*, AI_GENERATE_EMBEDDINGS(c.chunk USE MODEL text3small) FROM textchunk t CROSS APPLY AI_GENERATE_CHUNKS(source = text_to_chunk, chunk_type = N'FIXED', chunk_size = 50, overlap = 10) c
The post New T-SQL AI Features are now in Public Preview for Azure SQL and SQL database in Microsoft Fabric appeared first on Azure SQL Devs’ Corner.




