
Add support for free Azure Search endpoint and GitHub models #8


Open · wants to merge 1 commit into `main`
6 changes: 5 additions & 1 deletion .env.example
@@ -34,4 +34,8 @@ AZURE_AI_SEARCH_METADATA_FIELD="metadata"
AZURE_AI_SEARCH_EMBEDDING_DIMENSIONALITY="1536"

# Optional: Set the log level for the Azure SDKs.
AZURE_LOG_LEVEL=info

# For local development, you must provide a personal access token to use GitHub Models and the free Azure AI Search endpoint.
# Make sure to keep these keys secret and never expose them in public repositories.
GITHUB_TOKEN=<your-personal-access-token>
34 changes: 34 additions & 0 deletions README.md
@@ -47,6 +47,7 @@ This template, the application code and configuration it contains, has been buil
- [Deploying again](#deploying-again)
- [Running the development server](#running-the-development-server)
- [Using Docker (optional)](#using-docker-optional)
- [Running application at no cost](#running-application-at-no-cost)
- [Using the app](#using-the-app)
- [Clean up](#clean-up)
- [Guidance](#guidance)
@@ -81,6 +82,8 @@ However, you can try the [Azure pricing calculator](https://azure.com/e/a87a169b

To reduce costs, you can switch to free SKUs for various services, but those SKUs have limitations.

To try out the example at no cost, see [Running application at no cost](#running-application-at-no-cost).

To avoid unnecessary costs, remember to take down your app if it's no longer in use,
either by deleting the resource group in the Portal or running `azd down`.

@@ -227,6 +230,37 @@ npm run dev

Open [http://localhost:3000](http://localhost:3000) with your browser to see the result.


## Running application at no cost
This approach uses the free GitHub Models endpoint for the GPT and embedding models, and a free Azure AI Search endpoint for data indexing and retrieval.

First, install the project dependencies:

```
npm install
```

Create a GitHub personal access token (see [Managing your personal access tokens](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens)). In the root of the project, create a `.env` file and provide values for the environment variables below (see `.env.example`):

```
GITHUB_TOKEN=
LLAMAINDEX_STORAGE_CACHE_DIR=
```

Next, generate the embeddings of the documents in the [./data](./data) directory:

```
npm run generate
```

Finally, run the development server:

```
npm run dev
```

Open [http://localhost:3000](http://localhost:3000) with your browser to see the result.

## Using the app

- In Azure: navigate to the Azure app deployed by `azd`. The URL is printed out when `azd` completes (as "Endpoint"), or you can find it in the Azure portal.
13 changes: 7 additions & 6 deletions app/api/chat/engine/chat.ts
@@ -10,6 +10,12 @@ import { getDataSource } from "./index";
import { generateFilters } from "./queryFilter";
import { createTools } from "./tools";

const createRetrieverOptions = () => {
return process.env.GITHUB_TOKEN
? { mode: "hybrid" as any }
: { mode: "semantic_hybrid" as any, similarityTopK: 5 };
Comment on lines +13 to +16

Copilot AI Jun 19, 2025


Using 'as any' bypasses type checking; consider defining proper types or using existing enums to enforce valid retriever modes.

Suggested change
const createRetrieverOptions = () => {
return process.env.GITHUB_TOKEN
? { mode: "hybrid" as any }
: { mode: "semantic_hybrid" as any, similarityTopK: 5 };
type RetrieverMode = "hybrid" | "semantic_hybrid";
const createRetrieverOptions = (): { mode: RetrieverMode; similarityTopK?: number } => {
return process.env.GITHUB_TOKEN
? { mode: "hybrid" }
: { mode: "semantic_hybrid", similarityTopK: 5 };


Author


Adding an explicit type for VectorStoreQueryMode results in the following import error:
Attempted import error: 'VectorStoreQueryMode' is not exported from 'llamaindex' (imported as 'VectorStoreQueryMode').

Hence the type was kept as `any`. Maybe in a future version, when llamaindex is upgraded, the explicit type can be used.
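As a sketch of the workaround discussed above, a local string-literal union can type the options without importing the non-exported `VectorStoreQueryMode` enum. The parameter and mode strings below are assumptions based on the diff, not a verified fix:

```typescript
// Sketch only: a local union type stands in for the non-exported
// VectorStoreQueryMode enum. Mode strings mirror those used in the diff.
type RetrieverMode = "hybrid" | "semantic_hybrid";

interface RetrieverOptions {
  mode: RetrieverMode;
  similarityTopK?: number;
}

// Hypothetical variant that takes the token as a parameter for testability,
// instead of reading process.env directly.
const createRetrieverOptions = (githubToken?: string): RetrieverOptions =>
  githubToken
    ? { mode: "hybrid" }
    : { mode: "semantic_hybrid", similarityTopK: 5 };
```

Whether the retriever accepts these plain strings at runtime depends on the installed llamaindex version, so this stays a type-level workaround rather than a confirmed replacement for the `as any` cast.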

};

export async function createChatEngine(documentIds?: string[], params?: any) {
const tools: BaseToolWithCall[] = [];

@@ -20,12 +26,7 @@ export async function createChatEngine(documentIds?: string[], params?: any) {
tools.push(
new QueryEngineTool({
queryEngine: index.asQueryEngine({
retriever: index.asRetriever({
// FIXME: Cannot read properties of undefined (reading 'SEMANTIC_HYBRID')
// mode: VectorStoreQueryMode.SEMANTIC_HYBRID,
mode: "semantic_hybrid" as any,
similarityTopK: 5,
}),
retriever: index.asRetriever(createRetrieverOptions()),
preFilters: generateFilters(documentIds || [])
}),
metadata: {
50 changes: 50 additions & 0 deletions app/api/chat/engine/createIndex.ts
@@ -0,0 +1,50 @@
import { MODELS_ENDPOINT } from "./settings";

let requestHeaders: {
"Authorization": string;
"Content-Type": string;
"X-Auth-Provider": string;
};

async function getSearchEndpointDetails() {
try {
const response = await fetch(`${MODELS_ENDPOINT}/freeazuresearch/endpoint/`, {
headers: requestHeaders,
});
const jsonResponse = await response.json();
const searchServiceEndpoint = jsonResponse.endpoint;
const searchIndexName = jsonResponse.indexName;
console.log(`Your Azure AI Search Endpoint: ${searchServiceEndpoint}; Index Name: ${searchIndexName}`);
return { endpoint: searchServiceEndpoint, indexName: searchIndexName };
} catch (error) {
console.error("Error while retrieving search service details", error);
Copilot AI Jun 19, 2025


Instead of solely logging errors, consider throwing an exception or returning an error result to better surface failures in retrieving search service details.

Suggested change
console.error("Error while retrieving search service details", error);
console.error("Error while retrieving search service details", error);
throw new Error("Failed to retrieve search service details: " + error.message);


Author


In case of an error, the `searchService` object would be null and an exception is already being thrown here.

}
}
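A minimal sketch of the rethrow pattern Copilot suggests, so callers never silently receive `undefined`. The `fetchJson` parameter is a hypothetical stand-in for the real fetch against the models endpoint:

```typescript
// Sketch only: wrap and rethrow instead of swallowing the error, so the
// failure surfaces at the call site. `fetchJson` is a stand-in for the
// actual fetch call in createIndex.ts.
async function getEndpointDetailsStrict(
  fetchJson: () => Promise<{ endpoint: string; indexName: string }>,
): Promise<{ endpoint: string; indexName: string }> {
  try {
    return await fetchJson();
  } catch (error) {
    throw new Error(`Failed to retrieve search service details: ${error}`);
  }
}
```

With this shape, a caller like `createSearchService` would either get a usable `{ endpoint, indexName }` object or a descriptive exception, never a null result to guard against later.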

async function createUploadSession() {
try {
const response = await fetch(`${MODELS_ENDPOINT}/freeazuresearch/files/createUploadSession`, {
method: "POST",
headers: requestHeaders,
});

const jsonResponse = await response.json();
const uploadSessionId = jsonResponse.id;
console.log(`Created upload session ${uploadSessionId}.`);
return uploadSessionId;
} catch (error) {
console.error("Error while creating upload session", error);
}
}

export async function createSearchService() {
const githubToken = process.env.GITHUB_TOKEN;
requestHeaders = {
"Authorization": `Bearer ${githubToken}`,
"Content-Type": "application/json",
"X-Auth-Provider": "github",
};

await createUploadSession();
return getSearchEndpointDetails();
}
174 changes: 124 additions & 50 deletions app/api/chat/engine/settings.ts
@@ -4,6 +4,7 @@ import {
ManagedIdentityCredential,
} from "@azure/identity";
import {
AzureKeyCredential,
KnownAnalyzerNames,
KnownVectorSearchAlgorithmKind,
} from "@azure/search-documents";
@@ -14,24 +15,134 @@ import { OpenAI, OpenAIEmbedding, Settings } from "llamaindex";
import {
AzureAISearchVectorStore,
IndexManagement,
} from "llamaindex/vector-store/azure/AzureAISearchVectorStore";
} from "llamaindex/vector-store/AzureAISearchVectorStore";

import { createSearchService } from "./createIndex";

const CHUNK_SIZE = 512;
const CHUNK_OVERLAP = 20;
const AZURE_COGNITIVE_SERVICES_SCOPE =
"https://cognitiveservices.azure.com/.default";
export const MODELS_ENDPOINT = "https://models.inference.ai.azure.com";

export const initSettings = async () => {
if (
!process.env.AZURE_OPENAI_CHAT_DEPLOYMENT ||
!process.env.AZURE_OPENAI_EMBEDDING_DEPLOYMENT
) {
async function createAzureAISearchOptions(
azureAiSearchVectorStoreAuth: {
key?: string;
credential?: DefaultAzureCredential | ManagedIdentityCredential;
},
githubToken?: string
) {
const commonOptions = {
serviceApiVersion: "2024-09-01-preview",
indexManagement: IndexManagement.CREATE_IF_NOT_EXISTS,
languageAnalyzer: KnownAnalyzerNames.EnLucene,
vectorAlgorithmType: KnownVectorSearchAlgorithmKind.ExhaustiveKnn,
};

if (!githubToken) {
return {
...azureAiSearchVectorStoreAuth,
endpoint: process.env.AZURE_AI_SEARCH_ENDPOINT,
indexName: process.env.AZURE_AI_SEARCH_INDEX ?? "llamaindex-vector-search",
idFieldKey: process.env.AZURE_AI_SEARCH_ID_FIELD ?? "id",
chunkFieldKey: process.env.AZURE_AI_SEARCH_CHUNK_FIELD ?? "chunk",
embeddingFieldKey: process.env.AZURE_AI_SEARCH_EMBEDDING_FIELD ?? "embedding",
metadataStringFieldKey: process.env.AZURE_AI_SEARCH_METADATA_FIELD ?? "metadata",
docIdFieldKey: process.env.AZURE_AI_SEARCH_DOC_ID_FIELD ?? "doc_id",
embeddingDimensionality: Number(process.env.AZURE_AI_SEARCH_EMBEDDING_DIMENSIONALITY ?? "1536"),
...commonOptions,
};
}

const searchService = await createSearchService();
if (!searchService) {
throw new Error("Failed to retrieve search service details.");
}

return {
credential: new AzureKeyCredential(githubToken),
endpoint: searchService.endpoint,
indexName: searchService.indexName,
idFieldKey: "chunk_id",
chunkFieldKey: "chunk",
embeddingFieldKey: "text_vector",
metadataStringFieldKey: "parent_id",
docIdFieldKey: "chunk_id",
embeddingDimensionality: 1536,
...commonOptions,
};
}

function createEmbeddingParams(
openAiConfig: {
apiKey?: string;
deployment?: string;
model?: string;
azure?: Record<string, string | CallableFunction>;
},
githubToken?: string
) {
if (!githubToken) {
return {
...openAiConfig,
model: process.env.AZURE_OPENAI_EMBEDDING_DEPLOYMENT,
azure: {
...openAiConfig.azure,
deployment: process.env.AZURE_OPENAI_EMBEDDING_DEPLOYMENT,
},
};
}

return {
model: "text-embedding-3-small",
apiKey: githubToken,
additionalSessionOptions: {
baseURL: MODELS_ENDPOINT,
},
};
}

function createOpenAiParams(
openAiConfig: {
apiKey?: string;
deployment?: string;
model?: string;
azure?: Record<string, string | CallableFunction>;
},
githubToken?: string
) {
if (!githubToken) {
return {
...openAiConfig,
model: process.env.AZURE_OPENAI_CHAT_DEPLOYMENT,
};
}

return {
apiKey: githubToken,
additionalSessionOptions: {
baseURL: MODELS_ENDPOINT,
},
model: "gpt-4o",
};
}

function validateEnvironmentVariables() {
const areOpenAiChatAndEmbeddingDeploymentConfigured =
process.env.AZURE_OPENAI_CHAT_DEPLOYMENT && process.env.AZURE_OPENAI_EMBEDDING_DEPLOYMENT;
const isGithubTokenConfigured = process.env.GITHUB_TOKEN;
if (!areOpenAiChatAndEmbeddingDeploymentConfigured && !isGithubTokenConfigured) {
throw new Error(
"'AZURE_OPENAI_CHAT_DEPLOYMENT' and 'AZURE_OPENAI_EMBEDDING_DEPLOYMENT' env variables must be set.",
"Environment variables 'AZURE_OPENAI_CHAT_DEPLOYMENT' and 'AZURE_OPENAI_EMBEDDING_DEPLOYMENT' must be set, or a valid GITHUB_TOKEN must be provided."
);
}
}

export const initSettings = async () => {
validateEnvironmentVariables();

let credential;
const githubToken = process.env.GITHUB_TOKEN;
const azureAiSearchVectorStoreAuth: {
key?: string;
credential?: DefaultAzureCredential | ManagedIdentityCredential;
@@ -72,7 +183,9 @@ export const initSettings = async () => {
);
openAiConfig.azure = {
azureADTokenProvider,
...(process.env.AZURE_OPENAI_CHAT_DEPLOYMENT && {
deployment: process.env.AZURE_OPENAI_CHAT_DEPLOYMENT,
}),
};

azureAiSearchVectorStoreAuth.credential = credential;
@@ -86,59 +199,20 @@ }
}

// configure LLM model
Settings.llm = new OpenAI({
...openAiConfig,
model: process.env.AZURE_OPENAI_CHAT_DEPLOYMENT,
});
Settings.llm = new OpenAI(createOpenAiParams(openAiConfig, githubToken));
console.log({ openAiConfig });
Copilot AI Jun 19, 2025


Consider removing or limiting this debug log to avoid exposing sensitive configuration details in production.

Suggested change
console.log({ openAiConfig });
if (process.env.NODE_ENV === 'development') {
console.log("OpenAI configuration initialized.");
}


Author


This log statement wasn't introduced in this PR, so I’d prefer to keep this PR focused on the current scope. Happy to include the change here if you'd prefer, just let me know.


// configure embedding model
Settings.embedModel = new OpenAIEmbedding({
...openAiConfig,
model: process.env.AZURE_OPENAI_EMBEDDING_DEPLOYMENT,
azure: {
...openAiConfig.azure,
deployment: process.env.AZURE_OPENAI_EMBEDDING_DEPLOYMENT,
}
});
Settings.embedModel = new OpenAIEmbedding(createEmbeddingParams(openAiConfig, githubToken));

Settings.chunkSize = CHUNK_SIZE;
Settings.chunkOverlap = CHUNK_OVERLAP;

// FIXME: find an elegant way to share the same instance across the ingestion and
// generation pipelines

const endpoint = process.env.AZURE_AI_SEARCH_ENDPOINT;
const indexName =
process.env.AZURE_AI_SEARCH_INDEX ?? "llamaindex-vector-search";
const idFieldKey = process.env.AZURE_AI_SEARCH_ID_FIELD ?? "id";
const chunkFieldKey = process.env.AZURE_AI_SEARCH_CHUNK_FIELD ?? "chunk";
const embeddingFieldKey =
process.env.AZURE_AI_SEARCH_EMBEDDING_FIELD ?? "embedding";
const metadataStringFieldKey =
process.env.AZURE_AI_SEARCH_METADATA_FIELD ?? "metadata";
const docIdFieldKey = process.env.AZURE_AI_SEARCH_DOC_ID_FIELD ?? "doc_id";

const azureAiSearchOptions = await createAzureAISearchOptions(azureAiSearchVectorStoreAuth, githubToken);
console.log("Initializing Azure AI Search Vector Store");

(Settings as any).__AzureAISearchVectorStoreInstance__ =
new AzureAISearchVectorStore({
// Use either a key or a credential based on the environment
...azureAiSearchVectorStoreAuth,
endpoint,
indexName,
idFieldKey,
chunkFieldKey,
embeddingFieldKey,
metadataStringFieldKey,
docIdFieldKey,
serviceApiVersion: "2024-09-01-preview",
// FIXME: import IndexManagement.CREATE_IF_NOT_EXISTS from 'llamaindex'
// indexManagement: IndexManagement.CREATE_IF_NOT_EXISTS,
indexManagement: "CreateIfNotExists" as IndexManagement,
embeddingDimensionality: Number(process.env.AZURE_AI_SEARCH_EMBEDDING_DIMENSIONALITY) ?? 1536,
languageAnalyzer: KnownAnalyzerNames.EnLucene,
// store vectors on disk
vectorAlgorithmType: KnownVectorSearchAlgorithmKind.ExhaustiveKnn,
});
(Settings as any).__AzureAISearchVectorStoreInstance__ = new AzureAISearchVectorStore(azureAiSearchOptions);
};
8 changes: 7 additions & 1 deletion app/api/chat/route.ts
@@ -13,16 +13,22 @@ import { createCallbackManager } from "./llamaindex/streaming/events";
import { generateNextQuestions } from "./llamaindex/streaming/suggestion";

initObservability();
initSettings();

export const runtime = "nodejs";
export const dynamic = "force-dynamic";

async function setVectorStoreInstance() {
if (!(Settings as any).__AzureAISearchVectorStoreInstance__) {
await initSettings();
}
}

export async function POST(request: NextRequest) {
// Init Vercel AI StreamData and timeout
const vercelStreamData = new StreamData();

try {
await setVectorStoreInstance();
const body = await request.json();
const { messages, data }: { messages: Message[]; data?: any } = body;
if (!isValidMessages(messages)) {