- Set up a Weaviate cluster and get an API key.
- Get an OpenAI API key.
- Fill in these keys in `.env` and copy it into `.dev.vars` (both are git-ignored):

      WEAVIATE_URL=https://zlpipialqvivotmfurmzpw.c0.asia-southeast1.gcp.weaviate.cloud
      WEAVIATE_API_KEY=...
      OPENAI_API_KEY=...

- Run `uv run embed.py` to upload embeddings into Weaviate.
- Run `npm install` to install dependencies.
- Set up Cloudflare secrets using the same keys from `.dev.vars`:

      npx wrangler secret put WEAVIATE_URL
      npx wrangler secret put WEAVIATE_API_KEY
      npx wrangler secret put OPENAI_API_KEY

- Run `npx wrangler dev` and test at `http://localhost:8787`.
- Run `npx wrangler deploy` to deploy to production.
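
Before running the later steps, it can help to confirm that both files contain all three keys. The following is a minimal sketch, not part of this repository: `parse_env` and `missing_keys` are hypothetical helpers, and the parser assumes plain `KEY=VALUE` lines with optional `#` comments.

```python
from pathlib import Path

# The three keys named in the setup steps above.
REQUIRED_KEYS = ("WEAVIATE_URL", "WEAVIATE_API_KEY", "OPENAI_API_KEY")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


if __name__ == "__main__":
    for name in (".env", ".dev.vars"):
        path = Path(name)
        if not path.exists():
            print(f"{name}: file not found")
            continue
        missing = missing_keys(parse_env(path.read_text()))
        print(f"{name}: " + ("ok" if not missing else "missing " + ", ".join(missing)))
```

Run it from the project root; it only reads the two files and never prints the key values.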
The embedding system processes src/*.md and stores them in Weaviate Cloud with vector embeddings generated by OpenAI's text-embedding-3-small model.
embed.py creates a Document collection with the following properties:
- filename: Name of the source file
- filepath: Full path to the source file
- content: Complete file content
- file_size: File size in bytes
- content_hash: SHA256 hash for duplicate detection
- file_extension: File extension (.md)
Each document's vector is generated from `content` with OpenAI's text-embedding-3-small model (1536 dimensions).
Modified files are replaced and new files are inserted. Files deleted from `src/` are not yet removed from Weaviate. #TODO
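
The insert/replace behaviour can be illustrated with a short sketch. This is not the code in `embed.py`: `document_properties` and `plan_sync` are hypothetical names, and it assumes the hash is taken over the raw file bytes and that documents are matched by `filepath`.

```python
import hashlib
from pathlib import Path


def document_properties(path: Path) -> dict:
    """Build the six Document properties for one file."""
    data = path.read_bytes()
    return {
        "filename": path.name,
        "filepath": str(path),
        "content": data.decode("utf-8"),
        "file_size": len(data),
        # Assumption: SHA256 over the raw bytes of the file.
        "content_hash": hashlib.sha256(data).hexdigest(),
        "file_extension": path.suffix,
    }


def plan_sync(local: dict[str, str], remote: dict[str, str]) -> dict[str, list[str]]:
    """Compare {filepath: content_hash} maps and classify each path."""
    plan = {"insert": [], "replace": [], "unchanged": [], "orphaned": []}
    for filepath, digest in sorted(local.items()):
        if filepath not in remote:
            plan["insert"].append(filepath)
        elif remote[filepath] != digest:
            plan["replace"].append(filepath)
        else:
            plan["unchanged"].append(filepath)
    # In Weaviate but no longer on disk: reported only, never deleted.
    plan["orphaned"] = sorted(set(remote) - set(local))
    return plan
```

Here `local` would come from hashing the files in `src/` and `remote` from the `filepath` and `content_hash` properties already stored in the collection. The `orphaned` list is where the pending TODO would plug in.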
Now you can query the documents using Weaviate's GraphQL API or Python client. For example:
import os
import weaviate
client = weaviate.connect_to_weaviate_cloud(
    cluster_url=os.getenv("WEAVIATE_URL"),
    auth_credentials=weaviate.AuthApiKey(os.getenv("WEAVIATE_API_KEY")),
    headers={"X-OpenAI-Api-Key": os.getenv("OPENAI_API_KEY")},
)
collection = client.collections.get("Document")
print(collection.query.hybrid(query="admission process", limit=3))
client.close()

A Cloudflare Worker provides semantic document search and AI-powered question answering using Weaviate and OpenAI. To try it against the local dev server, run:
curl http://localhost:8787/answer \
  -H 'Content-Type: application/json' \
  -d '{"q": "How do I register for courses", "ndocs": 3}'

The response is a text/event-stream whose events use a subset of the OpenAI chat completion chunk object.
A response looks like this:
data: {"choices": [{"delta": {"tool_calls": { "function": { "name": "document", "arguments": "{\"name\": ..., \"link\": ... }" }}}}]}
data: {"choices": [{"delta": {"tool_calls": { "function": { "name": "document", "arguments": "{\"name\": ..., \"link\": ... }" }}}}]}
data: {"choices": [{"delta": {"content": "..."}}]}
data: {"choices": [{"delta": {"content": "..."}}]}
data: [DONE]
- It begins with `choices[0].delta.tool_calls` events, one per document, each with JSON-encoded `arguments` containing `{name, link}`.
- It continues with `choices[0].delta.content` events that carry the streaming answer text.
- It ends with `data: [DONE]`.
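
A client can consume this stream by reading the `data:` lines in order. The sketch below is one possible way to do that and is not part of the Worker: `parse_answer_stream` is a hypothetical helper, and it assumes each event arrives as a single `data:` line in the shape shown above.

```python
import json
from typing import Iterable


def parse_answer_stream(lines: Iterable[str]) -> tuple[list[dict], str]:
    """Collect cited documents and the answer text from `data:` lines."""
    documents, answer = [], []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        delta = json.loads(payload)["choices"][0]["delta"]
        calls = delta.get("tool_calls") or []
        # The sample shows a single object; a list is accepted too (assumption).
        for call in ([calls] if isinstance(calls, dict) else calls):
            documents.append(json.loads(call["function"]["arguments"]))
        if delta.get("content"):
            answer.append(delta["content"])
    return documents, "".join(answer)
```

To use it against a running Worker, POST the same JSON body as the `curl` example, then pass the response body to `parse_answer_stream` line by line, decoded as UTF-8. Each returned document is a dict with the `name` and `link` keys described above.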
Add this code to the IITM BS website:

    <script src="https://iitm-bs-chatbot.sanand.workers.dev/chatbot.js" type="module"></script>

See a live demo at https://iitm-bs-chatbot.sanand.workers.dev/.

The `chatbot.js` script automatically creates the chatbot button, loads the chat app in an iframe, and injects the CSS needed for styling.