This repository was archived by the owner on Oct 25, 2024. It is now read-only.

A beginner-friendly quantize and text embeddings tutorial for XPUs #1663

Open
wants to merge 2 commits into main
@@ -1,3 +1,9 @@
**A Beginner-Friendly Tutorial**
=======
[text_embedding_made_simple_xpu.ipynb](https://github.com/sleepingcat4/intel-extension-for-transformers/blob/main/examples/huggingface/pytorch/text-embedding/deployment/mteb/bge/text_embedding_made_simple_xpu.ipynb)

In this notebook, we show how to quantize one of the largest and most recent embedding models on Hugging Face (BAAI/bge-m3) using an Intel XPU, and how to generate embeddings in a Jupyter notebook.

Step-by-Step
=======
This document describes the end-to-end workflow for the Hugging Face models [BAAI/bge-large-en-v1.5](https://huggingface.co/BAAI/bge-large-en-v1.5), [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5), and [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5) with the LLM Runtime backend.
@@ -0,0 +1,170 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"source": [
"In this notebook, I will describe how we can quantise huggingface models on Intel GPUs (XPU). For demonstration, we're goingto embed a sentence using ```BAAI/bge-m3``` model one of the largest mother embedding model in existence."
],
"metadata": {
"id": "CzpnbtoNej6N"
}
},
{
"cell_type": "markdown",
"source": [
"#### Installation\n",
"\n",
"Please install below libraries\n",
"\n",
"https://github.com/intel/intel-extension-for-transformers\n",
"\n",
"```pip install intel-extension-for-pytorch```"
],
"metadata": {
"id": "ZHbGyOWtfHdL"
}
},
{
"cell_type": "markdown",
"source": [
"We're going to import both the standard transformers library and intel specific transformers library"
],
"metadata": {
"id": "JnslJkenfULW"
}
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "SOSdM2_gefOX"
},
"outputs": [],
"source": [
"from transformers import AutoTokenizer\n",
"from intel_extension_for_transformers.transformers.modeling import AutoModelForCausalLM\n",
"import torch\n",
"import intel_extension_for_pytorch as ipex"
]
},
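{
"cell_type": "markdown",
"source": [
"Optionally, we can run a quick sanity check that PyTorch can see the XPU. This is a minimal sketch that assumes an XPU build of ```intel_extension_for_pytorch```; the device name printed depends on your hardware."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"# Sanity check (assumes an XPU build of intel_extension_for_pytorch):\n",
"# importing ipex above makes the torch.xpu device backend available\n",
"print(torch.xpu.is_available())\n",
"if torch.xpu.is_available():\n",
"    print(torch.xpu.get_device_name(0))"
],
"metadata": {},
"execution_count": null,
"outputs": []
},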
{
"cell_type": "markdown",
"source": [
"Model name"
],
"metadata": {
"id": "0Uome2bMfebE"
}
},
{
"cell_type": "code",
"source": [
"# Model name or path\n",
"model_name = \"BAAI/bge-m3\""
],
"metadata": {
"id": "dGelWLE9ffcx"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Now, we'll load tokenizer and map everything on the Intel XPU (GPU)"
],
"metadata": {
"id": "xI9m0SxNfj7w"
}
},
{
"cell_type": "code",
"source": [
"device_map = \"xpu\"\n",
"tokenizer = AutoTokenizer.from_pretrained(model_name)\n",
"input_sentence = \"what's the capital of England?\"\n",
"inputs = tokenizer(input_sentence, return_tensors=\"pt\")\n",
"inputs = {key: tensor.to(\"xpu\") for key, tensor in inputs.items()}"
],
"metadata": {
"id": "b7lZvHBhfoSC"
},
"execution_count": null,
"outputs": []
},
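{
"cell_type": "markdown",
"source": [
"As a quick check (an addition beyond the original steps), we can confirm the tokenized tensors now live on the XPU; the shapes below depend on the sentence."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"# Each tensor should report device xpu:0 after the move above\n",
"for key, tensor in inputs.items():\n",
"    print(key, tuple(tensor.shape), tensor.device)"
],
"metadata": {},
"execution_count": null,
"outputs": []
},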
{
"cell_type": "markdown",
"source": [
"Loading the model on Intel XPU"
],
"metadata": {
"id": "lWLpFMvzfq2m"
}
},
{
"cell_type": "code",
"source": [
"model = AutoModelForCausalLM.from_pretrained(model_name, load_in_4bit=True, device_map=\"xpu\", trust_remote_code=True, use_llm_runtime=False)\n",
"model = ipex.optimize_transformers(model, inplace=True, dtype=torch.float16, quantization_config=True, device=device_map)"
],
"metadata": {
"id": "Tl9-HUZRfuHE"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Generating embeddings"
],
"metadata": {
"id": "ZAnYVZqBfxlB"
}
},
{
"cell_type": "code",
"source": [
"with torch.no_grad():\n",
" outputs = model(**inputs)\n",
" logits = outputs.logits\n",
"\n",
"embeddings = logits.mean(dim=1)\n",
"print(embeddings)"
],
"metadata": {
"id": "a1DzuiqVfzF3"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"**example output**\n",
"\n",
"```\n",
"tensor([[ 4.3945e+00, -2.6588e-03, 9.7559e-01, ..., 5.6680e+00,\n",
" 1.0303e+00, 2.5488e+00]], device='xpu:0', dtype=torch.float16)\n",
"```"
],
"metadata": {
"id": "TpYelWKaf5LZ"
}
}
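,
{
"cell_type": "markdown",
"source": [
"As a usage sketch (an addition beyond the original tutorial): embeddings produced this way can be compared with cosine similarity. The second sentence below is a hypothetical example input; values closer to 1 indicate more similar sentences."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"# Sketch: embed a second (hypothetical example) sentence and compare\n",
"second_sentence = \"London is the capital of England.\"\n",
"inputs2 = tokenizer(second_sentence, return_tensors=\"pt\")\n",
"inputs2 = {key: tensor.to(device_map) for key, tensor in inputs2.items()}\n",
"with torch.no_grad():\n",
"    emb2 = model(**inputs2).logits.mean(dim=1)\n",
"# Cosine similarity between the two pooled embeddings\n",
"print(torch.nn.functional.cosine_similarity(embeddings, emb2))"
],
"metadata": {},
"execution_count": null,
"outputs": []
}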
]
}