This repository was archived by the owner on Oct 25, 2024. It is now read-only.

A beginner-friendly quantize and text embeddings tutorial for XPUs #1663

Open
wants to merge 2 commits into main
@@ -1,3 +1,9 @@
**A Beginner-Friendly Tutorial**
=======
[text_embedding_made_simple_xpu.ipynb](https://github.com/sleepingcat4/intel-extension-for-transformers/blob/main/examples/huggingface/pytorch/text-embedding/deployment/mteb/bge/text_embedding_made_simple_xpu.ipynb)

In this notebook, we show how to quantize one of the largest and most recent embedding models on Hugging Face (BAAI/bge-m3) using an Intel XPU, and how to generate embeddings in a Jupyter notebook.

Step-by-Step
=======
This document describes the end-to-end workflow for the Hugging Face models [BAAI/bge-large-en-v1.5](https://huggingface.co/BAAI/bge-large-en-v1.5), [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5), and [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5) with the LLM Runtime backend.
@@ -0,0 +1,170 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"source": [
"In this notebook, I will describe how we can quantise huggingface models on Intel GPUs (XPU). For demonstration, we're goingto embed a sentence using ```BAAI/bge-m3``` model one of the largest mother embedding model in existence."
],
"metadata": {
"id": "CzpnbtoNej6N"
}
},
{
"cell_type": "markdown",
"source": [
"#### Installation\n",
"\n",
"Please install below libraries\n",
"\n",
"https://github.com/intel/intel-extension-for-transformers\n",
"\n",
"```pip install intel-extension-for-pytorch```"
],
"metadata": {
"id": "ZHbGyOWtfHdL"
}
},
{
"cell_type": "markdown",
"source": [
"We're going to import both the standard transformers library and intel specific transformers library"
],
"metadata": {
"id": "JnslJkenfULW"
}
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "SOSdM2_gefOX"
},
"outputs": [],
"source": [
"from transformers import AutoTokenizer\n",
"from intel_extension_for_transformers.transformers.modeling import AutoModelForCausalLM\n",
"import torch\n",
"import intel_extension_for_pytorch as ipex"
]
},
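{
"cell_type": "markdown",
"source": [
"Optionally, we can run a quick sanity check that PyTorch can see the XPU. This is a minimal sketch that assumes an XPU build of ```intel_extension_for_pytorch```; the device name printed depends on your hardware."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"# Sanity check (assumes an XPU build of intel_extension_for_pytorch):\n",
"# importing ipex above makes the torch.xpu device backend available\n",
"print(torch.xpu.is_available())\n",
"if torch.xpu.is_available():\n",
"    print(torch.xpu.get_device_name(0))"
],
"metadata": {},
"execution_count": null,
"outputs": []
},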
{
"cell_type": "markdown",
"source": [
"Model name"
],
"metadata": {
"id": "0Uome2bMfebE"
}
},
{
"cell_type": "code",
"source": [
"# Model name or path\n",
"model_name = \"BAAI/bge-m3\""
],
"metadata": {
"id": "dGelWLE9ffcx"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Now, we'll load tokenizer and map everything on the Intel XPU (GPU)"
],
"metadata": {
"id": "xI9m0SxNfj7w"
}
},
{
"cell_type": "code",
"source": [
"device_map = \"xpu\"\n",
"tokenizer = AutoTokenizer.from_pretrained(model_name)\n",
"input_sentence = \"what's the capital of England?\"\n",
"inputs = tokenizer(input_sentence, return_tensors=\"pt\")\n",
"inputs = {key: tensor.to(\"xpu\") for key, tensor in inputs.items()}"
],
"metadata": {
"id": "b7lZvHBhfoSC"
},
"execution_count": null,
"outputs": []
},
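{
"cell_type": "markdown",
"source": [
"As a quick check (an addition beyond the original steps), we can confirm the tokenized tensors now live on the XPU; the shapes below depend on the sentence."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"# Each tensor should report device xpu:0 after the move above\n",
"for key, tensor in inputs.items():\n",
"    print(key, tuple(tensor.shape), tensor.device)"
],
"metadata": {},
"execution_count": null,
"outputs": []
},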
{
"cell_type": "markdown",
"source": [
"Loading the model on Intel XPU"
],
"metadata": {
"id": "lWLpFMvzfq2m"
}
},
{
"cell_type": "code",
"source": [
"model = AutoModelForCausalLM.from_pretrained(model_name, load_in_4bit=True, device_map=\"xpu\", trust_remote_code=True, use_llm_runtime=False)\n",
"model = ipex.optimize_transformers(model, inplace=True, dtype=torch.float16, quantization_config=True, device=device_map)"
],
"metadata": {
"id": "Tl9-HUZRfuHE"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Generating embeddings"
],
"metadata": {
"id": "ZAnYVZqBfxlB"
}
},
{
"cell_type": "code",
"source": [
"with torch.no_grad():\n",
" outputs = model(**inputs)\n",
" logits = outputs.logits\n",
"\n",
"embeddings = logits.mean(dim=1)\n",
"print(embeddings)"
],
"metadata": {
"id": "a1DzuiqVfzF3"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"**example output**\n",
"\n",
"```\n",
"tensor([[ 4.3945e+00, -2.6588e-03, 9.7559e-01, ..., 5.6680e+00,\n",
" 1.0303e+00, 2.5488e+00]], device='xpu:0', dtype=torch.float16)\n",
"```"
],
"metadata": {
"id": "TpYelWKaf5LZ"
}
}
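,
{
"cell_type": "markdown",
"source": [
"As a usage sketch (an addition beyond the original tutorial): embeddings produced this way can be compared with cosine similarity. The second sentence below is a hypothetical example input; values closer to 1 indicate more similar sentences."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"# Sketch: embed a second (hypothetical example) sentence and compare\n",
"second_sentence = \"London is the capital of England.\"\n",
"inputs2 = tokenizer(second_sentence, return_tensors=\"pt\")\n",
"inputs2 = {key: tensor.to(device_map) for key, tensor in inputs2.items()}\n",
"with torch.no_grad():\n",
"    emb2 = model(**inputs2).logits.mean(dim=1)\n",
"# Cosine similarity between the two pooled embeddings\n",
"print(torch.nn.functional.cosine_similarity(embeddings, emb2))"
],
"metadata": {},
"execution_count": null,
"outputs": []
}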
]
}