diff --git a/examples/huggingface/pytorch/text-embedding/deployment/mteb/bge/README.md b/examples/huggingface/pytorch/text-embedding/deployment/mteb/bge/README.md
index c04d6378c48..912d11877a4 100644
--- a/examples/huggingface/pytorch/text-embedding/deployment/mteb/bge/README.md
+++ b/examples/huggingface/pytorch/text-embedding/deployment/mteb/bge/README.md
@@ -1,3 +1,9 @@
+**A beginner-friendly tutorial**
+=======
+[text_embedding_made_simple_xpu.ipynb](https://github.com/sleepingcat4/intel-extension-for-transformers/blob/main/examples/huggingface/pytorch/text-embedding/deployment/mteb/bge/text_embedding_made_simple_xpu.ipynb)
+
+This notebook shows how to quantize one of the largest and most recent embedding models on Hugging Face (BAAI/bge-m3) using an Intel XPU and generate embeddings in a Jupyter Notebook.
+
 Step-by-Step
 =======
 This document describes the end-to-end workflow for Huggingface model [BAAI/bge-large-en-v1.5](https://huggingface.co/BAAI/bge-large-en-v1.5), [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) and [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5) with LLM Runtime backend.
diff --git a/examples/huggingface/pytorch/text-embedding/deployment/mteb/bge/text_embedding_made_simple_xpu.ipynb b/examples/huggingface/pytorch/text-embedding/deployment/mteb/bge/text_embedding_made_simple_xpu.ipynb
new file mode 100644
index 00000000000..be313df46da
--- /dev/null
+++ b/examples/huggingface/pytorch/text-embedding/deployment/mteb/bge/text_embedding_made_simple_xpu.ipynb
@@ -0,0 +1,170 @@
+{
+  "nbformat": 4,
+  "nbformat_minor": 0,
+  "metadata": {
+    "colab": {
+      "provenance": []
+    },
+    "kernelspec": {
+      "name": "python3",
+      "display_name": "Python 3"
+    },
+    "language_info": {
+      "name": "python"
+    }
+  },
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "source": [
+        "In this notebook, I will describe how we can quantize Hugging Face models on Intel GPUs (XPU). For demonstration, we're going to embed a sentence using the ```BAAI/bge-m3``` model, one of the largest embedding models available."
+      ],
+      "metadata": {
+        "id": "CzpnbtoNej6N"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "#### Installation\n",
+        "\n",
+        "Please install the libraries below:\n",
+        "\n",
+        "https://github.com/intel/intel-extension-for-transformers\n",
+        "\n",
+        "```pip install intel-extension-for-pytorch```"
+      ],
+      "metadata": {
+        "id": "ZHbGyOWtfHdL"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "We're going to import both the standard transformers library and the Intel-specific transformers library."
+      ],
+      "metadata": {
+        "id": "JnslJkenfULW"
+      }
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "SOSdM2_gefOX"
+      },
+      "outputs": [],
+      "source": [
+        "from transformers import AutoTokenizer\n",
+        "from intel_extension_for_transformers.transformers.modeling import AutoModelForCausalLM\n",
+        "import torch\n",
+        "import intel_extension_for_pytorch as ipex"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "Model name"
+      ],
+      "metadata": {
+        "id": "0Uome2bMfebE"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "# Model name or path\n",
+        "model_name = \"BAAI/bge-m3\""
+      ],
+      "metadata": {
+        "id": "dGelWLE9ffcx"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "Now we'll load the tokenizer, tokenize the input sentence, and move the input tensors to the Intel XPU (GPU)."
+      ],
+      "metadata": {
+        "id": "xI9m0SxNfj7w"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "device_map = \"xpu\"\n",
+        "tokenizer = AutoTokenizer.from_pretrained(model_name)\n",
+        "input_sentence = \"what's the capital of England?\"\n",
+        "inputs = tokenizer(input_sentence, return_tensors=\"pt\")\n",
+        "inputs = {key: tensor.to(\"xpu\") for key, tensor in inputs.items()}"
+      ],
+      "metadata": {
+        "id": "b7lZvHBhfoSC"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "Loading the model on the Intel XPU"
+      ],
+      "metadata": {
+        "id": "lWLpFMvzfq2m"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "model = AutoModelForCausalLM.from_pretrained(model_name, load_in_4bit=True, device_map=\"xpu\", trust_remote_code=True, use_llm_runtime=False)\n",
+        "model = ipex.optimize_transformers(model, inplace=True, dtype=torch.float16, quantization_config=True, device=device_map)"
+      ],
+      "metadata": {
+        "id": "Tl9-HUZRfuHE"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "Generating embeddings"
+      ],
+      "metadata": {
+        "id": "ZAnYVZqBfxlB"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "with torch.no_grad():\n",
+        "    outputs = model(**inputs)\n",
+        "    logits = outputs.logits\n",
+        "\n",
+        "embeddings = logits.mean(dim=1)\n",
+        "print(embeddings)"
+      ],
+      "metadata": {
+        "id": "a1DzuiqVfzF3"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "**Example output**\n",
+        "\n",
+        "```\n",
+        "tensor([[ 4.3945e+00, -2.6588e-03,  9.7559e-01,  ...,  5.6680e+00,\n",
+        "          1.0303e+00,  2.5488e+00]], device='xpu:0', dtype=torch.float16)\n",
+        "```"
+      ],
+      "metadata": {
+        "id": "TpYelWKaf5LZ"
+      }
+    }
+  ]
+}
\ No newline at end of file
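
A possible follow-up to the notebook's final cell: once an embedding has been produced by mean-pooling the logits, two sentences can be compared with cosine similarity. This is a minimal sketch, not part of the notebook above, assuming the `model`, `tokenizer`, and XPU setup it defines; the `embed` helper is hypothetical.

```python
import torch
import torch.nn.functional as F

def embed(sentence: str) -> torch.Tensor:
    # Hypothetical helper: tokenize, move tensors to the XPU, and
    # mean-pool the logits exactly as in the notebook's final cell.
    inputs = tokenizer(sentence, return_tensors="pt")
    inputs = {key: tensor.to("xpu") for key, tensor in inputs.items()}
    with torch.no_grad():
        logits = model(**inputs).logits
    return logits.mean(dim=1)

# Compare two sentences: a higher cosine similarity means more related text.
a = embed("what's the capital of England?")
b = embed("London is the capital of the United Kingdom.")
print(F.cosine_similarity(a, b).item())
```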