# LoRA (Low-Rank Adaptation)

## What is LoRA?

LoRA is a technique that allows for efficient fine-tuning of a model while only updating a small portion of the model's weights. This is useful when you have a large model that has been pre-trained on a large dataset, but you want to fine-tune it on a smaller dataset or for a specific task.

LoRA works by adding a small number of additional weights to the model, which are used to adapt it to the new dataset or task. These additional weights are learned during the fine-tuning process, while the rest of the model's weights are kept frozen.
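
Concretely, for a frozen weight matrix `W` of shape `(d, k)`, LoRA learns two small matrices `B` (shape `(d, r)`) and `A` (shape `(r, k)`) with rank `r` much smaller than `d` and `k`, so the effective weight becomes `W + (alpha / r) * B @ A`. The NumPy sketch below is purely illustrative (the shapes and `alpha` scaling follow the original LoRA paper's convention, not any particular library's internals) and shows why so few parameters are trained:

```python
import numpy as np

d, k, r, alpha = 4096, 4096, 8, 16   # rank r is much smaller than d and k

W = np.random.randn(d, k)            # frozen pre-trained weight
A = np.random.randn(r, k) * 0.01     # trainable low-rank factor
B = np.zeros((d, r))                 # starts at zero, so training begins from W

W_adapted = W + (alpha / r) * B @ A  # effective weight after adaptation

# Only A and B are trained: ~65K parameters versus ~16.8M in W itself.
print(A.size + B.size, "trainable vs", W.size, "frozen")
```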

## How is it used?

LoRA can be used in many ways, and the community is always finding new ways to use it. Technically, LoRA fine-tunes a large language model on a small dataset, but in practice that covers a wide range of applications, such as (a minimal training sketch follows this list):

- fine-tuning a language model on a small dataset
- fine-tuning a language model on a domain-specific dataset
- fine-tuning a language model on a dataset with limited labels
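
All of these follow the same recipe. Here is a minimal sketch of preparing a model for LoRA fine-tuning with the `peft` library; the base model and the hyperparameters (`r`, `lora_alpha`, `target_modules`) are illustrative assumptions, so choose values that suit your task:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Illustrative base model; substitute the checkpoint you want to adapt.
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

config = LoraConfig(
    r=8,                                  # rank of the update matrices
    lora_alpha=16,                        # scaling factor alpha
    target_modules=["q_proj", "v_proj"],  # layers that receive adapters
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```

After training, calling `save_pretrained` on the resulting model stores only the adapter weights, and adapters trained with `peft` this way are what TGI loads at startup.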

## Optimizing Inference with LoRA

LoRA adapters can be used during inference by multiplying the adapter weights with the model weights at each specified layer. This process can be computationally expensive, but thanks to the great work by [punica-ai](https://github.com/punica-ai/punica) and the [lorax](https://github.com/predibase/lorax) team, optimized kernels and frameworks have been developed to make this process more efficient. TGI leverages these optimizations in order to provide fast and efficient inference with multiple LoRA models.
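
To see why this is hard to do efficiently, consider serving a batch in which each request targets a different adapter. A naive reference implementation (pure NumPy, purely illustrative of the idea; the optimized kernels mentioned above fuse and parallelize this per-request loop on the GPU) might look like:

```python
import numpy as np

d, k, r, batch = 64, 64, 4, 3
W = np.random.randn(d, k)       # shared frozen base weight

# One (B, A) pair per adapter; each request selects one by index.
adapters = [(np.random.randn(d, r), np.random.randn(r, k)) for _ in range(2)]
adapter_ids = [0, 1, 0]         # which adapter each request in the batch uses
x = np.random.randn(batch, k)   # one input row per request

# The base projection is shared across the whole batch...
y = x @ W.T
# ...then each row gets its own adapter's low-rank correction
# (the alpha/r scaling factor is omitted for brevity).
for i, aid in enumerate(adapter_ids):
    B, A = adapters[aid]
    y[i] += (x[i] @ A.T) @ B.T
```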

## Serving multiple LoRA adapters with TGI

Once a LoRA model has been trained, it can be used to generate text or perform other tasks just like a regular language model. However, because the model has been fine-tuned on a specific dataset, it may perform better on that dataset than a model that has not been fine-tuned.

In practice it's often useful to have multiple LoRA models, each fine-tuned on a different dataset or for a different task. This allows you to use the model that is best suited for a particular task or dataset.

Text Generation Inference (TGI) now supports loading multiple LoRA models at startup that can be used in generation requests. This feature is available starting from version `~2.0.6` and is compatible with LoRA models trained using the `peft` library.

### Specifying LoRA models

To use LoRA with TGI, specify the list of LoRA models to load when starting the server via the `LORA_ADAPTERS` environment variable. For example:

```bash
LORA_ADAPTERS=predibase/customer_support,predibase/dbpedia
```

In the server logs, you will see the following messages:

```txt
Loading adapter weights into model: predibase/customer_support
Loading adapter weights into model: predibase/dbpedia
```

## Generate text

You can then use these models in generation requests by specifying the `adapter_id` parameter in the request payload. For example:

```bash
curl 127.0.0.1:3000/generate \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{
    "inputs": "Hello who are you?",
    "parameters": {
        "max_new_tokens": 40,
        "adapter_id": "predibase/customer_support"
    }
}'
```
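
The same request can also be sent from Python with the `requests` library (a minimal sketch against the `/generate` endpoint shown above, assuming the server runs on the same host and port as in the curl example):

```python
import requests

response = requests.post(
    "http://127.0.0.1:3000/generate",
    json={
        "inputs": "Hello who are you?",
        "parameters": {
            "max_new_tokens": 40,
            "adapter_id": "predibase/customer_support",  # selects the LoRA adapter
        },
    },
)
print(response.json()["generated_text"])
```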

> **Note:** The LoRA feature is new and still being improved. If you encounter any issues or have any feedback, please let us know by opening an issue on the [GitHub repository](https://github.com/huggingface/text-generation-inference/issues/new/choose). Additionally, documentation and an improved client library will be published soon.

An updated tutorial with detailed examples will be published soon. Stay tuned!