* Propagate `**kwargs` to `sentence-transformers` and `diffusers` pipelines
* Add `HF_TRUST_REMOTE_CODE` env var
* Fix `HF_TRUST_REMOTE_CODE` bool-handling via `strtobool`. The `strtobool` helper had to be defined within `huggingface_inference_toolkit`, since `distutils.util.strtobool` is deprecated as of Python 3.10 and removed in Python 3.12 (see the sketch after this list).
* Fix some typos with `codespell`
* Update `README.md`
* Bump version to `0.4.2`
* Move `strtobool` to `env_utils` module to avoid circular import
* Revert enforcing `trust_remote_code=True`
* Remove debug `logging` messages
* Fix `diffusers` propagation of `trust_remote_code=True`
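
Since `distutils.util.strtobool` is no longer available in recent Python versions, the toolkit vendors its own helper (now living in the `env_utils` module). The following is a minimal sketch of such a helper, assuming the same accepted values as the `distutils` original rather than copying the toolkit's exact implementation; the `TRUST_REMOTE_CODE` name at the bottom is illustrative only:

```python
import os


def strtobool(value: str) -> bool:
    """Parse a boolean-ish string such as "1"/"0" or "true"/"false" (returns a bool instead of the original's int)."""
    value = value.lower()
    if value in ("y", "yes", "t", "true", "on", "1"):
        return True
    if value in ("n", "no", "f", "false", "off", "0"):
        return False
    raise ValueError(f"Invalid truth value: {value!r}")


# Illustrative usage: parse HF_TRUST_REMOTE_CODE, defaulting to "0" (do not trust remote code).
TRUST_REMOTE_CODE = strtobool(os.environ.get("HF_TRUST_REMOTE_CODE", "0"))
```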
# Hugging Face Inference Toolkit
Hugging Face Inference Toolkit is for serving 🤗 Transformers models in containers. The library provides default pre-processing, prediction, and post-processing for Transformers and Sentence Transformers models. It is also possible to define a custom `handler.py` for customization. The toolkit is built to work with the [Hugging Face Hub](https://huggingface.co/models).
---
## 💻 Getting Started with Hugging Face Inference Toolkit

* Clone the repository: `git clone https://github.com/huggingface/huggingface-inference-toolkit`
* Install the dependencies in dev mode: `pip install -e ".[torch,st,diffusers,test,quality]"`
* If you develop on AWS Inferentia2, install with `pip install -e ".[test,quality]" optimum-neuron[neuronx] --upgrade`
* If you develop on Google Cloud, install with `pip install -e ".[torch,st,diffusers,google,test,quality]"`

1. Build the preferred container for either CPU or GPU for PyTorch.

_CPU Images_

```bash
make inference-pytorch-cpu
```

_GPU Images_

```bash
make inference-pytorch-gpu
```

2. Run the container and either provide environment variables pointing to the Hub model you want to use or mount a volume containing your model into the container.

```bash
docker run -ti -p 5000:5000 -e HF_MODEL_ID=distilbert-base-uncased-distilled-squad -e HF_TASK=question-answering integration-test-pytorch:cpu
docker run -ti -p 5000:5000 --gpus all -e HF_MODEL_ID=nlpconnect/vit-gpt2-image-captioning -e HF_TASK=image-to-text integration-test-pytorch:gpu
docker run -ti -p 5000:5000 -e HF_MODEL_DIR=/repository -v $(pwd)/distilbert-base-uncased-emotion:/repository integration-test-pytorch:cpu
```

3. Send a request. The API schema is the same as for the [Inference API](https://huggingface.co/docs/api-inference/detailed_parameters).

```bash
curl --request POST \
  --url http://localhost:5000 \
  --header 'Content-Type: application/json' \
  --data '{
    "inputs": {
      "question": "What is used for inference?",
      "context": "My Name is Philipp and I live in Nuremberg. This model is used with sagemaker for inference."
    }
}'
```
### Custom Handler and dependency support

The Hugging Face Inference Toolkit allows users to provide custom inference logic through a `handler.py` file located in the model repository.

For an example, check [philschmid/custom-pipeline-text-classification](https://huggingface.co/philschmid/custom-pipeline-text-classification):

```bash
model.tar.gz/
|- pytorch_model.bin
|- ....
|- handler.py
|- requirements.txt
```

In this example, `pytorch_model.bin` is the model file saved from training, `handler.py` is the custom inference handler, and `requirements.txt` is a requirements file for additional dependencies.
The custom module can override the toolkit's default pre-processing, prediction, and post-processing; a minimal sketch of such a handler is shown below.
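
The sketch follows the common `EndpointHandler` convention used for Hugging Face custom handlers; the class name, method signatures, and the `text-classification` pipeline are assumptions for illustration, not code taken from this repository:

```python
# handler.py — illustrative sketch only; names and signatures are assumptions,
# not copied from this repository.
from typing import Any, Dict, List

from transformers import pipeline


class EndpointHandler:
    def __init__(self, model_dir: str = "") -> None:
        # Load the pipeline once when the container starts.
        self.pipeline = pipeline("text-classification", model=model_dir)

    def __call__(self, data: Dict[str, Any]) -> List[Dict[str, Any]]:
        # `data` follows the Inference API schema: {"inputs": ..., "parameters": {...}}.
        inputs = data["inputs"]
        parameters = data.get("parameters", {})
        return self.pipeline(inputs, **parameters)
```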
### Vertex AI Support

The Hugging Face Inference Toolkit is also supported on Vertex AI, based on [Custom container requirements for prediction](https://cloud.google.com/vertex-ai/docs/predictions/custom-container-requirements). [Environment variables set by Vertex AI](https://cloud.google.com/vertex-ai/docs/predictions/custom-container-requirements#aip-variables) are automatically detected and used by the toolkit.

#### Local run with HF_MODEL_ID and HF_TASK

Start the Hugging Face Inference Toolkit with the following environment variables.

```bash
mkdir tmp2/
```
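
The full container launch command is not shown in this excerpt; a sketch of how it might look, using Vertex AI-style `AIP_*` variables for the port and routes (the model id, task, and image name here are assumptions for illustration):

```bash
# Illustrative only: routes, port, model id, and image name are assumptions.
docker run -ti -p 8080:8080 \
  -e AIP_HTTP_PORT=8080 \
  -e AIP_PREDICT_ROUTE=/pred \
  -e AIP_HEALTH_ROUTE=/health \
  -e HF_MODEL_ID=distilbert-base-uncased-finetuned-sst-2-english \
  -e HF_TASK=text-classification \
  integration-test-pytorch:cpu
```

You can then send a request to the `/pred` route: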

```bash
curl --request POST \
  --url http://localhost:8080/pred \
  --header 'Content-Type: application/json' \
  --data '{
    "instances": ["I love this product", "I hate this product"],
    "parameters": { "top_k": 2 }
}'
```

### AWS Inferentia2 Support

The Hugging Face Inference Toolkit provides support for deploying Hugging Face models on AWS Inferentia2. To deploy a model on Inferentia2 you have three options:

* Provide `HF_MODEL_ID`, the model repo id on huggingface.co, pointing to a repository that contains the compiled model in `.neuron` format, e.g. `optimum/bge-base-en-v1.5-neuronx`
* Provide the `HF_OPTIMUM_BATCH_SIZE` and `HF_OPTIMUM_SEQUENCE_LENGTH` environment variables to compile the model on the fly, e.g. `HF_OPTIMUM_BATCH_SIZE=1 HF_OPTIMUM_SEQUENCE_LENGTH=128`
* Include a `neuron` dictionary in the [config.json](https://huggingface.co/optimum/tiny_random_bert_neuron/blob/main/config.json) file in the model archive, e.g. `neuron: {"static_batch_size": 1, "static_sequence_length": 128}`

The currently supported tasks can be found [here](https://huggingface.co/docs/optimum-neuron/en/package_reference/supported_models). If you plan to deploy an LLM, we recommend taking a look at [Neuronx TGI](https://huggingface.co/blog/text-generation-inference-on-inferentia2), which is purpose-built for LLMs.
#### Local run with HF_MODEL_ID and HF_TASK

Start the Hugging Face Inference Toolkit with the following environment variables.

_Note: You need to run this on an Inferentia2 instance._

* transformers `text-classification` with `HF_OPTIMUM_BATCH_SIZE` and `HF_OPTIMUM_SEQUENCE_LENGTH` (a sketch of the container command and an example request follow)
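
The corresponding `docker run` command is not shown in full in this excerpt; a sketch of what it could look like (the image name, model id, and Neuron device path are assumptions for illustration):

```bash
# Illustrative only: image name, model id, and Neuron device path are assumptions.
docker run -ti -p 5000:5000 \
  --device=/dev/neuron0 \
  -e HF_MODEL_ID=distilbert-base-uncased-finetuned-sst-2-english \
  -e HF_TASK=text-classification \
  -e HF_OPTIMUM_BATCH_SIZE=1 \
  -e HF_OPTIMUM_SEQUENCE_LENGTH=128 \
  integration-test-pytorch:inf2
```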
"inputs": "Wow, this is such a great product. I love it!",
194
-
"parameters": { "top_k": 2 }
193
+
--url http://localhost:5000 \
194
+
--header 'Content-Type: application/json' \
195
+
--data '{
196
+
"inputs": "Wow, this is such a great product. I love it!",
197
+
"parameters": { "top_k": 2 }
195
198
}'
196
199
```
---
## 🛠️ Environment variables

The Hugging Face Inference Toolkit implements various additional environment variables to simplify your deployment experience. A full list of environment variables is given below. All potential environment variables can be found in [const.py](src/huggingface_inference_toolkit/const.py).

### `HF_MODEL_DIR`

The `HF_MODEL_DIR` environment variable defines the directory where your model is stored or will be stored.
If `HF_MODEL_ID` is not set, the toolkit expects the model artifact in this directory. Set this value to the directory where you mount your model artifacts.
If `HF_MODEL_ID` is set and the directory `HF_MODEL_DIR` points to is empty, the toolkit will download the model from the Hub into this directory.

The default value is `/opt/huggingface/model`
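
For example, following the pattern of the other variables below (the value shown is the default and purely illustrative):

```bash
HF_MODEL_DIR="/opt/huggingface/model"
```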

### `HF_HUB_TOKEN`

The `HF_HUB_TOKEN` environment variable defines your Hugging Face authorization token.

```bash
HF_HUB_TOKEN="api_XXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
```

### `HF_TRUST_REMOTE_CODE`

The `HF_TRUST_REMOTE_CODE` environment variable defines whether to trust remote code. This flag is used for community-defined inference code and therefore reflects the level of confidence you are giving the model providers when loading models from the Hugging Face Hub. The default value is `"0"`; set it to `"1"` to trust remote code.

```bash
HF_TRUST_REMOTE_CODE="0"
```

### `HF_FRAMEWORK`

The `HF_FRAMEWORK` environment variable defines the base deep learning framework used in the container. This is important when loading large models from the Hugging Face Hub to avoid extra file downloads.

```bash
HF_FRAMEWORK="pytorch"
```

#### `HF_OPTIMUM_BATCH_SIZE`

The `HF_OPTIMUM_BATCH_SIZE` environment variable defines the batch size used when compiling the model to Neuron. The default value is `1`. Not required when the model is already converted.

```bash
HF_OPTIMUM_BATCH_SIZE="1"
```

#### `HF_OPTIMUM_SEQUENCE_LENGTH`

The `HF_OPTIMUM_SEQUENCE_LENGTH` environment variable defines the sequence length used when compiling the model to Neuron. There is no default value. Not required when the model is already converted.
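
An example value, mirroring the on-the-fly compilation example earlier in this document (the snippet for this section is not included in the excerpt, so this is illustrative):

```bash
HF_OPTIMUM_SEQUENCE_LENGTH="128"
```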

---

**`setup.py`**

```python
# We don't declare our dependency on transformers here because we build with
# different packages for different variants

VERSION = "0.4.2"

# Ubuntu packages
# libsndfile1-dev: torchaudio requires the development version of the libsndfile package which can be installed via a system package manager. On Ubuntu it can be installed as follows: apt install libsndfile1-dev
# ffmpeg: ffmpeg is required for audio processing. On Ubuntu it can be installed as follows: apt install ffmpeg
# libavcodec-extra : libavcodec-extra includes additional codecs for ffmpeg
```