
[InferenceClient] Provide a way to deal with content-type header when sending raw bytes #2706

Open
Wauplin opened this issue Dec 11, 2024 · 2 comments

Wauplin (Contributor) commented Dec 11, 2024

First reported by @freddyaboulton on Slack (private).

For some tasks (e.g. automatic-speech-recognition) we send raw bytes in the HTTP request. On the Inference API, there is "content-type guess" logic that implicitly interprets the data being sent. However, this logic can be flawed, and moreover Inference Endpoints don't have it.

It is currently possible to provide the Content-Type header by passing a value when initializing the InferenceClient:

client = InferenceClient(url, headers={"Content-Type": "audio/mpeg"})
response = client.automatic_speech_recognition("audio.mp3")

However, this is not well documented, and it doesn't feel correct to initialize a client with a specific content type (it would mean that all requests made with that client have to send the same content type).


Opening the issue here, but it's not clear to me what we'd like to do. A few possible solutions:

  • (low-hanging fruit) keep what we have, but document how to set the Content-Type header in each method's docstring
  • infer the Content-Type header from the file extension (but that doesn't work if raw bytes are passed to the method); see the sketch after this list
  • add a parameter? (but how would that work with auto-generation?)
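
To illustrate the second option, here is a minimal sketch using Python's standard mimetypes module (guess_content_type is a hypothetical helper, not library code); it also shows why an extension-based guess cannot work for raw bytes:

import mimetypes
from pathlib import Path
from typing import Optional, Union

def guess_content_type(audio: Union[str, Path, bytes]) -> Optional[str]:
    # Hypothetical helper: guess the Content-Type header from the file extension.
    # Raw bytes carry no extension, so there is nothing to infer from.
    if isinstance(audio, (str, Path)):
        content_type, _ = mimetypes.guess_type(str(audio))
        return content_type
    return None

print(guess_content_type("audio.mp3"))        # audio/mpeg
print(guess_content_type(b"\x00raw bytes"))   # None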

Open to suggestions on this

@kiansierra

Hi there.
I received the following warning when directly calling the post endpoint for ASR:

/home/kian/coding/rag-app/rag-app-backend/.venv/lib/python3.12/site-packages/huggingface_hub/utils/_deprecation.py:131: FutureWarning: 'post' (from 'huggingface_hub.inference._client') is deprecated and will be removed from version '0.31.0'. Making direct POST requests to the inference server is not supported anymore. Please use task methods instead (e.g. InferenceClient.chat_completion). If your use case is not supported, please open an issue in https://github.com/huggingface/huggingface_hub.

Currently my call to the endpoint follows this approach:

import base64
import json

from huggingface_hub import InferenceClient

client = InferenceClient()
model = "openai/whisper-large-v3-turbo"
response = client.post(
    json={
        # base64-encode the audio so that parameters can be sent alongside it in the JSON payload
        "inputs": base64.b64encode(open(file_path, "rb").read()).decode(),
        "parameters": {"return_timestamps": True},
    },
    model=model,
    task="automatic-speech-recognition",
)
response_data = json.loads(response)

This is because the endpoint currently doesn't support parameters: https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/inference/_client.py#L461-L502
I believe a simple change to the endpoint to accept parameters as a dict should solve this. The same could also be done for headers, to ensure that headers passed when calling the endpoint override the ones set on the instance.
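
For the header part, a minimal sketch of the override behaviour described above (merge_headers is a hypothetical helper, not library code): per-call headers take precedence over the ones set at client init.

def merge_headers(instance_headers, call_headers):
    # Start from the headers configured on the client instance,
    # then let per-call headers override them key by key.
    merged = dict(instance_headers or {})
    merged.update(call_headers or {})
    return merged

print(merge_headers({"Content-Type": "audio/flac"}, {"Content-Type": "audio/mpeg"}))
# {'Content-Type': 'audio/mpeg'}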

@kiansierra

I created a quick PR here: kiansierra#1,
and an example of how that would work: https://github.com/kiansierra/huggingface_hub/blob/dae9625fed13668020504c3fbace2d9b4f18791c/test.ipynb.
This would also tie in with #2800. @Wauplin @hanouticelina
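
A purely hypothetical sketch of what a call could look like if the task method accepted a parameters dict as suggested above (this is not the actual API of the PR, just an illustration of the idea):

from huggingface_hub import InferenceClient

# Hypothetical signature: automatic_speech_recognition does not accept
# a `parameters` argument today; this only illustrates the proposal.
client = InferenceClient()
response = client.automatic_speech_recognition(
    "audio.mp3",
    model="openai/whisper-large-v3-turbo",
    parameters={"return_timestamps": True},
)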
