
tgi-gaudi server error with long inputs sent to chat_completion api using openai python sdk #248

minmin-intel opened this issue Nov 22, 2024 · 2 comments

@minmin-intel

System Info

tgi-gaudi 2.0.5 docker image

HL-SMI Version: hl-1.17.0-fw-51.3.0
Driver Version: 1.17.0-28a11ca

model: llama3.1-70B-instruct

I was sending requests to TGI's chat completions API using the OpenAI Python SDK.

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

1. Use the code here to reproduce: https://github.com/minmin-intel/GenAIEval/tree/test-helmet/evals/evaluation/HELMET
2. The command I ran when encountering this error: `python3 eval.py --config configs/rag_short_test_tgi205.yaml --endpoint_url http://${host_ip}:${port}/v1/`
3. The error occurred when the input length reached 8192 tokens.
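For reference, below is a minimal sketch of the kind of request eval.py sends, assuming TGI's OpenAI-compatible `/v1/chat/completions` endpoint. The model name matches the served model; the placeholder prompt, `max_tokens` value, and `EMPTY` api key are illustrative assumptions, not values taken from the eval config:

```python
# Minimal sketch, not the actual eval.py code: send one long-context
# chat completion request to TGI through the OpenAI Python SDK.
from openai import OpenAI

client = OpenAI(
    base_url="http://<host_ip>:<port>/v1/",  # the TGI endpoint, not api.openai.com
    api_key="EMPTY",  # assumption: TGI does not validate the key, but the SDK requires one
)

long_context = "..."  # placeholder for a ~8192-token prompt

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct",
    messages=[{"role": "user", "content": long_context}],
    max_tokens=512,  # assumed output budget; the actual value comes from the yaml config
)
print(response.choices[0].message.content)
```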

I used this script to launch tgi-gaudi: https://github.com/minmin-intel/GenAIEval/blob/test-helmet/evals/evaluation/HELMET/tgi_gaudi/launch_tgi_gaudi.sh
max_input_length=65536
max_total_tokens=131072

I used 8 Gaudi2 cards.

The logs from tgi-gaudi:

```
2024-11-22T19:55:16.949329Z INFO text_generation_launcher: Default max_batch_prefill_tokens to 65586
2024-11-22T19:55:16.949336Z INFO text_generation_launcher: Using default cuda graphs [1, 2, 4, 8, 16, 32]
2024-11-22T19:55:16.949338Z INFO text_generation_launcher: Sharding model on 8 processes
2024-11-22T19:55:16.949424Z INFO download: text_generation_launcher: Starting download process.
2024-11-22T19:55:19.355996Z INFO text_generation_launcher: Files are already present on the host. Skipping download.
2024-11-22T19:55:19.752061Z INFO download: text_generation_launcher: Successfully downloaded weights.
2024-11-22T19:55:19.752297Z INFO shard-manager: text_generation_launcher: Starting shard rank=0
2024-11-22T19:55:25.552304Z INFO text_generation_launcher: CLI SHARDED = True DTYPE = bfloat16
2024-11-22T19:55:25.552383Z INFO text_generation_launcher: CLI SHARDED = 8
2024-11-22T19:55:25.552454Z INFO text_generation_launcher: CLI server start deepspeed =deepspeed --num_nodes 1 --num_gpus 8 --no_local_rank /usr/local/lib/python3.10/dist-packages/text_generation_server/tgi_service.py --model_id meta-llama/Meta-Llama-3.1-70B-Instruct --revision None --sharded True --dtype bfloat16 --trust_remote_code False --uds_path /tmp/text-generation-server
.......
2024-11-22T19:57:08.388547Z INFO text_generation_router: router/src/main.rs:317: Using config Some(Llama)
2024-11-22T19:57:08.395301Z INFO text_generation_router: router/src/main.rs:345: Warming up model
2024-11-22T19:57:08.395328Z WARN text_generation_router: router/src/main.rs:361: Model does not support automatic max batch total tokens
2024-11-22T19:57:08.395330Z INFO text_generation_router: router/src/main.rs:383: Setting max batch total tokens to 131072
2024-11-22T19:57:08.395332Z INFO text_generation_router: router/src/main.rs:384: Connected
2024-11-22T19:57:08.395334Z WARN text_generation_router: router/src/main.rs:398: Invalid hostname, defaulting to 0.0.0.0
.....
2024-11-22T21:06:28.887195Z INFO chat_completions{total_time="4.836573003s" validation_time="10.504386ms" queue_time="29.864µs" inference_time="4.826038834s" time_per_token="965.207766ms" seed="Some(14430591328024606772)"}: text_generation_router::server: router/src/server.rs:324: Success
2024-11-22T21:06:33.803433Z ERROR batch{batch_size=1}:decode:decode{size=1}:decode{size=1}: text_generation_client: router/client/src/lib.rs:33: Server error: CANCELLED
```

Expected behavior

The server should return a completion without error for long inputs, since they are within the configured max_input_length of 65536.

@yuanwu2017
Collaborator

It is a parameter configuration issue. What is the output length?

@minmin-intel
Author

max_input_length=65536
max_total_tokens=131072

@yuanwu2017
