System Info
tgi-gaudi 2.0.5 docker image
HL-SMI Version: hl-1.17.0-fw-51.3.0
Driver Version: 1.17.0-28a11ca
model: llama3.1-70B-instruct
I was sending requests to TGI's chat completions API using the OpenAI Python SDK.
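For reference, a minimal sketch of how such a request can be sent with the OpenAI Python SDK pointed at the TGI endpoint (the host/port placeholders, API key value, and model string below are assumptions for illustration, not the exact values from my eval script):

```python
# Sketch (assumed values): one chat completion request against the
# TGI OpenAI-compatible route, using the OpenAI Python SDK (openai>=1.0).
from openai import OpenAI

client = OpenAI(
    base_url="http://<host_ip>:<port>/v1/",  # machine running tgi-gaudi
    api_key="EMPTY",                         # TGI does not validate the key
)

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct",  # model served by TGI
    messages=[{"role": "user", "content": "Summarize the following passage: ..."}],
    max_tokens=200,
    temperature=0.0,
)
print(response.choices[0].message.content)
```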
Information
Tasks
Reproduction
Use the code here to reproduce: https://github.com/minmin-intel/GenAIEval/tree/test-helmet/evals/evaluation/HELMET
The command that I ran when encountering this error: python3 eval.py --config configs/rag_short_test_tgi205.yaml --endpoint_url http://${host_ip}:${port}/v1/
The error occurred when the input length reached 8192 tokens.
I used this script to launch tgi-gaudi: https://github.com/minmin-intel/GenAIEval/blob/test-helmet/evals/evaluation/HELMET/tgi_gaudi/launch_tgi_gaudi.sh
max_input_length=65536
max_total_length=131072
I used 8 Gaudi2 cards.
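As a standalone illustration of the failure mode (independent of the HELMET eval harness), a request can be padded to roughly 8192 input tokens as sketched below; the filler text and the rough 4-characters-per-token estimate are assumptions for this sketch, not what eval.py actually sends:

```python
# Sketch of a long-input request (~8192 tokens). Token count is estimated
# with a ~4 characters/token heuristic, which is an assumption; the real
# repro uses HELMET data via eval.py.
from openai import OpenAI

client = OpenAI(base_url="http://<host_ip>:<port>/v1/", api_key="EMPTY")

TARGET_TOKENS = 8192
filler = "The quick brown fox jumps over the lazy dog. "
# Repeat the filler until the prompt is roughly TARGET_TOKENS tokens long.
long_context = filler * (TARGET_TOKENS * 4 // len(filler))

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct",
    messages=[
        {"role": "user",
         "content": f"{long_context}\n\nBased on the text above, what animal jumps?"},
    ],
    max_tokens=100,
)
print(response.choices[0].message.content)
```

With inputs around this length the request fails and the router logs the CANCELLED error shown below; shorter inputs complete normally.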
The logs from tgi-gaudi:
2024-11-22T19:55:16.949329Z INFO text_generation_launcher: Default max_batch_prefill_tokens to 65586
2024-11-22T19:55:16.949336Z INFO text_generation_launcher: Using default cuda graphs [1, 2, 4, 8, 16, 32]
2024-11-22T19:55:16.949338Z INFO text_generation_launcher: Sharding model on 8 processes
2024-11-22T19:55:16.949424Z INFO download: text_generation_launcher: Starting download process.
2024-11-22T19:55:19.355996Z INFO text_generation_launcher: Files are already present on the host. Skipping download.
2024-11-22T19:55:19.752061Z INFO download: text_generation_launcher: Successfully downloaded weights.
2024-11-22T19:55:19.752297Z INFO shard-manager: text_generation_launcher: Starting shard rank=0
2024-11-22T19:55:25.552304Z INFO text_generation_launcher: CLI SHARDED = True DTYPE = bfloat16
2024-11-22T19:55:25.552383Z INFO text_generation_launcher: CLI SHARDED = 8
2024-11-22T19:55:25.552454Z INFO text_generation_launcher: CLI server start deepspeed =deepspeed --num_nodes 1 --num_gpus 8 --no_local_rank /usr/local/lib/python3.10/dist-packages/text_generation_server/tgi_service.py --model_id meta-llama/Meta-Llama-3.1-70B-Instruct --revision None --sharded True --dtype bfloat16 --trust_remote_code False --uds_path /tmp/text-generation-server
.......
2024-11-22T19:57:08.388547Z INFO text_generation_router: router/src/main.rs:317: Using config Some(Llama)
2024-11-22T19:57:08.395301Z INFO text_generation_router: router/src/main.rs:345: Warming up model
2024-11-22T19:57:08.395328Z WARN text_generation_router: router/src/main.rs:361: Model does not support automatic max batch total tokens
2024-11-22T19:57:08.395330Z INFO text_generation_router: router/src/main.rs:383: Setting max batch total tokens to 131072
2024-11-22T19:57:08.395332Z INFO text_generation_router: router/src/main.rs:384: Connected
2024-11-22T19:57:08.395334Z WARN text_generation_router: router/src/main.rs:398: Invalid hostname, defaulting to 0.0.0.0
.....
2024-11-22T21:06:28.887195Z INFO chat_completions{total_time="4.836573003s" validation_time="10.504386ms" queue_time="29.864µs" inference_time="4.826038834s" time_per_token="965.207766ms" seed="Some(14430591328024606772)"}: text_generation_router::server: router/src/server.rs:324: Success
2024-11-22T21:06:33.803433Z ERROR batch{batch_size=1}:decode:decode{size=1}:decode{size=1}: text_generation_client: router/client/src/lib.rs:33: Server error: CANCELLED
Expected behavior
TGI should return a normal completion, with no server error, for long inputs that are within the configured limits (max_input_length=65536).