
tgi-gaudi server error with long inputs sent to chat_completion api using openai python sdk #248

minmin-intel opened this issue Nov 22, 2024 · 2 comments

@minmin-intel

System Info

tgi-gaudi 2.0.5 docker image

HL-SMI Version: hl-1.17.0-fw-51.3.0
Driver Version: 1.17.0-28a11ca

model: llama3.1-70B-instruct

I was sending requests to TGI's chat completions API using the OpenAI Python SDK.

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

1. Use the code here to reproduce: https://github.com/minmin-intel/GenAIEval/tree/test-helmet/evals/evaluation/HELMET
2. The command I ran when encountering this error: `python3 eval.py --config configs/rag_short_test_tgi205.yaml --endpoint_url http://${host_ip}:${port}/v1/`
3. The error occurred when the input length reached 8192 tokens.
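For reference, below is a minimal sketch of the kind of request eval.py sends, assuming TGI's OpenAI-compatible `/v1/chat/completions` endpoint. The model name matches the served model; the placeholder prompt, `max_tokens` value, and `EMPTY` api key are illustrative assumptions, not values taken from the eval config:

```python
# Minimal sketch, not the actual eval.py code: send one long-context
# chat completion request to TGI through the OpenAI Python SDK.
from openai import OpenAI

client = OpenAI(
    base_url="http://<host_ip>:<port>/v1/",  # the TGI endpoint, not api.openai.com
    api_key="EMPTY",  # assumption: TGI does not validate the key, but the SDK requires one
)

long_context = "..."  # placeholder for a ~8192-token prompt

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct",
    messages=[{"role": "user", "content": long_context}],
    max_tokens=512,  # assumed output budget; the actual value comes from the yaml config
)
print(response.choices[0].message.content)
```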

I used this script to launch tgi-gaudi: https://github.com/minmin-intel/GenAIEval/blob/test-helmet/evals/evaluation/HELMET/tgi_gaudi/launch_tgi_gaudi.sh
max_input_length=65536
max_total_tokens=131072

I used 8 Gaudi2 cards.

The logs from tgi-gaudi:

```
2024-11-22T19:55:16.949329Z INFO text_generation_launcher: Default max_batch_prefill_tokens to 65586
2024-11-22T19:55:16.949336Z INFO text_generation_launcher: Using default cuda graphs [1, 2, 4, 8, 16, 32]
2024-11-22T19:55:16.949338Z INFO text_generation_launcher: Sharding model on 8 processes
2024-11-22T19:55:16.949424Z INFO download: text_generation_launcher: Starting download process.
2024-11-22T19:55:19.355996Z INFO text_generation_launcher: Files are already present on the host. Skipping download.
2024-11-22T19:55:19.752061Z INFO download: text_generation_launcher: Successfully downloaded weights.
2024-11-22T19:55:19.752297Z INFO shard-manager: text_generation_launcher: Starting shard rank=0
2024-11-22T19:55:25.552304Z INFO text_generation_launcher: CLI SHARDED = True DTYPE = bfloat16
2024-11-22T19:55:25.552383Z INFO text_generation_launcher: CLI SHARDED = 8
2024-11-22T19:55:25.552454Z INFO text_generation_launcher: CLI server start deepspeed =deepspeed --num_nodes 1 --num_gpus 8 --no_local_rank /usr/local/lib/python3.10/dist-packages/text_generation_server/tgi_service.py --model_id meta-llama/Meta-Llama-3.1-70B-Instruct --revision None --sharded True --dtype bfloat16 --trust_remote_code False --uds_path /tmp/text-generation-server
.......
2024-11-22T19:57:08.388547Z INFO text_generation_router: router/src/main.rs:317: Using config Some(Llama)
2024-11-22T19:57:08.395301Z INFO text_generation_router: router/src/main.rs:345: Warming up model
2024-11-22T19:57:08.395328Z WARN text_generation_router: router/src/main.rs:361: Model does not support automatic max batch total tokens
2024-11-22T19:57:08.395330Z INFO text_generation_router: router/src/main.rs:383: Setting max batch total tokens to 131072
2024-11-22T19:57:08.395332Z INFO text_generation_router: router/src/main.rs:384: Connected
2024-11-22T19:57:08.395334Z WARN text_generation_router: router/src/main.rs:398: Invalid hostname, defaulting to 0.0.0.0
.....
2024-11-22T21:06:28.887195Z INFO chat_completions{total_time="4.836573003s" validation_time="10.504386ms" queue_time="29.864µs" inference_time="4.826038834s" time_per_token="965.207766ms" seed="Some(14430591328024606772)"}: text_generation_router::server: router/src/server.rs:324: Success
2024-11-22T21:06:33.803433Z ERROR batch{batch_size=1}:decode:decode{size=1}:decode{size=1}: text_generation_client: router/client/src/lib.rs:33: Server error: CANCELLED
```

Expected behavior

The server should return a completion without error for long inputs, since they are within the configured max_input_length of 65536.

@yuanwu2017
Collaborator

It is a parameter configuration issue. What is the output length?

@minmin-intel
Author

max_input_length=65536
max_total_tokens=131072

@yuanwu2017
