
On some machines, using hf_hub::api::sync::Api to download the config is not successful, which makes warmup fail since attributes like max_position_embeddings cannot be read; updating hf-hub to the latest version fixes it #3001

Open · sywangyi wants to merge 1 commit into main

Conversation

sywangyi (Contributor) commented Feb 8, 2025

On some machines, using hf_hub::api::sync::Api to download the config is not successful, which makes warmup fail since attributes like max_position_embeddings cannot be read. Updating hf-hub to the latest version fixes it.

@OlivierDehaene or @Narsil
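
For reference, a minimal sketch of the download path the description refers to, using the hf-hub crate's sync API. The serde_json parsing and field lookup are illustrative additions for this sketch, not the launcher's actual code:

```rust
// Cargo.toml (sketch): hf-hub = "0.3", serde_json = "1"
use hf_hub::api::sync::Api;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Download config.json the same way the launcher resolves model files.
    let api = Api::new()?;
    let repo = api.model("facebook/opt-6.7b".to_string());
    let config_path = repo.get("config.json")?;

    // Read max_position_embeddings, the attribute warmup needs.
    let config: serde_json::Value =
        serde_json::from_str(&std::fs::read_to_string(config_path)?)?;
    println!(
        "max_position_embeddings = {}", // 2048 for this model
        config["max_position_embeddings"]
    );
    Ok(())
}
```

On the affected machines it is the `repo.get("config.json")` step that does not succeed before the hf-hub update, leaving the launcher without the config.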

sywangyi (Contributor, Author) commented Feb 8, 2025

opt-6.7b fails in warmup with an error like the one below, reproduced with:
text-generation-launcher --model-id=facebook/opt-6.7b

2025-02-08T14:05:41.511494Z INFO text_generation_launcher: Using attention flashdecoding-ipex - Prefix caching 1
2025-02-08T14:05:41.512003Z INFO text_generation_launcher: Default max_batch_prefill_tokens to 4096
2025-02-08T14:05:41.512104Z INFO download: text_generation_launcher: Starting check and download process for facebook/opt-6.7b
2025-02-08T14:05:47.680062Z INFO text_generation_launcher: Files are already present on the host. Skipping download.
2025-02-08T14:05:48.323522Z INFO download: text_generation_launcher: Successfully downloaded weights for facebook/opt-6.7b
2025-02-08T14:05:48.323793Z INFO shard-manager: text_generation_launcher: Starting shard rank=0
2025-02-08T14:05:51.277017Z INFO text_generation_launcher: Using prefix caching = True
2025-02-08T14:05:51.277047Z INFO text_generation_launcher: Using Attention = flashdecoding-ipex
2025-02-08T14:05:51.500047Z WARN text_generation_launcher: Could not import Mamba: No module named 'mamba_ssm'
2025-02-08T14:05:56.510148Z INFO text_generation_launcher: Using prefill chunking = False
2025-02-08T14:05:56.705854Z INFO text_generation_launcher: Server started at unix:///tmp/text-generation-server-0
2025-02-08T14:05:56.736275Z INFO shard-manager: text_generation_launcher: Shard ready in 8.406812945s rank=0
2025-02-08T14:05:56.829613Z INFO text_generation_launcher: Starting Webserver
2025-02-08T14:05:56.858517Z INFO text_generation_router_v3: backends/v3/src/lib.rs:125: Warming up model
2025-02-08T14:05:56.891519Z ERROR text_generation_launcher: Method Warmup encountered an error.
Traceback (most recent call last):
File "/opt/conda/bin/text-generation-server", line 10, in
sys.exit(app())
File "/opt/conda/lib/python3.11/site-packages/typer/main.py", line 323, in call
return get_command(self)(*args, **kwargs)
File "/opt/conda/lib/python3.11/site-packages/click/core.py", line 1161, in call
return self.main(*args, **kwargs)
File "/opt/conda/lib/python3.11/site-packages/typer/core.py", line 743, in main
return _main(
File "/opt/conda/lib/python3.11/site-packages/typer/core.py", line 198, in _main
rv = self.invoke(ctx)
File "/opt/conda/lib/python3.11/site-packages/click/core.py", line 1697, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/opt/conda/lib/python3.11/site-packages/click/core.py", line 1443, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/opt/conda/lib/python3.11/site-packages/click/core.py", line 788, in invoke
return __callback(*args, **kwargs)
File "/opt/conda/lib/python3.11/site-packages/typer/main.py", line 698, in wrapper
return callback(**use_params)
File "/usr/src/server/text_generation_server/cli.py", line 119, in serve
server.serve(
File "/usr/src/server/text_generation_server/server.py", line 315, in serve
asyncio.run(
File "/opt/conda/lib/python3.11/asyncio/runners.py", line 190, in run
return runner.run(main)
File "/opt/conda/lib/python3.11/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
File "/opt/conda/lib/python3.11/asyncio/base_events.py", line 641, in run_until_complete
self.run_forever()
File "/opt/conda/lib/python3.11/asyncio/base_events.py", line 608, in run_forever
self._run_once()
File "/opt/conda/lib/python3.11/asyncio/base_events.py", line 1936, in _run_once
handle._run()
File "/opt/conda/lib/python3.11/asyncio/events.py", line 84, in _run
self._context.run(self._callback, *self._args)
File "/opt/conda/lib/python3.11/site-packages/grpc_interceptor/server.py", line 165, in invoke_intercept_method
return await self.intercept(

File "/usr/src/server/text_generation_server/interceptor.py", line 24, in intercept
return await response
File "/opt/conda/lib/python3.11/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 120, in _unary_interceptor
raise error
File "/opt/conda/lib/python3.11/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 111, in _unary_interceptor
return await behavior(request_or_iterator, context)
File "/usr/src/server/text_generation_server/server.py", line 144, in Warmup
self.model.warmup(batch, max_input_tokens, max_total_tokens)
File "/usr/src/server/text_generation_server/models/model.py", line 135, in warmup
self.generate_token(batch)
File "/opt/conda/lib/python3.11/contextlib.py", line 81, in inner
return func(*args, **kwds)
File "/usr/src/server/text_generation_server/models/causal_lm.py", line 693, in generate_token
logits, speculative_logits, past = self.forward(
File "/usr/src/server/text_generation_server/models/causal_lm.py", line 678, in forward
outputs = self.model.forward(**kwargs)
File "/usr/src/server/text_generation_server/models/custom_modeling/opt_modeling.py", line 802, in forward
outputs = self.model.decoder(
File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/src/server/text_generation_server/models/custom_modeling/opt_modeling.py", line 627, in forward
pos_embeds = self.embed_positions(attention_mask, past_key_values_length)
File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/src/server/text_generation_server/models/custom_modeling/opt_modeling.py", line 121, in forward
return torch.nn.functional.embedding(positions + self.offset, self.weight)
File "/opt/conda/lib/python3.11/site-packages/torch/nn/functional.py", line 2551, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: INDICES element is out of DATA bounds, id=2050 axis_dim=2050
2025-02-08T14:05:56.891684Z ERROR warmup{max_input_length=None max_prefill_tokens=4096 max_total_tokens=None max_batch_size=None}:warmup: text_generation_router_v3::client: backends/v3/src/client/mod.rs:45: Server error: INDICES element is out of DATA bounds, id=2050 axis_dim=2050
Error: Backend(Warmup(Generation("INDICES element is out of DATA bounds, id=2050 axis_dim=2050")))
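
A sketch of the arithmetic behind the failing lookup, using max_position_embeddings = 2048 from opt-6.7b's config.json and the `positions + self.offset` shift visible in the traceback (the offset value of 2 is OPT's learned-positional-embedding convention; treat it as an assumption here):

```rust
// Sketch: why warmup indexes past opt-6.7b's positional embedding table.
fn main() {
    let max_position_embeddings = 2048; // facebook/opt-6.7b config.json
    let offset = 2; // assumed OPT offset (the `positions + self.offset` above)
    let axis_dim = max_position_embeddings + offset; // 2050 rows, valid indices 0..=2049

    // With config.json unreadable, warmup uses the 4096-token default, so
    // position 2048 gets looked up: 2048 + 2 = 2050 == axis_dim -> out of bounds.
    let first_bad_index = 2048 + offset;
    assert_eq!(first_bad_index, axis_dim); // matches "id=2050 axis_dim=2050"
    println!("first out-of-bounds index: {first_bad_index} (table has {axis_dim} rows)");
}
```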

sywangyi (Contributor, Author) commented Feb 8, 2025

Because config.json could not be fetched by the launcher, max_batch_prefill_tokens falls back to the default of 4096. With the config available it could be set to 2048, per max_position_embeddings in https://huggingface.co/facebook/opt-6.7b/blob/main/config.json#L17.
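
A hypothetical sketch of that fallback behavior (the function name and shape are invented for illustration; this is not the launcher's actual code):

```rust
// Hypothetical helper showing the fix's effect: once config.json can be
// fetched, the prefill default is bounded by the model's context window.
fn default_max_batch_prefill_tokens(max_position_embeddings: Option<usize>) -> usize {
    const FALLBACK: usize = 4096; // the default seen in the log above
    match max_position_embeddings {
        Some(n) => FALLBACK.min(n), // 2048 for facebook/opt-6.7b
        None => FALLBACK, // config fetch failed: overshoots the 2050-row table
    }
}

fn main() {
    assert_eq!(default_max_batch_prefill_tokens(Some(2048)), 2048);
    assert_eq!(default_max_batch_prefill_tokens(None), 4096);
}
```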

Commit: on some machines, using hf_hub::api::sync::Api to download the config is not successful, which makes warmup fail since attributes like max_position_embeddings cannot be read; updating hf-hub to the latest version fixes it

Signed-off-by: Wang, Yi A <[email protected]>