
On some machines, using hf_hub::api::sync::Api to download the config is not successful, which makes warmup fail since attributes like max_position_embeddings cannot be read; updating hf-hub to the latest version fixes it #3001

Open · sywangyi wants to merge 1 commit into main

Conversation

sywangyi (Contributor) commented Feb 8, 2025

On some machines, using hf_hub::api::sync::Api to download the config is not successful, which makes warmup fail since attributes like max_position_embeddings cannot be read. Updating hf-hub to the latest version fixes it.

@OlivierDehaene or @Narsil
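
For reference, a minimal sketch of the download path the description refers to, using the hf-hub crate's sync API. The serde_json parsing and field lookup are illustrative additions for this sketch, not the launcher's actual code:

```rust
// Cargo.toml (sketch): hf-hub = "0.3", serde_json = "1"
use hf_hub::api::sync::Api;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Download config.json the same way the launcher resolves model files.
    let api = Api::new()?;
    let repo = api.model("facebook/opt-6.7b".to_string());
    let config_path = repo.get("config.json")?;

    // Read max_position_embeddings, the attribute warmup needs.
    let config: serde_json::Value =
        serde_json::from_str(&std::fs::read_to_string(config_path)?)?;
    println!(
        "max_position_embeddings = {}", // 2048 for this model
        config["max_position_embeddings"]
    );
    Ok(())
}
```

On the affected machines it is the `repo.get("config.json")` step that does not succeed before the hf-hub update, leaving the launcher without the config.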

sywangyi (Contributor, Author) commented Feb 8, 2025

opt-6.7b fails in warmup with an error like the one below, reproduced with:
text-generation-launcher --model-id=facebook/opt-6.7b

2025-02-08T14:05:41.511494Z INFO text_generation_launcher: Using attention flashdecoding-ipex - Prefix caching 1
2025-02-08T14:05:41.512003Z INFO text_generation_launcher: Default max_batch_prefill_tokens to 4096
2025-02-08T14:05:41.512104Z INFO download: text_generation_launcher: Starting check and download process for facebook/opt-6.7b
2025-02-08T14:05:47.680062Z INFO text_generation_launcher: Files are already present on the host. Skipping download.
2025-02-08T14:05:48.323522Z INFO download: text_generation_launcher: Successfully downloaded weights for facebook/opt-6.7b
2025-02-08T14:05:48.323793Z INFO shard-manager: text_generation_launcher: Starting shard rank=0
2025-02-08T14:05:51.277017Z INFO text_generation_launcher: Using prefix caching = True
2025-02-08T14:05:51.277047Z INFO text_generation_launcher: Using Attention = flashdecoding-ipex
2025-02-08T14:05:51.500047Z WARN text_generation_launcher: Could not import Mamba: No module named 'mamba_ssm'
2025-02-08T14:05:56.510148Z INFO text_generation_launcher: Using prefill chunking = False
2025-02-08T14:05:56.705854Z INFO text_generation_launcher: Server started at unix:///tmp/text-generation-server-0
2025-02-08T14:05:56.736275Z INFO shard-manager: text_generation_launcher: Shard ready in 8.406812945s rank=0
2025-02-08T14:05:56.829613Z INFO text_generation_launcher: Starting Webserver
2025-02-08T14:05:56.858517Z INFO text_generation_router_v3: backends/v3/src/lib.rs:125: Warming up model
2025-02-08T14:05:56.891519Z ERROR text_generation_launcher: Method Warmup encountered an error.
Traceback (most recent call last):
File "/opt/conda/bin/text-generation-server", line 10, in
sys.exit(app())
File "/opt/conda/lib/python3.11/site-packages/typer/main.py", line 323, in call
return get_command(self)(*args, **kwargs)
File "/opt/conda/lib/python3.11/site-packages/click/core.py", line 1161, in call
return self.main(*args, **kwargs)
File "/opt/conda/lib/python3.11/site-packages/typer/core.py", line 743, in main
return _main(
File "/opt/conda/lib/python3.11/site-packages/typer/core.py", line 198, in _main
rv = self.invoke(ctx)
File "/opt/conda/lib/python3.11/site-packages/click/core.py", line 1697, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/opt/conda/lib/python3.11/site-packages/click/core.py", line 1443, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/opt/conda/lib/python3.11/site-packages/click/core.py", line 788, in invoke
return __callback(*args, **kwargs)
File "/opt/conda/lib/python3.11/site-packages/typer/main.py", line 698, in wrapper
return callback(**use_params)
File "/usr/src/server/text_generation_server/cli.py", line 119, in serve
server.serve(
File "/usr/src/server/text_generation_server/server.py", line 315, in serve
asyncio.run(
File "/opt/conda/lib/python3.11/asyncio/runners.py", line 190, in run
return runner.run(main)
File "/opt/conda/lib/python3.11/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
File "/opt/conda/lib/python3.11/asyncio/base_events.py", line 641, in run_until_complete
self.run_forever()
File "/opt/conda/lib/python3.11/asyncio/base_events.py", line 608, in run_forever
self._run_once()
File "/opt/conda/lib/python3.11/asyncio/base_events.py", line 1936, in _run_once
handle._run()
File "/opt/conda/lib/python3.11/asyncio/events.py", line 84, in _run
self._context.run(self._callback, *self._args)
File "/opt/conda/lib/python3.11/site-packages/grpc_interceptor/server.py", line 165, in invoke_intercept_method
return await self.intercept(

File "/usr/src/server/text_generation_server/interceptor.py", line 24, in intercept
return await response
File "/opt/conda/lib/python3.11/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 120, in _unary_interceptor
raise error
File "/opt/conda/lib/python3.11/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 111, in _unary_interceptor
return await behavior(request_or_iterator, context)
File "/usr/src/server/text_generation_server/server.py", line 144, in Warmup
self.model.warmup(batch, max_input_tokens, max_total_tokens)
File "/usr/src/server/text_generation_server/models/model.py", line 135, in warmup
self.generate_token(batch)
File "/opt/conda/lib/python3.11/contextlib.py", line 81, in inner
return func(*args, **kwds)
File "/usr/src/server/text_generation_server/models/causal_lm.py", line 693, in generate_token
logits, speculative_logits, past = self.forward(
File "/usr/src/server/text_generation_server/models/causal_lm.py", line 678, in forward
outputs = self.model.forward(**kwargs)
File "/usr/src/server/text_generation_server/models/custom_modeling/opt_modeling.py", line 802, in forward
outputs = self.model.decoder(
File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/src/server/text_generation_server/models/custom_modeling/opt_modeling.py", line 627, in forward
pos_embeds = self.embed_positions(attention_mask, past_key_values_length)
File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/src/server/text_generation_server/models/custom_modeling/opt_modeling.py", line 121, in forward
return torch.nn.functional.embedding(positions + self.offset, self.weight)
File "/opt/conda/lib/python3.11/site-packages/torch/nn/functional.py", line 2551, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: INDICES element is out of DATA bounds, id=2050 axis_dim=2050
2025-02-08T14:05:56.891684Z ERROR warmup{max_input_length=None max_prefill_tokens=4096 max_total_tokens=None max_batch_size=None}:warmup: text_generation_router_v3::client: backends/v3/src/client/mod.rs:45: Server error: INDICES element is out of DATA bounds, id=2050 axis_dim=2050
Error: Backend(Warmup(Generation("INDICES element is out of DATA bounds, id=2050 axis_dim=2050")))
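
A sketch of the arithmetic behind the failing lookup, using max_position_embeddings = 2048 from opt-6.7b's config.json and the `positions + self.offset` shift visible in the traceback (the offset value of 2 is OPT's learned-positional-embedding convention; treat it as an assumption here):

```rust
// Sketch: why warmup indexes past opt-6.7b's positional embedding table.
fn main() {
    let max_position_embeddings = 2048; // facebook/opt-6.7b config.json
    let offset = 2; // assumed OPT offset (the `positions + self.offset` above)
    let axis_dim = max_position_embeddings + offset; // 2050 rows, valid indices 0..=2049

    // With config.json unreadable, warmup uses the 4096-token default, so
    // position 2048 gets looked up: 2048 + 2 = 2050 == axis_dim -> out of bounds.
    let first_bad_index = 2048 + offset;
    assert_eq!(first_bad_index, axis_dim); // matches "id=2050 axis_dim=2050"
    println!("first out-of-bounds index: {first_bad_index} (table has {axis_dim} rows)");
}
```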

sywangyi (Contributor, Author) commented Feb 8, 2025

Because config.json could not be fetched by the launcher, max_batch_prefill_tokens falls back to the default of 4096. With the config available it could be set to 2048, per max_position_embeddings in https://huggingface.co/facebook/opt-6.7b/blob/main/config.json#L17.
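
A hypothetical sketch of that fallback behavior (the function name and shape are invented for illustration; this is not the launcher's actual code):

```rust
// Hypothetical helper showing the fix's effect: once config.json can be
// fetched, the prefill default is bounded by the model's context window.
fn default_max_batch_prefill_tokens(max_position_embeddings: Option<usize>) -> usize {
    const FALLBACK: usize = 4096; // the default seen in the log above
    match max_position_embeddings {
        Some(n) => FALLBACK.min(n), // 2048 for facebook/opt-6.7b
        None => FALLBACK, // config fetch failed: overshoots the 2050-row table
    }
}

fn main() {
    assert_eq!(default_max_batch_prefill_tokens(Some(2048)), 2048);
    assert_eq!(default_max_batch_prefill_tokens(None), 4096);
}
```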

Commit: on some machines, using hf_hub::api::sync::Api to download the config is not successful, which makes warmup fail since attributes like max_position_embeddings cannot be read; updating hf-hub to the latest version fixes it

Signed-off-by: Wang, Yi A <[email protected]>