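I only have a consumer-grade platform with 256 GB of RAM, so I chose the quantization that best balances performance and memory use, namely IQ2_M. While KT was loading Deepseek-R1-671B-0528-IQ2_M, I noticed that system memory usage did not increase. Debugging showed that the MoE layer parameters are all just metadata; only the CUDA layer parameters are actually loaded, and GPU memory is in use. CUDA Graph construction completes normally, but the process then hangs before emitting the first token, and after a while it fails with this error: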
Traceback (most recent call last):
  File "/home/wangqs/miniconda3/envs/ktransformers/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/home/wangqs/miniconda3/envs/ktransformers/lib/python3.11/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/wangqs/miniconda3/envs/ktransformers/lib/python3.11/site-packages/ktransformers/server/backend/interfaces/balance_serve.py", line 313, in run_engine
    engine.loop()
  File "/home/wangqs/miniconda3/envs/ktransformers/lib/python3.11/site-packages/ktransformers/server/backend/interfaces/balance_serve.py", line 265, in loop
    self.model_runner.run(self.batch, self.query_manager)
  File "/home/wangqs/miniconda3/envs/ktransformers/lib/python3.11/site-packages/ktransformers/server/balance_serve/inference/model_runner.py", line 194, in run
    self.features = self.model.batch_embeddings(self.input[cuda_graph_idx], device=self.device)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/wangqs/miniconda3/envs/ktransformers/lib/python3.11/site-packages/ktransformers/models/custom_modeling_deepseek_v3.py", line 66, in batch_embeddings
    self.model.embed_tokens(tokens.to(torch.device('cpu')))
  File "/home/wangqs/miniconda3/envs/ktransformers/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/wangqs/miniconda3/envs/ktransformers/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/wangqs/miniconda3/envs/ktransformers/lib/python3.11/site-packages/torch/nn/modules/sparse.py", line 192, in forward
    return F.embedding(
           ^^^^^^^^^^^^
  File "/home/wangqs/miniconda3/envs/ktransformers/lib/python3.11/site-packages/torch/nn/functional.py", line 2546, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: index out of range in self
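For anyone trying to narrow this down: the final `IndexError` is raised by the embedding lookup when at least one token id falls outside `[0, num_embeddings)`. The sketch below is plain Python and only illustrates that bounds check; `find_out_of_range_ids` and the sample values are hypothetical and not part of ktransformers or PyTorch. It does not explain why a bad id would appear here, it only shows how one might locate it.

```python
def find_out_of_range_ids(token_ids, num_embeddings):
    """Return (position, id) pairs for ids outside [0, num_embeddings)."""
    return [
        (pos, tid)
        for pos, tid in enumerate(token_ids)
        if not 0 <= tid < num_embeddings
    ]


# Hypothetical values for illustration only; a real check would use the
# actual input ids and the embedding table's row count.
vocab_size = 8
batch = [1, 7, 8, -1, 3]
print(find_out_of_range_ids(batch, vocab_size))
```

In a real run, the same comparison could be made just before the `embed_tokens` call in `batch_embeddings`, using the `num_embeddings` attribute of the `torch.nn.Embedding` module as the upper bound. That is a suggestion for debugging, not a confirmed diagnosis.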