❯ llm mlx download-model mlx-community/Qwen3-8B-4bit
Fetching 9 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:00<00:00, 144631.17it/s]
Traceback (most recent call last):
  File "/home/hamish/.local/bin/llm", line 10, in <module>
    sys.exit(cli())
             ^^^^^
  File "/home/hamish/.local/share/uv/tools/llm/lib/python3.11/site-packages/click/core.py", line 1442, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hamish/.local/share/uv/tools/llm/lib/python3.11/site-packages/click/core.py", line 1363, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/home/hamish/.local/share/uv/tools/llm/lib/python3.11/site-packages/click/core.py", line 1830, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hamish/.local/share/uv/tools/llm/lib/python3.11/site-packages/click/core.py", line 1830, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hamish/.local/share/uv/tools/llm/lib/python3.11/site-packages/click/core.py", line 1226, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hamish/.local/share/uv/tools/llm/lib/python3.11/site-packages/click/core.py", line 794, in invoke
    return callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hamish/.local/share/uv/tools/llm/lib/python3.11/site-packages/llm_mlx.py", line 56, in download_model
    MlxModel(model_path).prompt("hi").text()
  File "/home/hamish/.local/share/uv/tools/llm/lib/python3.11/site-packages/llm/models.py", line 520, in text
    self._force()
  File "/home/hamish/.local/share/uv/tools/llm/lib/python3.11/site-packages/llm/models.py", line 517, in _force
    list(self)
  File "/home/hamish/.local/share/uv/tools/llm/lib/python3.11/site-packages/llm/models.py", line 554, in __iter__
    for chunk in self.model.execute(
  File "/home/hamish/.local/share/uv/tools/llm/lib/python3.11/site-packages/llm_mlx.py", line 260, in execute
    for chunk in stream_generate(
  File "/home/hamish/.local/share/uv/tools/llm/lib/python3.11/site-packages/mlx_lm/generate.py", line 633, in stream_generate
    with wired_limit(model, [generation_stream]):
  File "/home/linuxbrew/.linuxbrew/opt/python@3.11/lib/python3.11/contextlib.py", line 137, in __enter__
    return next(self.gen)
           ^^^^^^^^^^^^^^
  File "/home/hamish/.local/share/uv/tools/llm/lib/python3.11/site-packages/mlx_lm/generate.py", line 222, in wired_limit
    max_rec_size = mx.metal.device_info()["max_recommended_working_set_size"]
                   ^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: [metal::device_info] Cannot get device info without metal backend
My reading suggests Metal is a macOS thing, so I am not sure why it is trying to use it on Ubuntu.
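To illustrate what I mean, here is a minimal sketch of the kind of platform guard I would have expected before any Metal-specific call. This is only my assumption about a possible approach, not how mlx_lm actually works; `metal_expected` is a hypothetical helper that does not exist in mlx or mlx_lm.

```python
import platform


def metal_expected(system=None):
    """Return True only on macOS, the only OS where Metal exists.

    Hypothetical helper for illustration; not part of mlx or mlx_lm.
    """
    # Allow the OS name to be passed in so the check is easy to test.
    system = platform.system() if system is None else system
    return system == "Darwin"


if __name__ == "__main__":
    if metal_expected():
        print("macOS detected: querying Metal device info is reasonable")
    else:
        print(f"{platform.system()} detected: skip Metal-specific calls")
```

On my Ubuntu machine `platform.system()` returns `Linux`, so a check like this would skip the Metal path rather than raise.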
❯ llm plugins
[
  {
    "name": "llm-mlx",
    "hooks": [
      "register_commands",
      "register_models"
    ],
    "version": "0.4"
  },
  {
    "name": "llm-gpt4all",
    "hooks": [
      "register_models"
    ],
    "version": "0.4"
  }
]
I am running on Python 3.11 to get past the sentencepiece bug:
/home/hamish/.local/share/uv/tools/llm/bin/python
Python 3.11.12 (main, Apr 8 2025, 14:15:29) [GCC 11.4.0] on linux