Replies: 2 comments
I agree this is mainly a discoverability problem. There is already some server documentation in the repo, but it is buried in mlx_lm/SERVER.md instead of being surfaced from the main README or a proper docs site, which makes it much harder to find through search, AI tools, or normal browsing. That file already covers the purpose of mlx_lm.server, how to start it, the OpenAI-style /v1/chat/completions endpoint and its request/response fields (a minimal example is sketched below), the /v1/models endpoint, and the caveat that the server is not recommended for production because it only performs basic security checks.

So the issue is not that documentation does not exist; it is that the documentation is hard to discover, fragmented, and not presented like product documentation. For a project this visible, that creates unnecessary friction for anyone trying to evaluate or adopt mlx_lm.server. Even a lightweight first step, like linking mlx_lm/SERVER.md prominently from the root README, would already help a lot. Right now it is easy for users to assume there is no real server documentation, even though useful material is already there.
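For reference, here is a minimal sketch of the kind of usage SERVER.md already documents: calling the OpenAI-style /v1/chat/completions endpoint of a locally running server. The port, request fields, and prompt below are assumptions for illustration (the exact flags and defaults are in SERVER.md), not a definitive reference:

```python
# Minimal sketch: query a locally running mlx_lm.server through its
# OpenAI-compatible chat completions endpoint. Assumes the server was
# started beforehand (e.g. with an `mlx_lm.server --model ...` style
# command as described in SERVER.md) and is listening on localhost:8080;
# adjust the host/port to match your setup.
import json
import urllib.request

url = "http://localhost:8080/v1/chat/completions"
payload = {
    # OpenAI-style request fields; add a "model" field here if your
    # server version expects one in the request body.
    "messages": [{"role": "user", "content": "Hello, what can you do?"}],
    "max_tokens": 128,
    "temperature": 0.7,
}

req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    body = json.load(resp)

# The response follows the OpenAI chat completion shape, so the
# generated text is under choices[0].message.content.
print(body["choices"][0]["message"]["content"])
```

A plain GET request to /v1/models on the same server should list the available models, mirroring the OpenAI-style models endpoint that SERVER.md describes.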
I have started documenting mlx_lm.server a bit better and explaining some of the core concepts behind the benchmarks at https://github.com/guruswami-ai/mlx-benchmarks/blob/main/docs/. It is a work in progress, but much of the material is relevant, especially for distributed scenarios. The learning curve is steep, and this is the documentation I wish I had when I started out. If you want to explore what distributed mlx_lm.server can do, I have also created an interactive simulation of up to five M3 Ultra nodes running in a full RDMA TB5 mesh at https://chakra.guruswami.ai
There's almost no official documentation I can find for mlx_lm.server. Per this discussion, the project is intended to rival solutions like llama.cpp. But the lack of documentation has cascading effects: poor results from AI search engines and almost no presence on tech forums like Reddit. So despite using Apple hardware exclusively, I am still relying on llama.cpp.
Please consider creating a robust documentation site.