Replies: 2 comments
I agree this is mainly a discoverability problem. There is already some server documentation in the repo, but it is buried in mlx_lm/SERVER.md instead of being surfaced from the main README or a proper docs site, which makes it much harder to find through search, AI tools, or normal browsing. That file already covers the purpose of mlx_lm.server, how to start it, the OpenAI-style /v1/chat/completions endpoint and its request/response fields (a minimal example is sketched below), the /v1/models endpoint, and the caveat that the server is not recommended for production because it only performs basic security checks.

So the issue is not that documentation does not exist; it is that the documentation is hard to discover, fragmented, and not presented like product documentation. For a project this visible, that creates unnecessary friction for anyone trying to evaluate or adopt mlx_lm.server. Even a lightweight first step, like linking mlx_lm/SERVER.md prominently from the root README, would already help a lot. Right now it is easy for users to assume there is no real server documentation, even though useful material is already there.
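For reference, here is a minimal sketch of the kind of usage SERVER.md already documents: calling the OpenAI-style /v1/chat/completions endpoint of a locally running server. The port, request fields, and prompt below are assumptions for illustration (the exact flags and defaults are in SERVER.md), not a definitive reference:

```python
# Minimal sketch: query a locally running mlx_lm.server through its
# OpenAI-compatible chat completions endpoint. Assumes the server was
# started beforehand (e.g. with an `mlx_lm.server --model ...` style
# command as described in SERVER.md) and is listening on localhost:8080;
# adjust the host/port to match your setup.
import json
import urllib.request

url = "http://localhost:8080/v1/chat/completions"
payload = {
    # OpenAI-style request fields; add a "model" field here if your
    # server version expects one in the request body.
    "messages": [{"role": "user", "content": "Hello, what can you do?"}],
    "max_tokens": 128,
    "temperature": 0.7,
}

req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    body = json.load(resp)

# The response follows the OpenAI chat completion shape, so the
# generated text is under choices[0].message.content.
print(body["choices"][0]["message"]["content"])
```

A plain GET request to /v1/models on the same server should list the available models, mirroring the OpenAI-style models endpoint that SERVER.md describes.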
I have started documenting mlx_lm.server a bit better and explaining some of the core concepts behind the benchmarks at https://github.com/guruswami-ai/mlx-benchmarks/blob/main/docs/. It is a work in progress, but much of the material is relevant, especially for distributed scenarios. The learning curve is steep, and this is the documentation I wish I had when I started out. If you want to explore what distributed mlx_lm.server can do, I have also created an interactive simulation of up to five M3 Ultra nodes running in a full RDMA TB5 mesh at https://chakra.guruswami.ai
There's almost no official documentation I can find for mlx_lm.server. Per this discussion, the project is intended to rival solutions like llama.cpp. But the lack of documentation has cascading effects: poor results from AI search engines and almost no presence on tech forums like Reddit. So despite using Apple hardware exclusively, I am still relying on llama.cpp.
Please consider creating a robust documentation site.