- Rust 1.85+ (`edition = "2024"` requires a recent stable).
- Apple Silicon Mac for actually running the binary (Metal-only).
- Python 3.10+ with `mlx-lm` installed for runtime testing.

```
inferenced/
├── Cargo.toml
├── README.md
├── LICENSE
├── docs/
│   ├── architecture.md
│   ├── installation.md
│   ├── configuration.md
│   ├── api.md
│   ├── metrics.md
│   ├── development.md      # ← you are here
│   └── troubleshooting.md
├── examples/
│   ├── launchd/
│   │   └── dev.dormlab.inferenced.plist
│   └── kubernetes/
│       ├── service.yaml
│       └── endpointslice.yaml
└── src/
    ├── main.rs             # axum router, args, app state
    ├── supervisor.rs       # mlx_lm.server lifecycle
    └── admin.rs            # /admin/* routes
```

```bash
cargo build --release --target aarch64-apple-darwin
cargo test
```

```bash
RUST_LOG=inferenced=debug \
cargo run -- \
  --bind 127.0.0.1:11434 \
  --allow-cidrs 0.0.0.0/0,::/0 \
  --model mlx-community/Qwen2.5-1.5B-Instruct-4bit
```

The 1.5B model loads fastest (~700 MB) and is best for quick iteration on the bridge code.

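
The `--allow-cidrs 0.0.0.0/0,::/0` value in the run command admits every IPv4 and IPv6 client, which is convenient for local work while the listener stays bound to loopback. How inferenced itself parses and evaluates that list is not shown in this guide; the following is only a minimal sketch of CIDR matching with the standard library, and `cidr_contains` and `allowed` are hypothetical names.

```rust
use std::net::IpAddr;

/// Hypothetical helper: does `ip` fall inside `cidr` (for example "10.0.0.0/8")?
/// Returns `None` when the CIDR string is malformed or the prefix is too long.
fn cidr_contains(cidr: &str, ip: IpAddr) -> Option<bool> {
    let (net, len) = cidr.split_once('/')?;
    let net: IpAddr = net.parse().ok()?;
    let len: u32 = len.parse().ok()?;
    match (net, ip) {
        (IpAddr::V4(n), IpAddr::V4(i)) if len <= 32 => {
            // A /0 prefix matches everything; shifting by the full width would overflow.
            let mask = if len == 0 { 0 } else { u32::MAX << (32 - len) };
            Some((u32::from(n) & mask) == (u32::from(i) & mask))
        }
        (IpAddr::V6(n), IpAddr::V6(i)) if len <= 128 => {
            let mask = if len == 0 { 0 } else { u128::MAX << (128 - len) };
            Some((u128::from(n) & mask) == (u128::from(i) & mask))
        }
        // Same address family, but the prefix length exceeds the address width.
        (IpAddr::V4(_), IpAddr::V4(_)) | (IpAddr::V6(_), IpAddr::V6(_)) => None,
        // An IPv4 network never contains an IPv6 address, and vice versa.
        _ => Some(false),
    }
}

/// Hypothetical helper: true if any entry of a comma-separated list matches `ip`.
/// Malformed entries are treated as non-matching.
fn allowed(list: &str, ip: IpAddr) -> bool {
    list.split(',')
        .any(|cidr| cidr_contains(cidr.trim(), ip) == Some(true))
}
```

With this sketch, `allowed("0.0.0.0/0,::/0", ip)` is true for any address, while a narrower list such as `10.0.0.0/8` only admits clients inside that range.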
- `anyhow::Error` for top-level `main`, `thiserror::Error` for module-level error types. Don't propagate `Result<…, anyhow::Error>` from library functions.
- Single-binary deployable. No external dependencies allowed beyond `mlx_lm.server` (which we shell out to) and the standard macOS CLI tools we already use (`sysctl`, `vm_stat`).
- No `unsafe` outside the `extern "C"` blocks for `kill(2)`.
- `#[serde(rename_all = "snake_case")]` on admin response types (matches Python convention since `mlx_lm.server` is Python).

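
To illustrate the `unsafe` rule, here is a minimal sketch of what a hand-declared `kill(2)` binding with a safe wrapper could look like. It is not the code in `src/supervisor.rs`: `send_signal` and the `SIGTERM` constant are hypothetical names, and the choice of SIGTERM is an assumption. The block is written as `unsafe extern "C"` because edition 2024 requires that form.

```rust
use std::io;

// Declared by hand so no `libc` crate is needed; the symbol is provided by
// the system C library. Edition 2024 requires the `unsafe extern` spelling.
unsafe extern "C" {
    fn kill(pid: i32, sig: i32) -> i32;
}

/// SIGTERM is 15 on both macOS and Linux.
const SIGTERM: i32 = 15;

/// Hypothetical safe wrapper: the only place that touches `unsafe`.
fn send_signal(pid: u32, sig: i32) -> io::Result<()> {
    // Reject pid 0 (which would signal our whole process group) and values
    // that do not fit the C `pid_t` range.
    if pid == 0 || pid > i32::MAX as u32 {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "invalid pid"));
    }
    // SAFETY: kill(2) takes two integers by value and reads no memory we own.
    let rc = unsafe { kill(pid as i32, sig) };
    if rc == 0 {
        Ok(())
    } else {
        // kill(2) reports failure through errno (e.g. ESRCH, EPERM).
        Err(io::Error::last_os_error())
    }
}
```

A caller holding a `std::process::Child` would pass `child.id()` and then call `child.wait()` to reap the process. Keeping the `unsafe` call inside one small function makes the rule easy to audit.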
| Want to add… | Where it goes |
|---|---|
| A new admin route | `src/admin.rs`; register it in `router()`. |
| A new metric | `src/main.rs`, following the `AppState::request_counter` pattern; register it on the `Registry` in `main()`. |
| Multi-model support | Refactor `src/supervisor.rs` to manage a `HashMap<String, Child>` keyed by model id. Update the `/v1/*` proxy to route by the request's `model` field. |
| Auth on `/admin/*` | Add a tower middleware on the admin sub-router that validates a bearer token from the `Authorization` header. |
| Direct MLX (no Python) | Replace `supervisor.rs` with an `mlx-rs` integration. Big change. |

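
For the admin auth row, the header check at the heart of such a middleware can be sketched with the standard library alone. This is an illustration under assumptions, not project code: `bearer_token`, `constant_time_eq`, and `is_authorized` are hypothetical names, and wiring the check into a tower layer on the admin sub-router is left out.

```rust
/// Extracts the token from an `Authorization` header value of the form
/// `Bearer <token>`. The scheme name is matched case-insensitively.
fn bearer_token(header_value: &str) -> Option<&str> {
    let (scheme, token) = header_value.trim().split_once(' ')?;
    let token = token.trim();
    if scheme.eq_ignore_ascii_case("bearer") && !token.is_empty() {
        Some(token)
    } else {
        None
    }
}

/// Compares two byte strings without returning early on the first mismatch,
/// so timing does not reveal how many leading bytes were correct.
/// The length check itself is not constant-time.
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

/// Hypothetical check a middleware could run before forwarding the request.
/// `header_value` is the raw `Authorization` header, if one was sent.
fn is_authorized(header_value: Option<&str>, expected_token: &str) -> bool {
    header_value
        .and_then(|value| bearer_token(value))
        .map_or(false, |token| {
            constant_time_eq(token.as_bytes(), expected_token.as_bytes())
        })
}
```

A middleware built on this would return 401 when `is_authorized` is false and otherwise pass the request on to the existing admin handlers.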
PRs welcome. Keep changes focused; if you're touching the supervisor or the proxy, please add a test. By contributing you agree to license your work under the project's MIT license.