Development

Toolchain

  • Rust 1.85+ (edition = "2024" was stabilized in Rust 1.85).
  • Apple Silicon Mac for actually running the binary (Metal-only).
  • Python 3.10+ with mlx-lm installed for runtime testing.

Layout

inferenced/
├── Cargo.toml
├── README.md
├── LICENSE
├── docs/
│   ├── architecture.md
│   ├── installation.md
│   ├── configuration.md
│   ├── api.md
│   ├── metrics.md
│   ├── development.md          # ← you are here
│   └── troubleshooting.md
├── examples/
│   ├── launchd/
│   │   └── dev.dormlab.inferenced.plist
│   └── kubernetes/
│       ├── service.yaml
│       └── endpointslice.yaml
└── src/
    ├── main.rs                 # axum router, args, app state
    ├── supervisor.rs           # mlx_lm.server lifecycle
    └── admin.rs                # /admin/* routes

Build + test

cargo build --release --target aarch64-apple-darwin
cargo test

Run interactively for development

RUST_LOG=inferenced=debug \
  cargo run -- \
    --bind 127.0.0.1:11434 \
    --allow-cidrs 0.0.0.0/0,::/0 \
    --model mlx-community/Qwen2.5-1.5B-Instruct-4bit

The 1.5B 4-bit model is small (~700 MB), so it downloads and loads quickly. That makes it the best choice for fast iteration on the bridge code.
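`--allow-cidrs` takes a comma-separated list of CIDR blocks. `0.0.0.0/0,::/0` admits every IPv4 and IPv6 client, which is only reasonable here because `--bind 127.0.0.1:11434` keeps the listener on loopback. Below is a minimal, standard-library-only sketch of how an allow-list check like this can work. It assumes the flag filters on the client's IP address. `Cidr`, `parse_allow_list`, and `is_allowed` are hypothetical names, not the code in src/main.rs.

```rust
// Hypothetical sketch of --allow-cidrs matching; not the code in src/main.rs.
use std::net::IpAddr;

/// One parsed allow-list entry, e.g. "10.0.0.0/8" or "fd00::/8".
#[derive(Debug, Clone, Copy)]
struct Cidr {
    network: IpAddr,
    prefix_len: u8,
}

impl Cidr {
    /// Parse "addr/len", rejecting prefixes longer than the address family allows.
    fn parse(s: &str) -> Result<Cidr, String> {
        let s = s.trim();
        let (addr, len) = s
            .split_once('/')
            .ok_or_else(|| format!("missing '/' in {s:?}"))?;
        let network: IpAddr = addr.parse().map_err(|e| format!("{s:?}: {e}"))?;
        let prefix_len: u8 = len.parse().map_err(|e| format!("{s:?}: {e}"))?;
        let max = if network.is_ipv4() { 32 } else { 128 };
        if prefix_len > max {
            return Err(format!("{s:?}: prefix length {prefix_len} exceeds {max}"));
        }
        Ok(Cidr { network, prefix_len })
    }

    /// True if `ip` falls inside this block. IPv4 never matches IPv6 and vice versa.
    fn contains(&self, ip: IpAddr) -> bool {
        let bits = u32::from(self.prefix_len);
        match (self.network, ip) {
            (IpAddr::V4(net), IpAddr::V4(addr)) => {
                // A /0 prefix means a shift of 32; checked_shl returns None,
                // so the mask becomes 0 and every address matches.
                let mask = u32::MAX.checked_shl(32 - bits).unwrap_or(0);
                (u32::from(net) & mask) == (u32::from(addr) & mask)
            }
            (IpAddr::V6(net), IpAddr::V6(addr)) => {
                let mask = u128::MAX.checked_shl(128 - bits).unwrap_or(0);
                (u128::from(net) & mask) == (u128::from(addr) & mask)
            }
            _ => false,
        }
    }
}

/// Parse a comma-separated list such as "0.0.0.0/0,::/0".
fn parse_allow_list(arg: &str) -> Result<Vec<Cidr>, String> {
    arg.split(',')
        .filter(|s| !s.trim().is_empty())
        .map(Cidr::parse)
        .collect()
}

/// Allow the peer if any block matches. IPv4-mapped IPv6 peers
/// (::ffff:a.b.c.d) are normalised to plain IPv4 first.
fn is_allowed(allow: &[Cidr], peer: IpAddr) -> bool {
    let peer = peer.to_canonical();
    allow.iter().any(|c| c.contains(peer))
}
```

For example, with `parse_allow_list("10.0.0.0/8,fd00::/8")`, a peer at 10.1.2.3 is allowed and one at 192.168.1.5 is not. The `to_canonical()` call means a dual-stack listener that reports 10.9.9.9 as `::ffff:10.9.9.9` still matches the IPv4 block.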

Project conventions

  • Use anyhow::Error only in the top-level main; define module-level error types with #[derive(thiserror::Error)]. Library functions must not return Result<…, anyhow::Error>.
  • Single-binary deployable. No external runtime dependencies beyond mlx_lm.server (which we shell out to) and the standard macOS CLI tools we already use (sysctl, vm_stat).
  • No unsafe code outside the extern "C" block for kill(2) and the call into it.
  • Put #[serde(rename_all = "snake_case")] on admin response types, to match the Python naming used by mlx_lm.server. Rust struct fields are already snake_case, so in practice this mainly affects enum variants.
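Shelling out to a stock macOS tool needs nothing beyond std::process::Command. As an illustration, the sketch below runs `vm_stat` and parses its output. Which vm_stat fields inferenced actually reads is not stated here, so `VmStat`, `parse_vm_stat`, and `read_vm_stat` are hypothetical. The parser assumes the usual layout: a header ending in `(page size of N bytes)` followed by `Label:   count.` lines.

```rust
// Hypothetical vm_stat reader; illustrates the "shell out, no extra crates" rule.
use std::collections::HashMap;
use std::io;
use std::process::Command;

/// Parsed vm_stat snapshot: page size plus page counts keyed by label.
#[derive(Debug)]
struct VmStat {
    page_size: u64,
    pages: HashMap<String, u64>,
}

impl VmStat {
    /// Convert a page-count entry (e.g. "Pages free") to bytes.
    fn bytes(&self, label: &str) -> Option<u64> {
        self.pages.get(label).map(|n| n * self.page_size)
    }
}

/// Parse vm_stat output. Lines whose value is not a plain count
/// (including the header itself) are skipped.
fn parse_vm_stat(out: &str) -> Option<VmStat> {
    let mut lines = out.lines();
    let header = lines.next()?;
    let page_size: u64 = header
        .split("page size of ")
        .nth(1)?
        .split_whitespace()
        .next()?
        .parse()
        .ok()?;
    let mut stat = VmStat { page_size, pages: HashMap::new() };
    for line in lines {
        if let Some((label, value)) = line.split_once(':') {
            // Values end with a period, e.g. "10000."; some labels are quoted.
            if let Ok(n) = value.trim().trim_end_matches('.').parse::<u64>() {
                stat.pages.insert(label.trim().trim_matches('"').to_string(), n);
            }
        }
    }
    Some(stat)
}

/// Run the real tool (macOS only) and parse its stdout.
fn read_vm_stat() -> io::Result<Option<VmStat>> {
    let out = Command::new("vm_stat").output()?;
    Ok(parse_vm_stat(&String::from_utf8_lossy(&out.stdout)))
}
```

`stat.bytes("Pages free")` then gives free memory in bytes. Keeping the parser separate from the process call lets it be unit-tested on any platform.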
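The `unsafe` exception for kill(2) presumably exists because std's `Child::kill` only sends SIGKILL on Unix, so a graceful SIGTERM shutdown needs the raw syscall. Below is a minimal sketch of the binding plus a safe wrapper, assuming that is what the binding is for. `send_signal` is a hypothetical name. Under edition 2024 the block must be written `unsafe extern "C"`.

```rust
// Hypothetical kill(2) binding and safe wrapper; not the code in src/supervisor.rs.
use std::io;

// The only unsafe declaration. pid_t and int are both i32 on macOS (and Linux),
// and std already links libc, so no extra crate is needed.
unsafe extern "C" {
    fn kill(pid: i32, sig: i32) -> i32;
}

/// SIGTERM is 15 on both macOS and Linux.
const SIGTERM: i32 = 15;

/// Send `sig` to exactly one process.
///
/// Rejects pid 0 and pids that do not fit in i32: kill(2) treats 0 and
/// negative values as process-group targets (-1 means "every process we
/// are allowed to signal"), which a supervisor should never do by accident.
fn send_signal(pid: u32, sig: i32) -> io::Result<()> {
    let pid = i32::try_from(pid)
        .ok()
        .filter(|&p| p > 0)
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "pid must be in 1..=i32::MAX"))?;
    // SAFETY: kill(2) takes two plain integers and touches no Rust-owned memory.
    let rc = unsafe { kill(pid, sig) };
    if rc == 0 {
        Ok(())
    } else {
        Err(io::Error::last_os_error())
    }
}
```

One plausible shutdown sequence: `send_signal(child.id(), SIGTERM)`, poll `Child::try_wait()` for a grace period, then fall back to `Child::kill()`.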

Where to extend

  • A new admin route: add it to src/admin.rs and register it in router().
  • A new metric: follow the AppState::request_counter pattern in src/main.rs, then register the metric on the Registry in main().
  • Multi-model support: refactor src/supervisor.rs to manage a HashMap<String, Child> keyed by model id, and update the /v1/* proxy to route on the request's model field.
  • Auth on /admin/*: add a tower middleware to the admin sub-router that validates a bearer token from the Authorization header.
  • Direct MLX (no Python): replace supervisor.rs with an mlx-rs integration. This is a big change.
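For the multi-model refactor, the supervisor state could look something like this sketch. It keeps the suggested HashMap keyed by model id but pairs each Child with the port its server listens on, since the proxy needs an address to forward to. `ModelPool`, `Backend`, and the per-model port scheme are hypothetical. The caller supplies the Command, which in practice would launch mlx_lm.server for that model on that port.

```rust
// Hypothetical multi-model supervisor state; not existing code.
use std::collections::HashMap;
use std::io;
use std::process::{Child, Command};

/// One running backend: the server child process and the port it listens on.
struct Backend {
    child: Child,
    port: u16,
}

/// Backends keyed by model id.
#[derive(Default)]
struct ModelPool {
    backends: HashMap<String, Backend>,
}

impl ModelPool {
    /// Spawn `cmd` for `model_id` unless that model is already running.
    fn start(&mut self, model_id: &str, port: u16, mut cmd: Command) -> io::Result<()> {
        if self.backends.contains_key(model_id) {
            return Ok(());
        }
        let child = cmd.spawn()?;
        self.backends.insert(model_id.to_string(), Backend { child, port });
        Ok(())
    }

    /// Upstream base URL for a request's `model` field, if that model is running.
    fn upstream_for(&self, model: &str) -> Option<String> {
        self.backends
            .get(model)
            .map(|b| format!("http://127.0.0.1:{}", b.port))
    }

    /// Stop one model: kill the child and reap it so no zombie is left.
    /// Returns Ok(false) if the model was not running.
    fn stop(&mut self, model_id: &str) -> io::Result<bool> {
        match self.backends.remove(model_id) {
            Some(mut b) => {
                b.child.kill()?;
                b.child.wait()?;
                Ok(true)
            }
            None => Ok(false),
        }
    }
}

impl Drop for ModelPool {
    /// std's Child does not kill the process on drop, so clean up explicitly.
    fn drop(&mut self) {
        for (_, mut b) in self.backends.drain() {
            let _ = b.child.kill();
            let _ = b.child.wait();
        }
    }
}
```

The /v1/* handler would read the `model` field from the JSON body, call `upstream_for`, and return an error when it yields None.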
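The check that an /admin/* middleware would run can live outside tower/axum, which also makes it easy to unit-test. Below is a minimal sketch, assuming the expected token comes from configuration and the middleware passes in the raw Authorization header value. How the token is configured is not specified, and `bearer_ok` is a hypothetical name. The comparison avoids exiting on the first mismatching byte so response timing does not reveal how much of a guessed token was right. This is best-effort, since the optimizer is not guaranteed to preserve it.

```rust
// Hypothetical bearer-token check for the /admin/* middleware; not existing code.

/// Validate an `Authorization: Bearer <token>` value.
/// `header` is the raw header value, or None if the header was absent.
fn bearer_ok(header: Option<&str>, expected: &str) -> bool {
    let Some(value) = header else { return false };
    let Some((scheme, token)) = value.trim().split_once(' ') else { return false };
    // Auth scheme names are case-insensitive (RFC 7235); an empty
    // configured token must never authorize anything.
    if !scheme.eq_ignore_ascii_case("Bearer") || expected.is_empty() {
        return false;
    }
    constant_time_eq(token.trim().as_bytes(), expected.as_bytes())
}

/// Compare without short-circuiting on the first differing byte.
/// The length still leaks, which is acceptable for fixed-length tokens.
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}
```

The middleware would call `bearer_ok` with the header value and reject the request (for example with 401 Unauthorized) when it returns false.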

Contributing

PRs welcome. Keep changes focused; if you're touching the supervisor or the proxy, please add a test. By contributing you agree to license your work under the project's MIT license.