Command-line interface for Onde Inference.
Manage your Onde Inference account, fine-tune local models, and export them to GGUF, all from the terminal.
Install onde-cli with your favorite tool. For package docs and the full install matrix, see https://ondeinference.com/cli.
```bash
# npm
npm install -g @ondeinference/cli

# Homebrew
brew tap ondeinference/homebrew-tap
brew install onde

# pip
pip install onde-cli
# or
uv tool install onde-cli
uv run onde
# or with uvx
uvx --from onde-cli onde

# .NET
dotnet tool install --global Onde.Cli

# Dart
dart pub global activate onde_cli
```

Or download a release from GitHub Releases:

```bash
# macOS Apple Silicon
curl -Lo onde https://github.com/ondeinference/onde-cli/releases/latest/download/onde-macos-arm64
chmod +x onde && mv onde /usr/local/bin/onde
```

| Platform | File |
|---|---|
| macOS Apple Silicon | onde-macos-arm64 |
| macOS Intel | onde-macos-amd64 |
| Linux x64 | onde-linux-amd64 |
| Linux arm64 | onde-linux-arm64 |
| Windows x64 | onde-win-amd64.exe |
| Windows arm64 | onde-win-arm64.exe |
```bash
onde
```

This opens the TUI. You can sign up or sign in right there.
| Key | What it does |
|---|---|
| Tab | Move between fields |
| Enter | Submit or sign out |
| Ctrl+L | Go to the sign-in screen |
| Ctrl+N | Go to the new account screen |
| Ctrl+C | Quit |
onde includes a LoRA fine-tuning pipeline for Qwen2, Qwen2.5, and Qwen3 models. It runs locally: Metal on Apple Silicon, CPU elsewhere. No cloud setup. No Python environment.
The flow is straightforward: download a safetensors base model, fine-tune it with LoRA, merge the adapter back into the base weights, then export to GGUF for use in the Onde SDK.
If you want a quick refresher on what the model is actually doing at inference time, Onde has a short note on the forward pass.
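As a rough illustration of that forward pass, here is a minimal NumPy sketch of a LoRA-augmented linear layer. The dimensions, alpha scaling, and initialization are example values for a small Qwen-class model, not onde's defaults:

```python
# Minimal sketch of a LoRA-augmented linear layer at inference time.
# Dimensions, alpha, and init are illustrative, not onde's defaults.
import numpy as np

d_in, d_out, rank, alpha = 1024, 1024, 8, 16

W = np.random.randn(d_out, d_in) * 0.02   # frozen base weight
A = np.random.randn(rank, d_in) * 0.02    # trainable: projects d_in -> rank
B = np.zeros((d_out, rank))               # trainable: projects rank -> d_out,
                                          # zero-initialized so training starts
                                          # from the base model's behavior

def lora_forward(x: np.ndarray) -> np.ndarray:
    # Base projection plus the scaled low-rank correction.
    return W @ x + (alpha / rank) * (B @ (A @ x))

x = np.random.randn(d_in)
print(lora_forward(x).shape)   # (1024,)
```

Because B starts at zero, the adapted layer is exactly the base layer until training moves A and B.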
Each line should be one complete conversation in Qwen's chat template:
{"text": "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\nWhat is LoRA?<|im_end|>\n<|im_start|>assistant\nLoRA adds small trainable matrices to frozen layers, letting you fine-tune large models without updating all the weights.<|im_end|>"}Save the file wherever you want. The TUI lets you point to it directly.
```
onde
→ Models tab (Tab from Apps)
→ Select a safetensors model (↑↓, Enter)
→ Press f
```
Only safetensors models can be fine-tuned. GGUF models are already quantized; their discretized weights can't take gradient updates, so there is nothing for the optimizer to train.
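To see why, here is a toy sketch of what quantization does to a weight. It is pure illustration, not GGUF's actual scheme:

```python
# Quantization snaps weights to a fixed grid, and the rounding step has zero
# gradient almost everywhere, so there is nothing for an optimizer to follow.

def quantize(w: float, scale: float = 0.25) -> float:
    return round(w / scale) * scale   # snap to the nearest grid point

print(quantize(0.7312))          # 0.75 -- the fine detail is gone
print(quantize(0.7312 + 1e-4))   # 0.75 -- a small gradient step changes nothing
```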
Configure the run:
| Field | Default | Notes |
|---|---|---|
| Training data | ~/.onde/finetune/train.jsonl | Path to your JSONL file |
| LoRA rank | 8 | Higher means more capacity and more memory use |
| Epochs | 3 | Full passes over the dataset |
| Learning rate | 0.0001 | AdamW default |
Press Enter to start. In a healthy run, loss usually starts dropping by epoch 2. If it stays flat, try raising the learning rate to 0.0003.
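To get a feel for how rank drives adapter size, here is a back-of-envelope calculation. The projection shapes, layer count, and fp16 storage are assumptions for a small Qwen-class model; the real number depends on which matrices onde adapts:

```python
# Back-of-envelope LoRA adapter size. Shapes, layer count, and dtype are
# assumptions for a small Qwen-class model, not onde's exact configuration.
def adapter_megabytes(rank: int, shapes: list[tuple[int, int]],
                      n_layers: int, bytes_per_param: int = 2) -> float:
    # Each adapted (d_out, d_in) matrix gains A (rank x d_in) and
    # B (d_out x rank): rank * (d_in + d_out) extra parameters.
    per_layer = sum(rank * (d_in + d_out) for d_out, d_in in shapes)
    return per_layer * n_layers * bytes_per_param / 1e6

qv = [(1024, 1024), (1024, 1024)]      # e.g. q and v projections (assumed)
print(adapter_megabytes(8, qv, 28))    # ~1.8 MB at rank 8
print(adapter_megabytes(16, qv, 28))   # doubles with the rank
```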
For rank 8 on a 0.6B model, the adapter is about 1.5 MB.

From the fine-tune complete screen:

- `m` to merge the adapter into the base model
- `g` to export the merged model to GGUF
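Conceptually, the merge folds the low-rank update into the base weights so the merged model needs no adapter at inference. A NumPy sketch of the idea (illustrative, not onde's code):

```python
# Fold the LoRA update into the base weight: W' = W + (alpha / rank) * B @ A.
import numpy as np

rank, alpha = 8, 16                  # assumed hyperparameters
W = np.random.randn(1024, 1024)      # base weight
A = np.random.randn(rank, 1024)      # trained LoRA factors
B = np.random.randn(1024, rank)

W_merged = W + (alpha / rank) * (B @ A)

# The merged weight reproduces the adapted layer's output exactly.
x = np.random.randn(1024)
assert np.allclose(W_merged @ x, W @ x + (alpha / rank) * (B @ (A @ x)))
```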
The resulting GGUF loads directly in the Onde SDK for on-device AI inference.
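For a sanity check outside the SDK, any GGUF-compatible runtime should accept the exported file. For example, with llama-cpp-python installed (the file name here is a placeholder):

```python
# Load the exported GGUF in llama-cpp-python and run a quick completion.
from llama_cpp import Llama

llm = Llama(model_path="qwen3-0.6b-merged.gguf")
out = llm("<|im_start|>user\nWhat is LoRA?<|im_end|>\n<|im_start|>assistant\n",
          max_tokens=64)
print(out["choices"][0]["text"])
```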
| Model | Size | Notes |
|---|---|---|
| Qwen/Qwen3-0.6B | ~1.2 GB | Smallest and quickest to train |
| Qwen/Qwen2.5-1.5B-Instruct | ~3.0 GB | Good default for instruction tuning |
| Qwen/Qwen3-1.7B | ~3.4 GB | Newer small Qwen3 model |
| Qwen/Qwen3-4B | ~8.0 GB | Best quality, better suited to macOS |
You can search for any of these from the Models tab with /.
Logs are written to ~/.cache/onde/debug.log.
Dual-licensed under MIT and Apache 2.0.
© 2026 Onde Inference (Splitfire AB).