Onde Inference CLI

Command-line interface for Onde Inference.


Swift · Flutter · React Native · Rust · Website


Manage your Onde Inference account, fine-tune local models, and export them to GGUF, all from the terminal.

Install

Install onde-cli with your favorite tool. For package docs and the full install matrix, see https://ondeinference.com/cli.

npm

npm install -g @ondeinference/cli

Homebrew

brew tap ondeinference/homebrew-tap
brew install onde

pip / uv / uvx

pip install onde-cli
# or
uv tool install onde-cli
uv run onde
# or run without installing
uvx --from onde-cli onde

.NET tool

dotnet tool install --global Onde.Cli

Dart pub global

dart pub global activate onde_cli

Pre-built binary

Download a release from GitHub Releases:

# macOS Apple Silicon
curl -Lo onde https://github.com/ondeinference/onde-cli/releases/latest/download/onde-macos-arm64
chmod +x onde && mv onde /usr/local/bin/onde
| Platform | File |
| --- | --- |
| macOS Apple Silicon | `onde-macos-arm64` |
| macOS Intel | `onde-macos-amd64` |
| Linux x64 | `onde-linux-amd64` |
| Linux arm64 | `onde-linux-arm64` |
| Windows x64 | `onde-win-amd64.exe` |
| Windows arm64 | `onde-win-arm64.exe` |

Usage

onde

This opens the TUI. You can sign up or sign in right there.

| Key | What it does |
| --- | --- |
| Tab | Move between fields |
| Enter | Submit or sign out |
| Ctrl+L | Go to the sign-in screen |
| Ctrl+N | Go to the new account screen |
| Ctrl+C | Quit |

Fine-tuning

onde includes a LoRA fine-tuning pipeline for Qwen2, Qwen2.5, and Qwen3 models. It runs locally: Metal on Apple Silicon, CPU elsewhere. No cloud setup. No Python environment.

The flow is straightforward: download a safetensors base model, fine-tune it with LoRA, merge the adapter back into the base weights, then export to GGUF for use in the Onde SDK.

If you want a quick refresher on what the model is actually doing at inference time, Onde has a short note on the forward pass.
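The adapter-then-merge flow above can be sketched in a few lines. This is a generic illustration of the LoRA idea, not Onde's implementation: a frozen weight matrix `W` gets a low-rank update `B @ A` scaled by `alpha / rank`, and merging simply folds that product into the base weights, so merged inference costs no extra matmuls.

```python
import numpy as np

# Generic LoRA sketch (not Onde's code). W is frozen; only A and B train.
d_in, d_out, rank, alpha = 64, 64, 8, 16
rng = np.random.default_rng(0)

W = rng.standard_normal((d_out, d_in))        # frozen base weight
A = rng.standard_normal((rank, d_in)) * 0.01  # trainable "down" projection
B = np.zeros((d_out, rank))                   # trainable "up" projection (zero init)

x = rng.standard_normal(d_in)

# During training, the adapter runs alongside the frozen layer:
h_adapted = W @ x + (alpha / rank) * (B @ (A @ x))

# Merging folds the update into the base weights:
W_merged = W + (alpha / rank) * (B @ A)
h_merged = W_merged @ x

assert np.allclose(h_adapted, h_merged)
```

Because the merged weights have the same shape as the originals, the merged model can be exported like any other safetensors checkpoint.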

Training data format

Each line should be one complete conversation in Qwen's chat template:

{"text": "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\nWhat is LoRA?<|im_end|>\n<|im_start|>assistant\nLoRA adds small trainable matrices to frozen layers, letting you fine-tune large models without updating all the weights.<|im_end|>"}

Save the file wherever you want. The TUI lets you point to it directly.
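If you already have conversations in a structured form, a small script can render them into this format. The helper below is hypothetical (it does not ship with onde); it just produces JSONL lines matching the Qwen chat template shown above.

```python
import json

# Hypothetical helper: format a (system, user, assistant) triple into one
# JSONL line using Qwen's <|im_start|>/<|im_end|> chat template.
def to_qwen_jsonl(system: str, user: str, assistant: str) -> str:
    text = (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n{assistant}<|im_end|>"
    )
    return json.dumps({"text": text})

with open("train.jsonl", "w") as f:
    f.write(to_qwen_jsonl(
        "You are a helpful assistant.",
        "What is LoRA?",
        "LoRA adds small trainable matrices to frozen layers.",
    ) + "\n")
```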

Running it

onde
  → Models tab (Tab from Apps)
  → Select a safetensors model (↑↓, Enter)
  → Press f

Only safetensors models can be fine-tuned. GGUF models are already quantized, so their weights are not differentiable.
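To see why quantized weights make a poor fine-tuning target, consider a generic symmetric 4-bit round-trip (a simplified sketch, unrelated to GGUF's actual block formats): any weight update smaller than the quantization step vanishes when the weights are re-quantized.

```python
import numpy as np

# Generic symmetric int4 quantization sketch (not GGUF's real format).
def quantize_int4(w):
    scale = np.abs(w).max() / 7.0                     # map to -7..7
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.linspace(-1.0, 1.0, 16, dtype=np.float32)
q, scale = quantize_int4(w)
w_hat = dequantize(q, scale)

# A gradient step much smaller than the quantization step is lost entirely:
step = scale / 10
q2, _ = quantize_int4(w_hat - step)
assert np.array_equal(q, q2)  # same quantized values: the update disappeared
```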

Configure the run:

| Field | Default | Notes |
| --- | --- | --- |
| Training data | ~/.onde/finetune/train.jsonl | Path to your JSONL file |
| LoRA rank | 8 | Higher means more capacity and more memory use |
| Epochs | 3 | Full passes over the dataset |
| Learning rate | 0.0001 | AdamW default |

Press Enter to start. In a healthy run, loss usually starts dropping by epoch 2. If it stays flat, try raising the learning rate to 0.0003.

After training

For rank 8 on a 0.6B model, the adapter is about 1.5 MB. From the fine-tune complete screen:

  • m to merge the adapter into the base model
  • g to export the merged model to GGUF

The resulting GGUF loads directly in the Onde SDK for on-device AI inference.
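The ~1.5 MB figure is easy to sanity-check: a rank-r adapter on a d_out × d_in projection stores r·(d_in + d_out) parameters. A back-of-the-envelope estimate, using assumed shapes (hidden size, layer count, and adapted projections are guesses for a ~0.6B model, not the real config):

```python
# Back-of-the-envelope LoRA adapter size. All shapes are assumptions
# for illustration, not the actual Qwen3-0.6B configuration.
hidden = 1024          # assumed hidden size
layers = 28            # assumed transformer layer count
rank = 8
adapted_per_layer = 2  # e.g. two square projections per layer

# Each adapter stores A (rank x d_in) plus B (d_out x rank) parameters.
params = layers * adapted_per_layer * rank * (hidden + hidden)
size_mb = params * 2 / 1024**2  # 2 bytes per fp16 parameter

print(f"{params:,} params, ~{size_mb:.1f} MB")
```

Under these assumptions the adapter comes out just under 2 MB, the same ballpark as the ~1.5 MB reported above; the exact figure depends on which projections are adapted.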

Supported base models

| Model | Size | Notes |
| --- | --- | --- |
| Qwen/Qwen3-0.6B | ~1.2 GB | Smallest and quickest to train |
| Qwen/Qwen2.5-1.5B-Instruct | ~3.0 GB | Good default for instruction tuning |
| Qwen/Qwen3-1.7B | ~3.4 GB | Newer small Qwen3 model |
| Qwen/Qwen3-4B | ~8.0 GB | Best quality, better suited to macOS |

You can search for any of these from the Models tab with /.


Debug

Logs are written to ~/.cache/onde/debug.log.


License

Dual-licensed under MIT and Apache 2.0.

Copyright

© 2026 Onde Inference (Splitfire AB).