# Option A: Installer (recommended)
# Download OpenScribeSetup.exe from Releases → run it
# Installs to %LOCALAPPDATA%\Programs\EDGESCRIBE, adds to PATH
# VC++ runtime installed automatically if needed
# Option B: ZIP
Expand-Archive openscribe-win-x64.zip -DestinationPath C:\edgescribe
cd C:\edgescribe\openscribe-win-x64
# If you get "VCRUNTIME140.dll not found" error, run the bundled installer:
.\vc_redist_x64.exe /install /quiet
# Download models and verify
.\edgescribe.exe pull nemotron
.\edgescribe.exe pull qwen3-vl
# Start server with web UI
.\edgescribe.exe serve --port 8080
# Open http://localhost:8080 in browser
NVIDIA Jetson boards (Orin Nano, Orin NX, AGX Orin) run Linux ARM64 with
optional CUDA support. Pre-built binaries are not provided — build on the
device or cross-compile.
Prerequisites (on Jetson)
# JetPack should already include CUDA toolkit
nvcc --version # Verify CUDA
cmake --version # Need 3.18+
g++ --version # Need C++20 (GCC 12+)
# If CMake is too old:
sudo apt-get install -y cmake
# If GCC is too old (Ubuntu 20.04 on older JetPack):
sudo apt-get install -y gcc-12 g++-12
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-12 100
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-12 100
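The version checks above can also be scripted. The following is a minimal sketch, not part of the project: `version_ge` and `check_tool` are hypothetical helper names, and the comparison assumes `sort -V` (GNU coreutils version sort) is available, which is normally the case on the Ubuntu base that JetPack ships.

```sh
#!/bin/sh
# version_ge A B: succeeds when version A >= version B.
# Assumes GNU `sort -V` (version sort) is available.
version_ge() {
    # The smaller version sorts first; A >= B exactly when that is B.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_tool NAME FOUND REQUIRED: print a one-line verdict.
# Hypothetical helper for illustration only.
check_tool() {
    if version_ge "$2" "$3"; then
        echo "$1 $2: ok (need $3+)"
    else
        echo "$1 $2: too old (need $3+)"
    fi
}
```

One possible way to use it: `check_tool cmake "$(cmake --version | awk 'NR==1 {print $3}')" 3.18` and `check_tool g++ "$(g++ -dumpversion)" 12`. The thresholds are the ones stated in the comments above.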
If you don't need GPU acceleration, or have a Jetson Nano with limited memory:
# Same as above but change step 2:
cmake -B build -DCMAKE_BUILD_TYPE=Release \
-DGGML_CUDA=OFF \
-DLLAMA_BUILD_TESTS=OFF \
-DLLAMA_BUILD_EXAMPLES=OFF \
-DLLAMA_BUILD_SERVER=OFF
Jetson Memory Considerations
| Jetson Model | RAM | Recommended Model | Notes |
|---|---|---|---|
| Orin Nano (4GB) | 4 GB shared | Qwen3-VL Q4_K_S | Tight; skip TTS, reduce n_ctx |
| Orin Nano (8GB) | 8 GB shared | Qwen3-VL Q4_K_M | Full stack works |
| Orin NX (8GB) | 8 GB shared | Qwen3-VL Q4_K_M | Full stack works |
| Orin NX (16GB) | 16 GB shared | Qwen3-VL Q8_0 | Higher quality possible |
| AGX Orin (32/64GB) | 32-64 GB | Qwen3-VL Q8_0 or 8B | Can run larger models |
Jetson uses unified memory — CPU and GPU share the same RAM. The model
must fit in total available memory minus OS overhead (~1-2 GB).
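As a rough illustration of that rule, the sketch below compares a model's size against total RAM minus an OS reserve. The helper names `fits_in_memory` and `total_ram_mb` are hypothetical, the 2048 MB default is simply the upper end of the ~1-2 GB estimate, and file size is only a proxy: the context (n_ctx) and any other loaded models need memory on top of it.

```sh
#!/bin/sh
# fits_in_memory MODEL_MB TOTAL_MB [OVERHEAD_MB]
# Succeeds when MODEL_MB <= TOTAL_MB - OVERHEAD_MB.
# OVERHEAD_MB defaults to 2048 (assumed OS reserve, upper end of ~1-2 GB).
fits_in_memory() {
    model_mb=$1
    total_mb=$2
    overhead_mb=${3:-2048}
    [ "$model_mb" -le $((total_mb - overhead_mb)) ]
}

# total_ram_mb: total RAM in MB, read from /proc/meminfo (Linux only).
total_ram_mb() {
    awk '/^MemTotal:/ {print int($2 / 1024)}' /proc/meminfo
}
```

For example, `fits_in_memory "$(du -m model.gguf | cut -f1)" "$(total_ram_mb)" && echo "should fit"`, where `model.gguf` stands in for whichever model file was downloaded.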
Running on Jetson with GPU
# Use CUDA for LLM/Vision (offload all layers to GPU)
edgescribe chat "Hello" --device cuda
# Start server with GPU
edgescribe serve --device cuda --port 8080
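If the same launch script is shared between CUDA and CPU-only boards, the flag can be chosen at run time. This is a sketch under two assumptions that the text above does not confirm: that the presence of `nvcc` on PATH is a good enough signal for a CUDA-enabled build, and that omitting `--device` makes edgescribe fall back to its default device. `device_args` is a hypothetical helper name.

```sh
#!/bin/sh
# device_args [PROBE_CMD]: print "--device cuda" when PROBE_CMD
# (default: nvcc) is found on PATH; otherwise print nothing.
# Assumption: without --device, edgescribe uses its default device.
device_args() {
    if command -v "${1:-nvcc}" >/dev/null 2>&1; then
        echo "--device cuda"
    fi
}
```

It could then be used as `edgescribe serve $(device_args) --port 8080`, with the substitution left unquoted so the flag and its value split into two arguments.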
Raspberry Pi 5 Deployment (ARM64, CPU-only)
Prerequisites
# Raspberry Pi OS (64-bit) with GCC 12+
sudo apt-get update
sudo apt-get install -y cmake g++ git curl libpulse-dev
Build
Same as the Jetson CPU-only build (CUDA disabled):
# Follow the Jetson steps above with -DGGML_CUDA=OFF
# Raspberry Pi 5 has 4-8 GB RAM; use Q4_K_S or Q4_K_M
Pi 5 Performance Expectations
| Task | Model | Speed |
|---|---|---|
| LLM chat (Q4_K_M) | Qwen3-VL-2B | ~15-25 tok/s |
| Vision (Q4_K_M) | Qwen3-VL-2B | ~10-15 tok/s |
| ASR (Nemotron) | Parakeet 0.6B | Near real-time |
| TTS (Kokoro/Piper) | - | Real-time |
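To translate these throughput figures into waiting time, divide the expected reply length by the token rate. The helper below is a hypothetical sketch (`gen_seconds` is not part of edgescribe), and it ignores prompt processing and model load time.

```sh
#!/bin/sh
# gen_seconds TOKENS TOK_PER_SEC: estimated seconds to generate TOKENS
# at a steady rate, printed with one decimal place.
gen_seconds() {
    awk -v n="$1" -v r="$2" 'BEGIN { printf "%.1f\n", n / r }'
}
```

For a 300-token chat reply, `gen_seconds 300 15` and `gen_seconds 300 25` bracket the expected range at the ~15-25 tok/s quoted for LLM chat, roughly 12 to 20 seconds.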
Cross-Compilation (Build on x64, Run on ARM64)
If you prefer to build on a faster x64 machine for ARM64 targets: