A native macOS OCR pipeline in Swift that runs the Hugging Face zai-org/GLM-OCR model locally with MLX Swift.
This repo ships as:
- GLMOCRCLI: a command-line OCR tool for images and PDFs
- GLMOCRApp: a small SwiftUI drag-and-drop app (one file at a time)
Optionally, layout mode runs PaddlePaddle/PP-DocLayoutV3_safetensors first to detect regions and produce region-ordered Markdown plus JSON exports.
- macOS 14+
- Swift 6 toolchain (Xcode 16+ recommended)
- Apple Silicon recommended for performance
For the fast development loop, use SwiftPM:

```bash
swift test
scripts/build_mlx_metallib.sh -c debug
swift run GLMOCRCLI --help
swift run GLMOCRApp
```

On a clean checkout, running SwiftPM-built executables (`swift run ...`) can fail if mlx.metallib is missing; scripts/build_mlx_metallib.sh prepares that SwiftPM metallib.
For production/release builds (the packaged CLI path), use the Xcode/xcodebuild wrapper:

```bash
scripts/build.sh
```

scripts/build.sh ensures the Metal toolchain is available and attempts to install it when missing via `xcodebuild -downloadComponent MetalToolchain`.
```bash
# Show the live CLI contract (source of truth for flags and defaults)
scripts/build.sh
CLI=".build/xcode/Build/Products/Release/GLMOCRCLI"
"$CLI" --help

# Optional: prefetch the default model snapshots
"$CLI" --download-only

# Image -> Markdown
"$CLI" --input examples/source/page.png > out.md

# PDF -> Markdown (all pages by default)
"$CLI" --input examples/source/GLM-4.5V_Pages_1_2_3.pdf > out.md

# Restrict PDF pages explicitly
"$CLI" --input examples/source/GLM-4.5V_Pages_1_2_3.pdf --pages 1-2 > out.md
```

Non-layout OCR supports task presets via `--task`: text, formula, table, and json.
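As a sketch, the presets could be swept over a single image like this (the preset names come from `--task` above; the `out.<task>.md` file names are just illustrative):

```bash
CLI=".build/xcode/Build/Products/Release/GLMOCRCLI"
# One Markdown file per preset: out.text.md, out.formula.md, out.table.md, out.json.md
for task in text formula table json; do
  if [ -x "$CLI" ]; then  # skip quietly if the release CLI has not been built yet
    "$CLI" --input examples/source/page.png --task "$task" > "out.$task.md"
  fi
done
```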
Layout mode is enabled by default for PDFs and disabled for non-PDF inputs. It is required for JSON exports.
```bash
CLI=".build/xcode/Build/Products/Release/GLMOCRCLI"
"$CLI" --layout --input examples/source/table.png \
  --emit-json out.blocks.json \
  --emit-ocrdocument-json out.ocrdocument.json > out.md
```

To try the app instead:

```bash
swift run GLMOCRApp
```

Drag and drop one image or PDF, then run OCR.
Try the sample inputs under examples/source/ (PNG + PDF). For the example corpus contract and reproducible evaluation tooling, see examples/README.md.
By default, models are resolved from the local Hugging Face cache and downloaded only when missing. You can control the cache location with --download-base, HF_HUB_CACHE, or HF_HOME.
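For instance, a run could be pointed at a project-local cache like this (`HF_HUB_CACHE` comes from the list above; the `hf-cache` directory name is just an example):

```bash
# Keep model snapshots inside the working tree instead of the default HF cache
export HF_HUB_CACHE="$PWD/hf-cache"
mkdir -p "$HF_HUB_CACHE"
# Snapshots missing from this cache are downloaded on the next run, e.g.:
#   "$CLI" --download-only
```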
For a deeper CLI reference (outputs, caching, environment variables), see docs/apis/cli.md.
- Inference runs locally; the only network dependency is downloading model snapshots when they are not already cached.
- `--emit-json`/`--emit-ocrdocument-json` require layout mode.
- The SwiftUI app is intentionally minimal and does not yet include export UI, queueing, or signed/notarized distribution.
- docs/apis/cli.md — CLI usage, outputs, and model/cache behavior
- docs/development_guide.md — build/test, app, release builds, examples/eval workflows
- docs/architecture.md — module boundaries and runtime dataflow
- docs/golden_checks.md — parity and model-backed verification (opt-in)
- docs/overview.md — documentation index and source-of-truth map
MIT (see LICENSE).