Swift package for running LiteRT-LM models on iOS. Wraps Google's C API in a clean, async/await Swift interface.
Supports text generation, vision (image understanding), audio (speech/sound understanding), and streaming with models like Gemma 4 E2B.
Note: This is a community project, not an official Google product. The included
CLiteRTLM.xcframeworkis built from Google's open-source LiteRT-LM C API (Apache 2.0).
- iOS 17.0+
- Xcode 16+
- iPhone 13 Pro or later (6 GB+ RAM required for Gemma 4 E2B)
increased-memory-limitentitlement (model loading needs ~4 GB RAM)
How to add the increased-memory-limit entitlement
In Xcode: select your app target > Signing & Capabilities > + Capability > search "Increased Memory Limit".
Or add manually to your .entitlements file:
<key>com.apple.developer.kernel.increased-memory-limit</key>
<true/>Without this entitlement the system may kill your app during model loading.
Add to your Package.swift:
dependencies: [
.package(url: "https://github.com/mylovelycodes/LiteRTLM-Swift.git", from: "0.1.0")
],
targets: [
.target(
name: "YourApp",
dependencies: [
.product(name: "LiteRTLMSwift", package: "LiteRTLM-Swift")
]
)
]Or in Xcode: File > Add Package Dependencies > paste the repo URL > add LiteRTLMSwift to your target.
A complete end-to-end example:
import LiteRTLMSwift
// 1. Download model (~2.6 GB, only needed once)
let downloader = ModelDownloader()
try await downloader.download() // defaults to Gemma 4 E2B from HuggingFace
// 2. Load engine
let engine = LiteRTLMEngine(modelPath: downloader.modelPath)
try await engine.load() // takes ~5-10s on first launch
// 3. Generate text
let response = try await engine.generate(
prompt: "<|turn>user\nWhat is Swift?\n<turn|>\n<|turn>model\n",
temperature: 0.7,
maxTokens: 256
)
print(response)
// 4. Vision (image understanding)
let imageData = try Data(contentsOf: photoURL)
let caption = try await engine.vision(
imageData: imageData, // JPEG, PNG, or HEIC
prompt: "Describe this photo.",
maxTokens: 512
)
print(caption)
// 5. Audio (speech/sound understanding)
let audioData = try Data(contentsOf: audioURL)
let transcript = try await engine.audio(
audioData: audioData, // WAV, FLAC, or MP3
prompt: "Transcribe this audio.",
maxTokens: 512
)
print(transcript)Important: Text generation (
generate,generateStreaming,openSession) requires Gemma 4's turn marker format in the prompt (see Prompt Format). All other methods — vision, audio, multimodal, and the persistent conversation API (openConversation/conversationSend) — take plain text prompts. The Conversation API handles formatting internally.
for try await chunk in engine.generateStreaming(
prompt: "<|turn>user\nTell me a story.\n<turn|>\n<|turn>model\n"
) {
print(chunk, terminator: "")
}let answer = try await engine.visionMultiImage(
imagesData: [image1Data, image2Data],
prompt: "Compare these two photos.",
maxTokens: 1024
)Supports WAV, FLAC, and MP3. Audio is automatically resampled to 16 kHz mono internally.
let audioData = try Data(contentsOf: recordingURL)
// Transcription (default format: .wav)
let text = try await engine.audio(
audioData: audioData,
prompt: "Transcribe this audio."
)
// MP3 file
let mp3Data = try Data(contentsOf: mp3URL)
let summary = try await engine.audio(
audioData: mp3Data,
prompt: "Summarize what is being said.",
format: .mp3,
maxTokens: 1024
)Analyze audio and images together in a single query:
let response = try await engine.multimodal(
audioData: [audioTrackData],
imagesData: [keyframeData],
prompt: "Does the speaker's description match what's shown in the image?"
)For multi-turn conversations, use the persistent session API. The KV cache is preserved across turns, reducing time-to-first-token from ~20s to ~1-2s on follow-up messages.
// Open a persistent session
try await engine.openSession(temperature: 0.7, maxTokens: 512)
// First turn — full prefill (~15-20s TTFT)
for try await chunk in engine.sessionGenerateStreaming(
input: "<|turn>user\nHello!\n<turn|>\n<|turn>model\n"
) {
print(chunk, terminator: "")
}
// Second turn — incremental prefill (~1-2s TTFT)
for try await chunk in engine.sessionGenerateStreaming(
input: "<turn|>\n<|turn>user\nTell me more.\n<turn|>\n<|turn>model\n"
) {
print(chunk, terminator: "")
}
// Clean up when done
engine.closeSession()Mix images, audio, and text freely across turns — each turn reuses the KV cache:
// Open a persistent multimodal conversation
try await engine.openConversation(temperature: 0.7)
// Turn 1: send an image
let description = try await engine.conversationSend(
imagesData: [photoData],
prompt: "What's in this photo?"
)
// Turn 2: text-only follow-up (KV cache reused — fast TTFT)
let detail = try await engine.conversationSend(
prompt: "What color is the car in the background?"
)
// Turn 3: send audio
let answer = try await engine.conversationSend(
audioData: [clipData],
audioFormat: .wav,
prompt: "Does this audio match the scene?"
)
// Clean up when done
engine.closeConversation()ModelDownloader is @Observable, so you can bind directly in SwiftUI:
struct DownloadView: View {
@State private var downloader = ModelDownloader()
var body: some View {
switch downloader.status {
case .notStarted:
Button("Download Model (\(downloader.totalBytesDisplay))") {
Task { try await downloader.download() }
}
case .downloading(let progress):
ProgressView(value: progress)
Text("\(downloader.downloadedBytesDisplay) / \(downloader.totalBytesDisplay)")
Button("Pause") { downloader.pause() }
case .paused:
Button("Resume") { Task { try await downloader.download() } }
case .completed:
Text("Model ready!")
case .failed(let msg):
Text("Error: \(msg)")
Button("Retry") { Task { try await downloader.download() } }
}
}
}struct EngineView: View {
@State private var engine: LiteRTLMEngine
init() {
let path = ModelDownloader().modelPath
_engine = State(initialValue: LiteRTLMEngine(modelPath: path))
}
var body: some View {
Group {
switch engine.status {
case .notLoaded:
Button("Load Model") { Task { try await engine.load() } }
case .loading:
ProgressView("Loading model...")
case .ready:
Text("Ready for inference!")
case .error(let msg):
Text("Error: \(msg)")
}
}
}
}| Method | Description |
|---|---|
init(modelPath:backend:) |
Create engine. backend: "cpu" (default, recommended) or "gpu" (experimental, Metal) |
load() |
Load the .litertlm model. Call once, reuse across inferences |
unload() |
Free model memory |
generate(prompt:temperature:maxTokens:) |
One-shot text generation. Prompt must use Gemma turn markers |
generateStreaming(prompt:temperature:maxTokens:) |
Streaming text generation |
vision(imageData:prompt:temperature:maxTokens:maxImageDimension:) |
Single-image understanding. Plain text prompt |
visionMultiImage(imagesData:prompt:temperature:maxTokens:maxImageDimension:) |
Multi-image understanding |
audio(audioData:prompt:format:temperature:maxTokens:) |
Audio understanding (WAV, FLAC, MP3). Plain text prompt |
multimodal(audioData:audioFormat:imagesData:prompt:temperature:maxTokens:maxImageDimension:) |
Combined audio + vision inference |
openSession(temperature:maxTokens:) |
Open persistent text session for multi-turn chat (KV cache reuse) |
sessionGenerateStreaming(input:) |
Stream generation using persistent text session |
closeSession() |
Close persistent text session, free KV cache |
openConversation(temperature:maxTokens:) |
Open persistent multimodal conversation (KV cache reuse) |
conversationSend(audioData:audioFormat:imagesData:prompt:maxImageDimension:) |
Send a turn in the persistent conversation (any mix of audio/images/text) |
closeConversation() |
Close persistent multimodal conversation, free KV cache |
| Property | Type | Description |
|---|---|---|
status |
Status |
.notLoaded, .loading, .ready, or .error(String) |
isReady |
Bool |
Whether the engine is ready for inference |
| Method | Description |
|---|---|
init(modelsDirectory:) |
Create downloader. Default path: ~/Library/Application Support/LiteRTLM/Models/ |
download(from:) |
Download model from URL. Defaults to defaultModelURL (HuggingFace) |
pause() |
Pause download. Resume data is persisted to disk |
cancel() |
Cancel download and discard resume data |
deleteModel() |
Delete the downloaded model file |
| Property | Type | Description |
|---|---|---|
status |
DownloadStatus |
Current download state |
progress |
Double |
0.0 to 1.0 |
isDownloaded |
Bool |
Whether the model file exists on disk |
modelPath |
URL |
Full path to model file (use with LiteRTLMEngine(modelPath:)) |
The Session API (text generation) requires Gemma 4's native turn marker format. The Conversation API (vision, audio, multimodal, persistent conversation) does NOT — just pass plain text.
<|turn>user
Your message here
<turn|>
<|turn>model
With system prompt:
<|turn>system
You are a helpful assistant.
<turn|>
<|turn>user
Hello!
<turn|>
<|turn>model
Multi-turn (for persistent session — only send the NEW content each turn):
# First turn input:
<|turn>user
Hello!
<turn|>
<|turn>model
# Second turn input (note the closing marker from previous model turn):
<turn|>
<|turn>user
Tell me more.
<turn|>
<|turn>model
┌──────────────────────────────────────────────┐
│ Your App │
├──────────────────────────────────────────────┤
│ LiteRTLMSwift │
│ ┌─────────────────┐ ┌──────────────────┐ │
│ │ LiteRTLMEngine │ │ ModelDownloader │ │
│ │ │ │ │ │
│ │ .generate() │ │ .download() │ │
│ │ .vision() │ │ .pause() │ │
│ │ .audio() │ │ .cancel() │ │
│ │ .multimodal() │ │ │ │
│ │ .openSession() │ │ │ │
│ │ .openConversation() │ │
│ └────────┬────────┘ └──────────────────┘ │
│ │ │
│ Serial DispatchQueue │
│ (thread safety) │
├───────────┼──────────────────────────────────┤
│ CLiteRTLM.xcframework (C API) │
│ │ │
│ Session API Conversation API │
│ (text in/out) (multimodal JSON) │
│ │
│ For text generation For vision / audio / │
│ Raw prompt format multimodal inference │
└──────────────────────────────────────────────┘
- Session API — raw text prompts via
InputData. You control the prompt format. Used bygenerate(),generateStreaming(),openSession(). - Conversation API — JSON-based messages with image/audio file paths. Handles image decode/resize/patchify and audio decode/resample/mel-spectrogram internally. Used by
vision(),visionMultiImage(),audio(),multimodal(),openConversation()/conversationSend(). - All C API calls are serialized on a single
DispatchQueuefor thread safety. LiteRT-LM supports only one active session at a time.
This repo ships a prebuilt CLiteRTLM.xcframework. If you want to build it yourself (e.g. to pick up upstream fixes or try the GPU backend), follow the steps below.
| Tool | Version | Install |
|---|---|---|
| Bazel | 7.6.1 | brew install bazelisk (auto-downloads correct version) |
| Xcode | 16+ | Mac App Store |
| Disk space | ~20 GB | Bazel build cache |
# Clones LiteRT-LM source automatically and builds xcframework
./scripts/build-xcframework.sh
# Or point to an existing local checkout
./scripts/build-xcframework.sh ~/Dev/LiteRT-LMThe script will:
- Clone (or use existing) google-ai-edge/LiteRT-LM source
- Patch
c/BUILDif needed — adds thecc_binarydylib target (missing in v0.10.2 and earlier) and stubsios_engine.bzl(missing in HEAD) - Build
libLiteRTLMEngine.dylibforios_arm64(device) andios_sim_arm64(simulator) - Package both into
Frameworks/LiteRTLM.xcframework
git clone https://github.com/google-ai-edge/LiteRT-LM.git
cd LiteRT-LMThe upstream BUILD file may need patching depending on the version:
a) Releases up to v0.10.2 — the cc_binary target for the shared library doesn't exist yet. Append it:
cat >> c/BUILD << 'EOF'
cc_binary(
name = "libLiteRTLMEngine.dylib",
srcs = [
"engine.cc",
"engine.h",
"litert_lm_logging.cc",
"litert_lm_logging.h",
],
linkopts = [
"-Wl,-exported_symbol,_litert_lm_*",
],
linkshared = True,
linkstatic = True,
visibility = ["//visibility:public"],
deps = ENGINE_COMMON_DEPS + [
"//runtime/core:engine_impl",
],
)
EOFb) Latest HEAD — c/BUILD loads ios_engine.bzl which isn't published. Create a stub:
cat > c/ios_engine.bzl << 'EOF'
"""Stub for ios_shared_engine macro (not yet published upstream)."""
def ios_shared_engine(**kwargs):
pass
EOFThe build script (
Option A) detects and applies both patches automatically.
bazel build --config=ios_arm64 //c:libLiteRTLMEngine.dylibOutput: bazel-bin/c/libLiteRTLMEngine.dylib
The Bazel build target is defined in c/BUILD:
linkshared = True+linkstatic = True— produces a self-contained dylib with all C++ deps statically linked-Wl,-exported_symbol,_litert_lm_*— only exports the public C API symbols
# Save device dylib first (Bazel overwrites bazel-bin between configs)
cp bazel-bin/c/libLiteRTLMEngine.dylib /tmp/libLiteRTLMEngine-device.dylib
bazel build --config=ios_sim_arm64 //c:libLiteRTLMEngine.dylib
cp bazel-bin/c/libLiteRTLMEngine.dylib /tmp/libLiteRTLMEngine-sim.dylibAvailable iOS configs in .bazelrc:
| Config | Architecture | Use Case |
|---|---|---|
ios_arm64 |
arm64 | Physical device |
ios_sim_arm64 |
arm64 | Apple Silicon simulator |
ios_x86_64 |
x86_64 | Intel Mac simulator |
ios_arm64e |
arm64e | A12+ with pointer auth |
Each architecture needs to be wrapped in a .framework bundle before creating the xcframework.
# Device framework
mkdir -p /tmp/ios-arm64/CLiteRTLM.framework/{Headers,Modules}
cp /tmp/libLiteRTLMEngine-device.dylib /tmp/ios-arm64/CLiteRTLM.framework/CLiteRTLM
install_name_tool -id "@rpath/CLiteRTLM.framework/CLiteRTLM" /tmp/ios-arm64/CLiteRTLM.framework/CLiteRTLM
# Simulator framework
mkdir -p /tmp/ios-arm64-simulator/CLiteRTLM.framework/{Headers,Modules}
cp /tmp/libLiteRTLMEngine-sim.dylib /tmp/ios-arm64-simulator/CLiteRTLM.framework/CLiteRTLM
install_name_tool -id "@rpath/CLiteRTLM.framework/CLiteRTLM" /tmp/ios-arm64-simulator/CLiteRTLM.framework/CLiteRTLMCopy headers (from the LiteRT-LM source c/ directory):
for DIR in /tmp/ios-arm64 /tmp/ios-arm64-simulator; do
cp c/engine.h "$DIR/CLiteRTLM.framework/Headers/"
cp c/litert_lm_logging.h "$DIR/CLiteRTLM.framework/Headers/"
doneCreate module.modulemap (same for both):
for DIR in /tmp/ios-arm64 /tmp/ios-arm64-simulator; do
cat > "$DIR/CLiteRTLM.framework/Modules/module.modulemap" << 'EOF'
framework module CLiteRTLM {
header "engine.h"
export *
}
EOF
doneCreate Info.plist (same for both):
for DIR in /tmp/ios-arm64 /tmp/ios-arm64-simulator; do
cat > "$DIR/CLiteRTLM.framework/Info.plist" << 'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>CFBundleExecutable</key>
<string>CLiteRTLM</string>
<key>CFBundleIdentifier</key>
<string>com.google.CLiteRTLM</string>
<key>CFBundleInfoDictionaryVersion</key>
<string>6.0</string>
<key>CFBundleName</key>
<string>CLiteRTLM</string>
<key>CFBundlePackageType</key>
<string>FMWK</string>
<key>CFBundleVersion</key>
<string>1.0</string>
<key>MinimumOSVersion</key>
<string>13.0</string>
</dict>
</plist>
EOF
doneAd-hoc code sign:
codesign --force --sign - /tmp/ios-arm64/CLiteRTLM.framework/CLiteRTLM
codesign --force --sign - /tmp/ios-arm64-simulator/CLiteRTLM.framework/CLiteRTLMxcodebuild -create-xcframework \
-framework /tmp/ios-arm64/CLiteRTLM.framework \
-framework /tmp/ios-arm64-simulator/CLiteRTLM.framework \
-output Frameworks/LiteRTLM.xcframework# Check architectures
file Frameworks/LiteRTLM.xcframework/ios-arm64/CLiteRTLM.framework/CLiteRTLM
# -> Mach-O 64-bit dynamically linked shared library arm64
file Frameworks/LiteRTLM.xcframework/ios-arm64-simulator/CLiteRTLM.framework/CLiteRTLM
# -> Mach-O 64-bit dynamically linked shared library arm64 (simulator)
# Check exported symbols
nm -gU Frameworks/LiteRTLM.xcframework/ios-arm64/CLiteRTLM.framework/CLiteRTLM | grep litert_lm
# Should list all litert_lm_* public API functions| Issue | Solution |
|---|---|
no such target '//c:libLiteRTLMEngine.dylib' |
Two possible causes: (1) v0.10.2 and earlier don't define the cc_binary dylib target — append it per Step 2a; (2) HEAD loads a missing ios_engine.bzl which breaks BUILD parsing — create the stub per Step 2b |
no such package '@build_bazel_apple_support' |
Run bazel sync to fetch external dependencies |
| Xcode SDK not found | Ensure Xcode is selected: sudo xcode-select -s /Applications/Xcode.app |
| Build takes very long | First build downloads ~10 GB of deps. Subsequent builds use cache |
Undefined symbols at link time |
Make sure you're using //c:libLiteRTLMEngine.dylib target, not //c:engine |
| Code signing errors | Use ad-hoc signing (--sign -) for development; real signing happens at app archive |
MIT License. See LICENSE.
The CLiteRTLM.xcframework contains code from Google's LiteRT-LM project, licensed under the Apache License 2.0.