Add TTSKit with Qwen3-TTS support by ZachNagengast · Pull Request #425 · argmaxinc/WhisperKit

ZachNagengast · 2026-02-19T11:27:59Z

WhisperKit is expanding into text-to-speech!

TTSKit adds a new library for on-device text-to-speech using Core ML-accelerated Qwen3-TTS models (CustomVoice 0.6B and 1.7B in this first release) with real-time streaming playback on Apple Silicon. In this first PR, we're introducing the library into the WhisperKit package (WhisperKit will be renamed to reflect the new multi-Kit nature of Argmax Open-source SDK) as an optional import to add real-time TTS capabilities with a state-of-the-art open-source model, either on its own or as a complement to WhisperKit speech-to-text.

This PR is still in the final phases of development, but here are a few highlights:

TTSKit Library

- Download, load, generate, and stream playback in ~3 lines of code.
- Protocol-based component architecture (6 swappable Core ML components: TextProjecting, CodeEmbedding, MultiCodeEmbedding, CodeDecoding, MultiCodeDecoding, SpeechDecoding) for plugging in new model backends.
- Qwen3-TTS implementation with 9 built-in voices, 10 languages, and style instruction support (1.7B variant only).
- Automatic text chunking for long-form generations with concurrent chunk generation and cross-fade stitching.
- Adaptive streaming playback (TTSPlaybackStrategy.auto) that measures first-step latency to pre-buffer just enough audio.
- Seedable RNG for reproducible generation.
- WAV and M4A (AAC) audio export.
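
As a rough illustration of the adaptive pre-buffering idea in the highlights above, here is a minimal sketch. It is not TTSKit's implementation: the function name `preBufferSeconds`, its parameters, and the steady-rate model (every step costs the same time and yields the same amount of audio, and the total duration is known) are all assumptions made for illustration.

```swift
/// Hypothetical helper, not part of TTSKit.
/// Assumes every generation step takes `stepLatency` seconds of wall-clock time
/// and yields `audioPerStep` seconds of audio, at a steady rate.
func preBufferSeconds(stepLatency: Double, audioPerStep: Double, totalAudio: Double) -> Double {
    precondition(stepLatency > 0 && audioPerStep > 0 && totalAudio >= 0)

    // Wall-clock seconds needed to produce one second of audio.
    let ratio = stepLatency / audioPerStep

    // Generation keeps up with playback: only the first step has to be ready.
    guard ratio > 1 else { return min(audioPerStep, totalAudio) }

    // Playback starts once B seconds of audio exist (at time B * ratio) and
    // ends at B * ratio + totalAudio. Generation ends at totalAudio * ratio.
    // Requiring playback never to run dry gives B >= totalAudio * (ratio - 1) / ratio.
    return totalAudio * (ratio - 1) / ratio
}
```

Under this model, a device that generates faster than real time only waits for the first step, while a slower device buffers a fraction of the clip that grows with how far it falls behind real time.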

Example usage playing audio in real-time out of the default speaker:

    let ttsKit = try await TTSKit()
    try await ttsKit.playSpeech(text: "Hello from TTSKit!")
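
The cross-fade stitching mentioned in the highlights can be pictured with a small, self-contained sketch. This is only an illustration of a linear cross-fade between two chunks of samples; the function name `crossFade` and the choice of a linear ramp are assumptions, not TTSKit's actual stitching code.

```swift
/// Hypothetical helper, not TTSKit API: joins two chunks of mono samples,
/// blending the last `overlap` samples of `a` with the first `overlap` samples of `b`.
func crossFade(_ a: [Float], _ b: [Float], overlap: Int) -> [Float] {
    // Clamp the overlap so it never exceeds either chunk.
    let n = max(0, min(overlap, a.count, b.count))
    guard n > 0 else { return a + b }

    var out = Array(a.dropLast(n))
    out.reserveCapacity(a.count + b.count - n)
    for i in 0..<n {
        // Fade weight rises linearly from just above 0 to just below 1.
        let t = Float(i + 1) / Float(n + 1)
        out.append(a[a.count - n + i] * (1 - t) + b[i] * t)
    }
    out.append(contentsOf: b.dropFirst(n))
    return out
}
```

The result is `overlap` samples shorter than plain concatenation, because the overlapping region is blended rather than played twice.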

New target: ArgmaxCore

Extracted a shared target with various utilities from WhisperKit so TTSKit can use them without depending on WhisperKit directly.

CLI

For now, we plan to ship this as a new tts command in whisperkit-cli, which can be used like this:
- swift run whisperkit-cli tts --text "Hello from TTSKit" --play
- Full control over speaker, language, model, style instruction, temperature, chunking, compute units, and seed.

TTSKit Example app

macOS and iOS example app with model management, real-time waveform visualization, generation history persisted as M4A files, and more. Use this as a quick way to try it out!

Roadmap

We plan to continue to add support for state-of-the-art models and improve inference latency for TTSKit over the next few weeks. The immediate follow-ups are the voice cloning feature from Qwen3-TTS and a 2x reduction in time-to-first-byte (TTFB), so that this on-device project achieves a consistent sub-100 ms TTFB, providing a latency edge over cloud deployments of the same model. In the meantime, we encourage anyone reading this to check out this PR, give it a spin, and let us know how it goes!

chen-argmax · 2026-02-19T22:13:35Z

Examples/TTS/SpeakAX/SpeakAX/ViewModel.swift

+
+@MainActor
+@Observable
+final class ViewModel: @unchecked Sendable {


I would break this down into smaller view models if it gets too long, e.g. DownloadViewModel vs. TTSViewModel.

chen-argmax · 2026-02-19T22:17:16Z

...ples/WhisperAX/WhisperAX.xcodeproj/project.xcworkspace/xcshareddata/swiftpm/Package.resolved

@@ -1,5 +1,5 @@
 {


why do we need to change this?

Had an old swift-transformers resolved

chen-argmax · 2026-02-19T22:21:36Z

Sources/ArgmaxCore/ConcurrencyUtilities.swift

+/// Thin wrapper around `os_unfair_lock` that exposes a Swift-friendly
+/// `withLock` helper. This lock is non-reentrant and optimized for low
+/// contention, matching the semantics of Core Foundation's unfair lock.
+public final class UnfairLock: @unchecked Sendable {


I think we want to give this class a generic name to future-proof it for Swift 6; os_unfair_lock does not seem to be the recommended way to lock in Swift 6.
We could rename it Mutex so we can reimplement it on top of the Synchronization module's Mutex later.

Now:

    import os

    public final class Mutex: @unchecked Sendable {
        private let lock = OSAllocatedUnfairLock()

        public init() {}

        // Not @inlinable: the stored lock is private.
        public func withLock<T>(_ body: () throws -> T) rethrows -> T {
            try lock.withLockUnchecked(body)
        }
    }

Later (sketch):

    import Synchronization

    public final class Mutex<Value>: Sendable {
        private let mutex: Synchronization.Mutex<Value>

        public init(_ value: Value) {
            self.mutex = Synchronization.Mutex(value)
        }

        public func withLock<T>(_ body: (inout Value) throws -> T) rethrows -> T {
            try mutex.withLock(body)
        }
    }

chen-argmax · 2026-02-19T22:24:20Z

Sources/ArgmaxCore/MLModelExtensions.swift

should we consider adding another package under ArgmaxCore? like ArgmaxCore/CoreML

chen-argmax · 2026-02-19T22:36:10Z

Sources/TTSKit/TTSKit.swift

+    ///
+    /// Downloads only the files matching the configured component variants.
+    /// Files are cached locally by the Hub library.
+    open class func download(


should we decouple model download from TTSKit? ArgmaxCore could provide a downloader for this

Yep have some todos relating to this

chen-argmax · 2026-02-19T22:36:56Z

Sources/TTSKit/TTSModels.swift

+//  Copyright © 2026 Argmax, Inc. All rights reserved.
+
+import Accelerate
+@_exported import ArgmaxCore


why @_exported?

chen-argmax · 2026-02-19T22:38:55Z

Tests/TTSKitTests/TTSKitIntegrationTests.swift

+        )
+
+        XCTAssertGreaterThan(result.audio.count, 0, "Audio samples should be non-empty")
+        XCTAssertGreaterThan(result.audioDuration, 1.0, "Expect at least 1s of speech")


will seed guarantee the audio length is always deterministic?

Yup, apple docs recommend using this method https://developer.apple.com/documentation/swift/randomnumbergenerator#Conforming-to-the-RandomNumberGenerator-Protocol
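
For readers unfamiliar with the approach in the linked documentation, here is a minimal sketch of a seedable generator conforming to RandomNumberGenerator. The SplitMix64 algorithm and the helper `draws(seed:count:)` are choices made for this illustration; the PR's actual generator may differ. The sketch only shows that the random draws repeat for a given seed, which is one ingredient of reproducible generation.

```swift
/// Illustrative seedable generator (SplitMix64); not necessarily what TTSKit uses.
struct SplitMix64: RandomNumberGenerator {
    private var state: UInt64

    init(seed: UInt64) { self.state = seed }

    mutating func next() -> UInt64 {
        // Advance the state by a fixed odd constant, then scramble the bits.
        state &+= 0x9E37_79B9_7F4A_7C15
        var z = state
        z = (z ^ (z >> 30)) &* 0xBF58_476D_1CE4_E5B9
        z = (z ^ (z >> 27)) &* 0x94D0_49BB_1331_11EB
        return z ^ (z >> 31)
    }
}

/// Hypothetical helper: draws `count` uniform samples in 0..<1 from a fresh generator.
func draws(seed: UInt64, count: Int) -> [Float] {
    var rng = SplitMix64(seed: seed)
    return (0..<count).map { _ in Float.random(in: 0..<1, using: &rng) }
}
```

Two calls with the same seed return the same sequence, so any sampling decision driven by these draws repeats from run to run.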

chen-argmax · 2026-02-19T22:44:25Z

Tests/TTSKitTests/TTSKitUnitTests.swift

+//  For licensing see accompanying LICENSE.md file.
+//  Copyright © 2024 Argmax, Inc. All rights reserved.
+
+import ArgmaxCore


I think we would want to break these tests down into isolated, per-class tests.

e.g. 1: TTSKitTest.swift that injects a Config with mocked components and verifies that TTSKitTest.generateSpeech interacts with the components correctly, tasks are created, etc.

e.g. 2: Qwen3TTSGenerateTaskTest.swift that injects mocked components and verifies that run interacts with them correctly.
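
The kind of isolated test suggested here could look roughly like the sketch below. Every name in it (`TokenEmbedding`, `MockEmbedding`, `GenerateTask`) is a hypothetical stand-in rather than a TTSKit type, and plain values are used instead of XCTest so the example stays self-contained.

```swift
/// Hypothetical stand-in for one swappable pipeline component.
protocol TokenEmbedding {
    func embed(_ token: Int) -> [Float]
}

/// Mock that records every call so a test can assert on the interaction.
final class MockEmbedding: TokenEmbedding {
    private(set) var receivedTokens: [Int] = []

    func embed(_ token: Int) -> [Float] {
        receivedTokens.append(token)
        return [Float(token)]
    }
}

/// Hypothetical task that receives its component by injection.
struct GenerateTask {
    let embedding: any TokenEmbedding

    func run(tokens: [Int]) -> [[Float]] {
        tokens.map { embedding.embed($0) }
    }
}

// Inject the mock, run the task, then inspect what the mock recorded.
let mock = MockEmbedding()
let output = GenerateTask(embedding: mock).run(tokens: [3, 1, 4])
```

In a real test target, the final two lines would sit inside an XCTestCase method, with assertions on `mock.receivedTokens` and `output`.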

chen-argmax · 2026-02-19T22:45:35Z

Sources/TTSKit/Qwen3TTS/Qwen3TTSGenerateTask.swift

+/// owns its own sampler (derived seed) so concurrent tasks don't share RNG state.
+/// Model components are shared read-only references - `MLModel.prediction()` is
+/// thread-safe. The class is `@unchecked Sendable` to permit `open` subclassing.
+open class TTSGenerateTask: @unchecked Sendable, TTSGenerating {


Should the class be renamed to Qwen3TTSGenerateTask? Ditto for the other files under Qwen3TTS.

naykutguven · 2026-02-25T07:52:54Z

Sources/ArgmaxCore/ConcurrencyUtilities.swift

+/// Serializes access to a value with an `os_unfair_lock` so mutation stays
+/// thread-safe. Useful for properties on types marked `@unchecked Sendable`.
+@propertyWrapper
+public struct PropertyLock<Value: Codable & Sendable>: Sendable, Codable {


TLDR; @ZachNagengast @chen-argmax guys, this doesn't make reference or value type properties truly thread safe.

I was playing around with this, trying to move the Sendable and Codable conformances outside. I ran some checks on both the current implementation and mine, using the snippet below with these variations:

- Reference type property Ref
- Value type property Ref
- Plain property of type Int

None of them was safe. Locking the accessors isn't enough; we need to wrap mutations in the lock.

    import Foundation

    final class Ref: Codable, @unchecked Sendable {
        var count: Int

        init(count: Int = 0) { self.count = count }

        enum CodingKeys: String, CodingKey { case count }

        required init(from decoder: Decoder) throws {
            let c = try decoder.container(keyedBy: CodingKeys.self)
            self.count = try c.decode(Int.self, forKey: .count)
        }

        func encode(to encoder: Encoder) throws {
            var c = encoder.container(keyedBy: CodingKeys.self)
            try c.encode(count, forKey: .count)
        }
    }

    final class Holder: @unchecked Sendable {
        @TranscriptionPropertyLock var ref = Ref()
    }

    @main
    struct Main {
        static func main() async {
            let workers = max(2, ProcessInfo.processInfo.activeProcessorCount * 2)
            let perWorker = 50_000
            let expected = workers * perWorker
            print("workers=\(workers), perWorker=\(perWorker), expected=\(expected)")

            for run in 1...10 {
                let holder = Holder()
                await withTaskGroup(of: Void.self) { group in
                    for _ in 0..<workers {
                        group.addTask {
                            for _ in 0..<perWorker {
                                // Read-modify-write through the wrapper: not atomic.
                                holder.ref.count += 1
                            }
                        }
                    }
                }
                let final = holder.ref.count
                print("run \(run): expected=\(expected) actual=\(final)")
            }
        }
    }

The approach used in AudioProcessor PR (also the WhisperKit PR) works. Snippet below:

    import Foundation
    import os.lock

    @usableFromInline
    final class UnfairLock: @unchecked Sendable {
        @usableFromInline var lock = os_unfair_lock()

        @inlinable
        func withLock<T>(_ body: () throws -> T) rethrows -> T {
            os_unfair_lock_lock(&lock)
            defer { os_unfair_lock_unlock(&lock) }
            return try body()
        }
    }

    final class Ref {
        var count = 0
    }

    final class HolderInt: @unchecked Sendable {
        private let stateLock = UnfairLock()
        private var countStorage = 0

        var count: Int {
            get { stateLock.withLock { countStorage } }
            set { stateLock.withLock { countStorage = newValue } }
        }

        func increment() {
            stateLock.withLock { countStorage += 1 }
        }
    }

    final class HolderRef: @unchecked Sendable {
        private let stateLock = UnfairLock()
        private let refStorage = Ref()

        var refCount: Int {
            stateLock.withLock { refStorage.count }
        }

        func incrementRef() {
            stateLock.withLock { refStorage.count += 1 }
        }
    }

    @main
    struct Main {
        static func main() async {
            let workers = max(2, ProcessInfo.processInfo.activeProcessorCount * 2)
            let perWorker = 50_000
            let expected = workers * perWorker

            print("[Int] workers=\(workers), perWorker=\(perWorker), expected=\(expected)")
            for run in 1...10 {
                let holder = HolderInt()
                await withTaskGroup(of: Void.self) { group in
                    for _ in 0..<workers {
                        group.addTask {
                            for _ in 0..<perWorker { holder.increment() }
                        }
                    }
                }
                let final = holder.count
                print("[Int] run \(run): expected=\(expected) actual=\(final)")
            }

            print("[Ref] workers=\(workers), perWorker=\(perWorker), expected=\(expected)")
            for run in 1...10 {
                let holder = HolderRef()
                await withTaskGroup(of: Void.self) { group in
                    for _ in 0..<workers {
                        group.addTask {
                            for _ in 0..<perWorker { holder.incrementRef() }
                        }
                    }
                }
                let final = holder.refCount
                print("[Ref] run \(run): expected=\(expected) actual=\(final)")
            }
        }
    }

This is a valid concern: essentially, if the wrapped property itself has another property, reads and writes through it won't be thread safe.

For example, this is thread safe:

    holder.ref = otherRef

This is not thread safe:

    holder.ref.count += 1

@ZachNagengast we may want to add documentation for this wrapper.

I think it isn't safe to use for pure value type properties such as Int either. We probably need to use _modify instead of set.
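
To make the distinction in this thread concrete, here is a minimal sketch of a container that takes the lock around the whole read-modify-write rather than around the getter and setter separately. `Locked` and `runCounterDemo` are hypothetical names for this illustration, and NSLock is used only to keep the example portable; it is not the PropertyLock implementation from this PR.

```swift
import Foundation

/// Hypothetical lock-protected box; the closure runs entirely under the lock.
final class Locked<Value>: @unchecked Sendable {
    private let lock = NSLock()
    private var value: Value

    init(_ value: Value) { self.value = value }

    /// Gives `body` exclusive, in-place access to the stored value.
    func withLock<T>(_ body: (inout Value) throws -> T) rethrows -> T {
        lock.lock()
        defer { lock.unlock() }
        return try body(&value)
    }
}

/// Increments a shared counter from several threads and returns the final count.
func runCounterDemo(workers: Int, perWorker: Int) -> Int {
    let counter = Locked(0)
    DispatchQueue.concurrentPerform(iterations: workers) { _ in
        for _ in 0..<perWorker {
            // The read, the add, and the write all happen under one lock acquisition.
            counter.withLock { $0 += 1 }
        }
    }
    return counter.withLock { $0 }
}
```

With a get/set wrapper, `holder.count += 1` locks once to read and once again to write, so another thread can slip in between the two; the closure-based API removes that gap.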

I am checking these resources:

https://www.linkedin.com/pulse/migrating-swift-6-dont-use-unchecked-sendable-until-you-kolyadin-mflee/

https://github.com/artemkolyadin/swift-threadsafe-macros

https://www.youtube.com/watch?v=035WscXr7Xo&t=937s (Russian so I used auto translated CC) cc: @ZachNagengast

Making a note in this PR but will leave the fix to a followup 👍

zaidbren · 2026-02-26T13:03:49Z

I am trying to run the 1.7B model on a MacBook Air M1. The 0.6B version worked fine, but with the 1.7B model it first specializes the model for the device, then loads it, and during generation it stops and throws this error: Unable to compute the prediction using ML Program. It can be an invalid input data or broken/unsupported model.

chen-argmax

Approved, with a comment to add documentation to PropertyLock.

Add TTSKit with Qwen3-TTS support

ba8475c

ZachNagengast requested review from a2they, atiorh and chen-argmax February 19, 2026 11:27

chen-argmax requested changes Feb 19, 2026

argmaxinc deleted a comment from chen-argmax Feb 19, 2026

naykutguven reviewed Feb 25, 2026

Rework architecture

3743722

ZachNagengast requested a review from chen-argmax February 27, 2026 04:40

ZachNagengast added 2 commits February 26, 2026 20:42

Clear dev team

477ab56

Use Debug.xcconfig for examples

5ca9942

chen-argmax approved these changes Feb 27, 2026

ZachNagengast added 2 commits February 27, 2026 15:49

Cleanup docc comments and add future test for propertylock

beda811

Add todo comments and update docs

2397168
