Py-Qwen3-ASR-cpp

Python bindings for Qwen3-ASR and its Forced Aligner, as implemented by qwen3-asr.cpp. Backed by a C++ implementation, the library transcribes audio and aligns text with word-level timestamps using GGUF models.

🚀 Features

High-Level Wrapper: Simple, pythonic API for transcription and forced alignment.
Automatic Audio Handling: Built-in support for WAV files and automatic conversion of other formats (MP3, FLAC, etc.) via ffmpeg.
NumPy Integration: Pass audio data directly as np.float32 arrays.
GGUF Support: Efficient model loading and inference.
Word-Level Timestamps: Precise alignment of text to audio for subtitling or analysis.

📦 Installation

pip install py-qwen3-asr-cpp

Note: For non-WAV audio files, ensure ffmpeg is installed and available in your system PATH.
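
The note above can be turned into a small pre-flight check. The following is a minimal sketch using only the Python standard library; the helper names `ffmpeg_available`, `needs_ffmpeg`, and `check_audio_input` are hypothetical and not part of py-qwen3-asr-cpp, and the check assumes that only files with a `.wav` extension are read without conversion.

```python
import shutil
from pathlib import Path


def ffmpeg_available() -> bool:
    """Return True if an `ffmpeg` executable can be found on the system PATH."""
    return shutil.which("ffmpeg") is not None


def needs_ffmpeg(audio_path: str) -> bool:
    """Assumption: anything that is not a .wav file requires ffmpeg conversion."""
    return Path(audio_path).suffix.lower() != ".wav"


def check_audio_input(audio_path: str) -> None:
    """Raise a clear error early instead of failing later during conversion."""
    if needs_ffmpeg(audio_path) and not ffmpeg_available():
        raise RuntimeError(
            f"{audio_path!r} is not a WAV file and ffmpeg was not found on PATH"
        )
```

Calling `check_audio_input("audio.mp3")` before transcription surfaces a missing ffmpeg installation with a readable message.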

🛠 Usage

1. Basic Transcription

Transcribe an audio file into text with just a few lines of code.

from py_qwen3_asr_cpp.model import Qwen3ASRModel

# Initialize the model (it handles downloading if a repo ID is provided)
model = Qwen3ASRModel(
    asr_model="qwen3-asr-0.6b-q8-0",
    n_threads=4
)

# Transcribe from file
result = model.transcribe("audio.mp3")
print(f"Detected language: {result.language}")
print(f"Transcription: {result.text}")

2. Forced Alignment

Align a known text transcript to an audio file to obtain word-level timestamps.

from py_qwen3_asr_cpp.model import Qwen3ASRModel

# Load both the ASR model and the forced aligner
model = Qwen3ASRModel(
    asr_model="qwen3-asr-0.6b-q8-0",
    align_model="qwen3-forced-aligner-0.6b-q8-0"
)

# Text to align with the audio
text = "The quick brown fox jumps over the lazy dog"

alignment = model.align("audio.wav", text=text)

for word in alignment.words:
    print(f"Word: {word.word:12} | Start: {word.start:0.2f}ms | End: {word.end:0.2f}ms")
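
Since word-level timestamps are intended for subtitling, the aligned words can be grouped into SRT cues. This is a minimal sketch, not part of the library: the `Word` dataclass is a hypothetical stand-in that mirrors the `word`, `start`, and `end` fields used in the loop above, and the timestamps are assumed to be in milliseconds, as the printed labels suggest. If the library reports seconds, multiply by 1000 first.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Word:
    """Hypothetical stand-in for one aligned word (fields mirror the example above)."""
    word: str
    start: float  # assumed to be milliseconds
    end: float    # assumed to be milliseconds


def format_srt_time(ms: float) -> str:
    """Format a millisecond offset as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(ms))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def words_to_srt(words: Iterable[Word], max_words: int = 7) -> str:
    """Group consecutive words into cues of at most `max_words` words each."""
    words = list(words)
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        index = i // max_words + 1
        start = format_srt_time(chunk[0].start)
        end = format_srt_time(chunk[-1].end)
        text = " ".join(w.word for w in chunk)
        cues.append(f"{index}\n{start} --> {end}\n{text}\n")
    # Cues are separated by a blank line, as the SRT format expects.
    return "\n".join(cues)
```

With real results, something like `words_to_srt(alignment.words)` could be written to a `.srt` file, provided the returned objects expose the same three attributes.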

3. Combined Pipeline

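Transcribe the audio and align the resulting text in a single call, which returns both the transcription and the word-level alignment.

# Reuses `model` from the previous example (created with both asr_model and align_model)
asr_res, align_res = model.transcribe_and_align("interview.wav")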

print(f"Full text: {asr_res.text}")
print(f"Total words aligned: {len(align_res.words)}")

⚙️ Configuration

The Qwen3ASRModel constructor accepts the following parameters:

Parameter	Type	Description
`asr_model`	`str`	Path to ASR GGUF model or HuggingFace ID.
`align_model`	`str`	Path to Aligner GGUF model (optional).
`n_threads`	`int`	Number of CPU threads to use (default: 4).
`language`	`str`	Force a specific language (e.g., "en", "zh").
`max_tokens`	`int`	Maximum tokens for the decoder.
`print_timing`	`bool`	Whether to print inference timing to stdout.
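
One way to use these parameters is to assemble the constructor arguments in a helper and pass only the ones that are set. This is a sketch under stated assumptions: `build_model_kwargs` is a hypothetical helper, not part of the library, and capping `n_threads` at the number of available CPU cores is a design choice made here rather than documented behavior. The keyword names come from the table above.

```python
import os
from typing import Any, Dict, Optional


def build_model_kwargs(
    asr_model: str,
    align_model: Optional[str] = None,
    language: Optional[str] = None,
    max_threads: int = 4,
) -> Dict[str, Any]:
    """Build keyword arguments for Qwen3ASRModel, omitting unset optional values."""
    # Never request more threads than the machine has cores, and at least one.
    n_threads = max(1, min(max_threads, os.cpu_count() or 1))
    kwargs: Dict[str, Any] = {"asr_model": asr_model, "n_threads": n_threads}
    if align_model is not None:
        kwargs["align_model"] = align_model
    if language is not None:
        kwargs["language"] = language
    return kwargs


# Intended use (requires the package to be installed):
# model = Qwen3ASRModel(**build_model_kwargs("qwen3-asr-0.6b-q8-0", language="en"))
```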

📝 License

This project is licensed under the Apache License 2.0.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
