qwen3-speech is a library-first speech runtime for Qwen3-TTS and Qwen3-ASR. The core product is the embeddable library and stable C ABI; CLI tools, tests, and future engine bindings are adapters over that surface.
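The "stable C ABI with adapters on top" design usually relies on the opaque-handle pattern, sketched minimally below. Every name here (`demo_engine`, `demo_engine_create`, `demo_status`, and so on) is hypothetical. None of it is the API declared in `include/qwen3_speech/`, so consult those headers for the real entry points.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical ABI sketch; these names are NOT the real qwen3_speech API.
 * Callers only ever hold an opaque pointer, so the struct layout can change
 * without breaking binary compatibility. */
typedef struct demo_engine demo_engine;   /* opaque to callers */

typedef enum {
    DEMO_OK = 0,
    DEMO_ERR_INVALID_ARG = 1,
    DEMO_ERR_OOM = 2
} demo_status;

/* --- implementation side (in a real library this would be private) --- */
struct demo_engine {
    char model_path[256];
    int  sample_rate;
};

demo_status demo_engine_create(const char *model_path, demo_engine **out) {
    if (model_path == NULL || out == NULL) return DEMO_ERR_INVALID_ARG;
    *out = NULL;
    demo_engine *e = calloc(1, sizeof *e);   /* zeroed, so path stays NUL-terminated */
    if (e == NULL) return DEMO_ERR_OOM;
    strncpy(e->model_path, model_path, sizeof e->model_path - 1);
    e->sample_rate = 16000;  /* placeholder value for the sketch, not a model property */
    *out = e;
    return DEMO_OK;
}

int demo_engine_sample_rate(const demo_engine *e) {
    return e ? e->sample_rate : 0;
}

void demo_engine_destroy(demo_engine *e) {
    free(e);  /* free(NULL) is a no-op, so destroy is safe on NULL */
}
```

A CLI adapter or engine binding would call only the create, accessor, and destroy functions. Because the struct is never exposed, fields can be added later without breaking existing callers.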
As of April 3, 2026, the repository has the intended library structure, public headers, C ABI, CLI adapters, and stage-level fallback implementations for TTS, ASR, audio tokenization, and streaming event surfaces.
ASR has a complete real GGUF inference path under `src/asr/ref_impl/` that loads GGUF weights and executes GGML computation graphs. When the manifest points to a valid GGUF artifact, real inference runs; otherwise the pipeline falls back to deterministic stubs for testing.
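The real-or-fallback decision can be sketched as a cheap header check on the artifact the manifest points to. GGUF files begin with the four ASCII bytes `GGUF`, followed by a little-endian 32-bit format version. The function and enum names below are hypothetical. A real loader also has to validate far more than the header (tensor metadata, shapes, types) before committing to the real path.

```c
#include <stdio.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names; not taken from src/asr/. */
typedef enum { PATH_FALLBACK_STUB = 0, PATH_REAL_GGUF = 1 } inference_path;

/* Returns 1 if buf starts with the GGUF magic "GGUF" plus a 4-byte
 * little-endian version, and stores that version in *version if non-NULL. */
int gguf_header_ok(const unsigned char *buf, size_t len, uint32_t *version) {
    if (buf == NULL || len < 8) return 0;
    if (memcmp(buf, "GGUF", 4) != 0) return 0;
    uint32_t v = (uint32_t)buf[4] | ((uint32_t)buf[5] << 8) |
                 ((uint32_t)buf[6] << 16) | ((uint32_t)buf[7] << 24);
    if (version != NULL) *version = v;
    return 1;
}

/* Choose the inference path for the artifact path read from a manifest:
 * missing path, unreadable file, or wrong magic all select the stub path. */
inference_path select_asr_path(const char *artifact_path) {
    if (artifact_path == NULL || artifact_path[0] == '\0') return PATH_FALLBACK_STUB;
    FILE *f = fopen(artifact_path, "rb");
    if (f == NULL) return PATH_FALLBACK_STUB;
    unsigned char header[8];
    size_t n = fread(header, 1, sizeof header, f);
    fclose(f);
    return gguf_header_ok(header, n, NULL) ? PATH_REAL_GGUF : PATH_FALLBACK_STUB;
}
```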
TTS does not yet have a real GGUF inference path: transformer generation, vocoder decode, speaker encoding, and audio tokenization are all deterministic fallback stubs. Real GGUF TTS inference has been validated with the upstream reference project, which confirms that the integration path is feasible.
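Deterministic stubs are useful because tests can compare their output exactly across runs and machines. The sketch below is a hypothetical stand-in for a vocoder stage, not the repository's stub. It maps each token id to a fixed sawtooth segment of 16-bit PCM and uses only integer arithmetic, so results do not depend on floating-point behavior.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical deterministic "vocoder" stub: not real synthesis. Each token
 * contributes samples_per_token PCM samples of a sawtooth whose period
 * depends only on the token id. Returns samples written, or 0 on error. */
size_t stub_vocoder_decode(const int32_t *tokens, size_t n_tokens,
                           size_t samples_per_token,
                           int16_t *out, size_t out_capacity) {
    if (tokens == NULL || out == NULL) return 0;
    if (samples_per_token != 0 && n_tokens > SIZE_MAX / samples_per_token) return 0;
    size_t needed = n_tokens * samples_per_token;
    if (needed > out_capacity) return 0;
    size_t k = 0;
    for (size_t t = 0; t < n_tokens; ++t) {
        /* Period in [16, 79] samples, derived from the token id alone. */
        uint32_t period = 16u + (uint32_t)tokens[t] % 64u;
        for (size_t i = 0; i < samples_per_token; ++i) {
            uint32_t phase = (uint32_t)(i % period);
            /* Map phase 0..period-1 linearly onto -8000..7999. */
            int32_t s = (int32_t)(phase * 16000u / period) - 8000;
            out[k++] = (int16_t)s;
        }
    }
    return needed;
}
```

Calling it twice with the same tokens yields identical buffers. A data-flow test can rely on that kind of property when no real model is loaded.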
What is in place:

- Manifest-first model loading and capability flags in `include/qwen3_speech/`.
- Stage-oriented TTS modules under `src/tts/`.
- Stage-oriented ASR modules under `src/asr/`.
- Real GGUF inference for ASR under `src/asr/ref_impl/` (GGUF loading, audio encoder, text decoder, streaming).
- Runtime backend selection and GGML backend registration under `src/runtime/`.
- Stable C ABI entry points under `src/c_api/`.
- Thin CLI adapters under `cli/`.
- Unit and integration tests covering manifests, common helpers, mel spectrograms, fallback stages, C API smoke checks, and TTS→ASR roundtrip data flow.
Not yet in place:

- In-tree real GGUF tensor loading and execution for Qwen3-TTS (ASR already has this path).
- Checked-in model manifests or bundled converted model artifacts.
- Full reference, integration, ABI, and performance suites described by the long-term architecture.
- Godot binding implementation beyond the directory scaffold.
```sh
cmake -S . -B build -DQWEN3_SPEECH_BUILD_TESTS=ON -DQWEN3_SPEECH_BUILD_CLI=ON
cmake --build build
ctest --test-dir build --output-on-failure
```

By default, the top-level build uses the copy of `ggml/` vendored in this repository. Backend and feature toggles are exposed as CMake options; see `docs/build-and-test.md`.
```text
include/qwen3_speech/   Public C headers and thin C++ wrapper headers
src/common/             Shared runtime utilities
src/runtime/            Backend registry, selection, and backend adapters
src/tts/                TTS stage modules and orchestration
src/asr/                ASR stage modules, orchestration, and real GGUF inference
src/asr/ref_impl/       Real GGUF inference path for ASR
src/c_api/              Stable C ABI implementation
cli/                    CLI adapters over the library
tests/                  Unit and smoke tests
docs/                   Project documentation
```
- `docs/architecture.md`: runtime structure, object lifetimes, and stage boundaries
- `docs/build-and-test.md`: build flags, test targets, and CLI usage
- `docs/model-manifests.md`: manifest schema, capabilities, and examples
- `docs/real-inference-status.md`: validated GGUF workflow and current integration gap
ASR has real in-tree GGUF inference. TTS still runs on deterministic fallback stubs: the API shape and stage boundaries are in place, but no GGUF loader or graph execution exists for TTS yet. See `docs/real-inference-status.md` for the full status, validation history, and reproduction steps.