dummyx/tst.cpp

qwen3-speech

qwen3-speech is a library-first speech runtime for Qwen3-TTS and Qwen3-ASR. The core product is the embeddable library and stable C ABI; CLI tools, tests, and future engine bindings are adapters over that surface.

Current Status

As of April 3, 2026, the repository has the intended library structure, public headers, C ABI, CLI adapters, and stage-level fallback implementations for TTS, ASR, audio tokenization, and streaming event surfaces.

ASR has a complete real GGUF inference path under src/asr/ref_impl/ that loads GGUF weights and executes GGML computation graphs. When the manifest points to a valid GGUF artifact, real inference runs; otherwise the pipeline falls back to deterministic stubs for testing.
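The real-versus-fallback decision described above can be sketched roughly as follows. This is an illustration only, not the code under src/asr/ref_impl/: `InferencePath`, `looks_like_gguf`, and `select_asr_path` are hypothetical names, and the check covers only the four-byte `GGUF` magic at the start of the file, whereas a real loader would also validate the version, metadata, and tensors.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical names for illustration; none of these come from qwen3-speech.
enum class InferencePath { RealGguf, DeterministicStub };

// Returns true if the file can be opened and starts with the 4-byte magic "GGUF".
// Only the magic is checked here, not the version, metadata, or tensor data.
inline bool looks_like_gguf(const std::string& path) {
    std::FILE* f = std::fopen(path.c_str(), "rb");
    if (!f) return false;
    char magic[4] = {0, 0, 0, 0};
    const std::size_t n = std::fread(magic, 1, sizeof(magic), f);
    std::fclose(f);
    return n == sizeof(magic) && std::memcmp(magic, "GGUF", 4) == 0;
}

// Chooses the real path when the manifest's artifact looks like GGUF,
// and the deterministic stub path in every other case.
inline InferencePath select_asr_path(const std::string& manifest_artifact_path) {
    return looks_like_gguf(manifest_artifact_path) ? InferencePath::RealGguf
                                                   : InferencePath::DeterministicStub;
}
```

Under these assumptions, a missing file and a file in another format both resolve to the stub path, which is what keeps tests runnable without model artifacts.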

TTS does not yet have a real GGUF inference path: transformer generation, vocoder decode, speaker encoding, and audio tokenization are all deterministic fallback stubs. Real GGUF TTS inference has been validated outside this tree with the upstream reference project, which confirms that the integration path is feasible.
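To make "deterministic fallback stub" concrete, here is a minimal sketch of the idea, not the repository's actual stub. It assumes a hypothetical `stub_synthesize` function that hashes the input text with 32-bit FNV-1a and expands the hash into 16-bit PCM samples with a linear congruential generator, so the same text always yields the same samples. The sample count per character is an arbitrary choice for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// 32-bit FNV-1a hash of the input bytes.
inline std::uint32_t fnv1a32(const std::string& s) {
    std::uint32_t h = 2166136261u;          // FNV offset basis
    for (unsigned char c : s) {
        h ^= c;
        h *= 16777619u;                     // FNV prime
    }
    return h;
}

// Hypothetical stand-in for a TTS stage: produces pseudo-audio that depends
// only on the text, so tests can compare outputs byte for byte.
inline std::vector<std::int16_t> stub_synthesize(const std::string& text,
                                                 std::size_t samples_per_char = 160) {
    const std::size_t total = text.size() * samples_per_char;
    std::vector<std::int16_t> pcm;
    pcm.reserve(total);
    std::uint32_t state = fnv1a32(text);    // seed derived from the text alone
    for (std::size_t i = 0; i < total; ++i) {
        state = state * 1664525u + 1013904223u;  // LCG step, wraps mod 2^32
        // Map the top 16 bits from [0, 65535] to the signed range [-32768, 32767].
        const std::int32_t centered = static_cast<std::int32_t>(state >> 16) - 32768;
        pcm.push_back(static_cast<std::int16_t>(centered));
    }
    return pcm;
}
```

The output is noise rather than speech. The point of such a stub is that downstream stages and the C ABI can be exercised with stable, reproducible data before real inference exists.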

What Exists Today

  • Manifest-first model loading and capability flags in include/qwen3_speech/.
  • Stage-oriented TTS modules under src/tts/.
  • Stage-oriented ASR modules under src/asr/.
  • Real GGUF inference for ASR under src/asr/ref_impl/ (GGUF loading, audio encoder, text decoder, streaming).
  • Runtime backend selection and GGML backend registration under src/runtime/.
  • Stable C ABI entry points under src/c_api/.
  • Thin CLI adapters under cli/.
  • Unit and integration tests for manifests, common helpers, mel spectrograms, fallback stages, C API smoke, and TTS→ASR roundtrip data flow.
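As a rough illustration of how capability flags are commonly represented and queried, the sketch below uses a bitmask. The flag names and the `has_all` helper are hypothetical; the actual definitions live in include/qwen3_speech/ and may look quite different.

```cpp
#include <cstdint>

// Hypothetical capability bits for illustration; not the library's real names.
enum ExampleCapability : std::uint32_t {
    EXAMPLE_CAP_TTS            = 1u << 0,
    EXAMPLE_CAP_ASR            = 1u << 1,
    EXAMPLE_CAP_STREAMING      = 1u << 2,
    EXAMPLE_CAP_REAL_INFERENCE = 1u << 3,
};

// True only when every bit in `required` is also set in `caps`.
inline bool has_all(std::uint32_t caps, std::uint32_t required) {
    return (caps & required) == required;
}
```

With a layout like this, a caller could check `has_all(caps, EXAMPLE_CAP_ASR | EXAMPLE_CAP_REAL_INFERENCE)` before expecting real transcription instead of stub output.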

What Does Not Exist Yet

  • In-tree real GGUF tensor loading and execution for Qwen3-TTS (ASR is implemented).
  • Checked-in model manifests or bundled converted model artifacts.
  • Full reference, integration, ABI, and performance suites described by the long-term architecture.
  • Godot binding implementation beyond the directory scaffold.

Build

cmake -S . -B build -DQWEN3_SPEECH_BUILD_TESTS=ON -DQWEN3_SPEECH_BUILD_CLI=ON
cmake --build build
ctest --test-dir build --output-on-failure

The top-level build vendors ggml/ from the repo by default. Backend and feature toggles are exposed as CMake options; see docs/build-and-test.md.

Repo Layout

include/qwen3_speech/   Public C and thin C++ wrapper headers
src/common/             Shared runtime utilities
src/runtime/            Backend registry, selection, and backend adapters
src/tts/                TTS stage modules and orchestration
src/asr/                ASR stage modules, orchestration, and real GGUF inference
src/asr/ref_impl/       Real GGUF inference path for ASR
src/c_api/              Stable C ABI implementation
cli/                    CLI adapters over the library
tests/                  Unit and smoke tests
docs/                   Project documentation

Documentation

  • docs/architecture.md: runtime structure, object lifetimes, and stage boundaries
  • docs/build-and-test.md: build flags, test targets, and CLI usage
  • docs/model-manifests.md: manifest schema, capabilities, and examples
  • docs/real-inference-status.md: validated GGUF workflow and current integration gap

Real Inference Boundary

ASR has real in-tree GGUF inference. TTS still runs on deterministic fallback stubs: the API shape and stage boundaries are in place, but no GGUF loader or graph execution exists for TTS yet. See docs/real-inference-status.md for the full status, validation history, and reproduction steps.

About

Qwen3 ASR and Qwen3 TTS in GGUF
