C/C++ inference engine for DeepVQE (Indenbom et al., Interspeech 2023) — real-time acoustic echo cancellation with soft delay estimation, built on GGML.
Looking for something smaller? We've since released LocalVQE, a ~0.9M-parameter (~3.5 MB F32) derivative of DeepVQE with an in-graph DCT-II filterbank and an S4D bottleneck — roughly 9× smaller than the model in this repo, with streaming C++ inference and a Vulkan backend. This repository remains the full-width ~8M-parameter DeepVQE re-implementation.
Requires CMake and a C/C++ compiler with C++17 support. A Nix flake is provided for reproducible builds:
```
# Enter dev shell (provides cmake, gcc, pkg-config)
nix develop

# Build the CLI inference binary
make build-ggml

# Or build the shared library (libdeepvqe.so) for embedding
make build-shared
```

Without Nix:
```
cd ggml
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j$(nproc)
```

```
# Run inference on numpy STFT arrays (mic + far-end reference)
ggml/build/deepvqe model.gguf --input-npy mic.npy ref.npy

# Dump intermediate activations for debugging
ggml/build/deepvqe model.gguf --input-npy mic.npy ref.npy --dump-intermediates
```

Build with -DDEEPVQE_BUILD_SHARED=ON to get libdeepvqe.so with a C API
defined in ggml/deepvqe_api.h. This can be loaded via dlopen, Go's
purego, or any FFI.
See ggml/example_purego_test.go for a Go integration example.
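For a rough picture of the dlopen route, here is a minimal C sketch. The symbol names and signatures below (deepvqe_init, deepvqe_free) are placeholders, not the actual API; the real declarations live in ggml/deepvqe_api.h.

```c
/* Hedged sketch: load libdeepvqe.so at runtime via dlopen/dlsym.
 * The symbol names and signatures here are placeholders; consult
 * ggml/deepvqe_api.h for the real entry points. Link with -ldl. */
#include <dlfcn.h>
#include <stdio.h>

int main(void) {
    void *lib = dlopen("./libdeepvqe.so", RTLD_NOW);
    if (!lib) { fprintf(stderr, "dlopen failed: %s\n", dlerror()); return 1; }

    /* Placeholder signatures: create a context from a GGUF file, free it later. */
    void *(*init)(const char *) = (void *(*)(const char *)) dlsym(lib, "deepvqe_init");
    void  (*release)(void *)    = (void  (*)(void *))       dlsym(lib, "deepvqe_free");

    if (init && release) {
        void *ctx = init("model.gguf");
        /* ... feed mic / far-end STFT frames through the processing calls here ... */
        release(ctx);
    } else {
        fprintf(stderr, "symbols not found; check ggml/deepvqe_api.h for real names\n");
    }
    dlclose(lib);
    return 0;
}
```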
Verify C++ blocks against PyTorch reference outputs:
```
# First, export PyTorch intermediates (requires Docker, see train/)
make compare-pt
make compare-block

# Then run C++ block tests
make test-ggml
```

| Component | Details |
|---|---|
| Sample rate | 16 kHz |
| STFT | 512-point FFT, 256-sample hop, sqrt-Hann window, 257 frequency bins |
| Mic encoder | 5 blocks: 2->64->128->128->128->128 channels |
| Far-end encoder | 2 blocks: 2->32->128 channels |
| AlignBlock | Cross-attention soft delay, dmax=32 (320ms) |
| Bottleneck | GRU(1152->576) + Linear(576->1152) |
| Decoder | 5 blocks with sub-pixel conv |
| CCM | 27ch -> 3x3 complex convolving mask |
| Parameters | ~8.0M |
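As a quick sanity check on the framing implied by the table, here is a minimal sketch derived from the numbers above (not taken from the repo's STFT implementation):

```c
/* Analysis framing from the table: 16 kHz audio, 512-point FFT,
 * 256-sample hop, sqrt-Hann window. Illustrative only; compile with -lm. */
#include <math.h>
#include <stdio.h>

#define SAMPLE_RATE 16000
#define FFT_SIZE    512
#define HOP_SIZE    256
#define NUM_BINS    (FFT_SIZE / 2 + 1)  /* 257 one-sided frequency bins */

int main(void) {
    /* sqrt-Hann analysis window: square root of a periodic Hann window */
    double window[FFT_SIZE];
    for (int n = 0; n < FFT_SIZE; n++)
        window[n] = sqrt(0.5 * (1.0 - cos(2.0 * M_PI * n / FFT_SIZE)));

    printf("frequency bins per frame: %d\n", NUM_BINS);                    /* 257 */
    printf("hop duration: %.1f ms\n", 1000.0 * HOP_SIZE / SAMPLE_RATE);    /* 16.0 ms */
    printf("frame duration: %.1f ms\n", 1000.0 * FFT_SIZE / SAMPLE_RATE);  /* 32.0 ms */
    printf("window peak: %.3f\n", window[FFT_SIZE / 2]);                   /* 1.000 at frame center */
    return 0;
}
```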
Pre-trained weights are available on Hugging Face: richiejp/deepvqe-aec-gguf.
| Variant | File | Size | Description |
|---|---|---|---|
| F32 | deepvqe.gguf | 31 MB | Full precision (reference) |
| Q8_0 | deepvqe_q8.gguf | 8.5 MB | 8-bit quantized (73% smaller) |
The Q8_0 variant quantizes encoder, decoder (2-5), and bottleneck weights to 8-bit while keeping precision-sensitive layers at F32: AlignBlock (attention), dec1 (mask output), and all biases. End-to-end output divergence from F32 is max 5e-2 / mean 7e-4.
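For a rough sense of where the file sizes come from: GGML's Q8_0 format packs weights in blocks of 32 int8 values that share one fp16 scale, i.e. 34 bytes per 32 weights. The sketch below treats all ~8.0M parameters uniformly and ignores the tensors kept at F32, so it is only a back-of-the-envelope estimate, not the exact accounting done by the export script.

```c
/* Back-of-the-envelope GGUF size estimate, assuming GGML's Q8_0 layout
 * (32 int8 weights + one fp16 scale per block = 34 bytes per 32 weights).
 * Ignores the tensors kept at F32 (AlignBlock, dec1, biases), so real
 * files differ slightly. */
#include <stdio.h>

int main(void) {
    const double params = 8.0e6;                         /* ~8.0M parameters */
    const double f32_mb = params * 4.0 / 1e6;            /* 4 bytes per F32 weight */
    const double q8_mb  = params * (34.0 / 32.0) / 1e6;  /* Q8_0 bytes per weight */
    printf("all-F32 : ~%.0f MB\n", f32_mb);              /* ~32 MB, near the 31 MB file */
    printf("all-Q8_0: ~%.1f MB\n", q8_mb);               /* ~8.5 MB, matching the table */
    return 0;
}
```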
To export your own quantized model:
```
make -C train export-q8  # or:
./train/scripts/docker-run.sh python export_ggml.py \
    --checkpoint <path> --quantize --output deepvqe_q8.gguf
```

Compare quantized vs full-precision outputs:

```
make test-quantize
```

Safety note: Training data was filtered by DNSMOS perceived quality scores, which can misclassify distressed speech (e.g. screaming, crying) as noise. This model may attenuate or distort such signals and should not be relied upon in emergency-call or other safety-critical applications.
To train your own model and export weights, see train/.
All training code lives in train/. It uses Docker with an NVIDIA
NGC PyTorch container. Quick start:
```
# Build Docker image and run smoke test
make -C train build
make -C train test

# Train on DNS5 data
make -C train train

# Export trained checkpoint to GGUF
make -C train export
```

See train/Makefile for all available targets.
Model weights are trained on data from the ICASSP 2023 Deep Noise Suppression Challenge (Microsoft, CC BY 4.0).
This project is licensed under the Apache License 2.0. See LICENSE.
- DeepVQE: Real Time Deep Voice Quality Enhancement (Indenbom et al., Interspeech 2023)
- GGML tensor library
- Xiaobin-Rong implementation (NS-only reference)
- Okrio implementation (AEC reference)