#dsp #stft #windowing #voice #istft #mlx #fft #vocoder #short-time #hanning

voice-dsp

DSP primitives: STFT, iSTFT, overlap-add, windowing

3 unstable releases

0.2.0 Mar 18, 2026
0.1.1 Mar 18, 2026
0.1.0 Mar 18, 2026

#1361 in Algorithms


Used in 4 crates (2 directly)

MIT license

21KB
356 lines

voice-dsp

DSP primitives for the voice TTS pipeline, built on mlx-rs (Apple MLX).

Install

[dependencies]
voice-dsp = "0.2"

What's inside

  • STFT / iSTFT — Short-Time Fourier Transform and its inverse, matching PyTorch conventions
  • MlxStft — batched STFT wrapper used by the vocoder pipeline (transform → magnitude + phase, inverse → audio)
  • Windowing — Hanning window generation
  • Interpolation — 1-D nearest/linear interpolation for upsampling tensors
  • Phase utilities — mlx_angle (complex argument) and mlx_unwrap (phase unwrapping)
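
To make the windowing, interpolation, and phase bullets concrete, here is a minimal plain-Rust sketch of the underlying math on `f64` slices. It is not the crate's code: voice-dsp operates on `mlx_rs::Array`, and the names `hann_periodic`, `interp_linear`, `angle`, and `unwrap_phase` are hypothetical. The periodic window form and the endpoint-aligned interpolation are also assumptions; the crate may use different conventions.

```rust
use std::f64::consts::PI;

/// Periodic Hann window: w[i] = 0.5 - 0.5 * cos(2*pi*i / n) for i in 0..n.
/// Assumption: periodic form (a symmetric window would divide by n - 1).
fn hann_periodic(n: usize) -> Vec<f64> {
    (0..n)
        .map(|i| 0.5 - 0.5 * (2.0 * PI * i as f64 / n as f64).cos())
        .collect()
}

/// 1-D linear interpolation to `out_len` samples. The first and last output
/// samples coincide with the first and last input samples (an assumption).
fn interp_linear(x: &[f64], out_len: usize) -> Vec<f64> {
    if x.is_empty() || out_len == 0 {
        return Vec::new();
    }
    if x.len() == 1 || out_len == 1 {
        return vec![x[0]; out_len];
    }
    let scale = (x.len() - 1) as f64 / (out_len - 1) as f64;
    (0..out_len)
        .map(|i| {
            let pos = i as f64 * scale;
            // Clamp so that lo + 1 is always a valid index.
            let lo = (pos.floor() as usize).min(x.len() - 2);
            let frac = pos - lo as f64;
            x[lo] * (1.0 - frac) + x[lo + 1] * frac
        })
        .collect()
}

/// Complex argument of re + i*im, in (-pi, pi].
fn angle(re: f64, im: f64) -> f64 {
    im.atan2(re)
}

/// Phase unwrapping: whenever two neighbouring samples differ by more than
/// pi, shift the rest of the sequence by the nearest multiple of 2*pi.
fn unwrap_phase(phase: &[f64]) -> Vec<f64> {
    let mut out = Vec::with_capacity(phase.len());
    let mut offset = 0.0;
    for (i, &p) in phase.iter().enumerate() {
        if i > 0 {
            let delta = p - phase[i - 1];
            offset -= 2.0 * PI * (delta / (2.0 * PI)).round();
        }
        out.push(p + offset);
    }
    out
}
```

For example, `hann_periodic(4)` yields approximately `[0.0, 0.5, 1.0, 0.5]`, and `interp_linear(&[0.0, 1.0], 5)` yields five evenly spaced values from 0 to 1.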

Usage

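use voice_dsp::{stft, istft, hanning, MlxStft};

// Batched STFT for the vocoder.
// `audio_batch` is an mlx_rs::Array of audio samples prepared by the caller.
// The arguments are presumably (n_fft, hop_length, win_length).
// The wrapper is bound to `mlx_stft` so it does not shadow the imported `stft` function.
let mlx_stft = MlxStft::new(1024, 256, 1024)?;
let (magnitude, phase) = mlx_stft.transform(&audio_batch)?;
let reconstructed = mlx_stft.inverse(&magnitude, &phase)?;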

All functions operate on mlx_rs::Array and return Result<_, mlx_rs::error::Exception>.
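
The following plain-Rust sketch shows the shape arithmetic and the overlap-add normalization behind a call such as `MlxStft::new(1024, 256, 1024)`. It rests on assumptions this README does not confirm: that the arguments are `(n_fft, hop_length, win_length)`, that frames are centered as in `torch.stft` with `center=True`, and that the window is a periodic Hann window. The function names are hypothetical and nothing here calls into voice-dsp or mlx-rs.

```rust
use std::f64::consts::PI;

/// Frequency bins of a one-sided STFT: n_fft / 2 + 1.
fn num_bins(n_fft: usize) -> usize {
    n_fft / 2 + 1
}

/// Frame count when the signal is padded by n_fft / 2 on each side
/// (the `center=True` convention): 1 + num_samples / hop.
fn num_frames_centered(num_samples: usize, hop: usize) -> usize {
    1 + num_samples / hop
}

/// Overlap-add of the squared periodic Hann window over `frames` frames.
/// A PyTorch-style inverse STFT divides the overlap-added signal by this
/// envelope to undo the analysis and synthesis windows.
fn window_envelope(n_fft: usize, hop: usize, frames: usize) -> Vec<f64> {
    if frames == 0 {
        return Vec::new();
    }
    let window: Vec<f64> = (0..n_fft)
        .map(|i| 0.5 - 0.5 * (2.0 * PI * i as f64 / n_fft as f64).cos())
        .collect();
    let mut envelope = vec![0.0; n_fft + hop * (frames - 1)];
    for frame in 0..frames {
        let start = frame * hop;
        for (i, w) in window.iter().enumerate() {
            envelope[start + i] += w * w;
        }
    }
    envelope
}
```

With `n_fft = 1024` this gives 513 frequency bins. With a hop of `n_fft / 4` the envelope is constant wherever four frames overlap, which is why this window and hop combination reconstructs cleanly away from the signal edges.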

Requirements

  • macOS with Apple Silicon (MLX requirement)
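
A downstream crate that also builds on other platforms may want to check for this requirement before taking the MLX code path. The sketch below uses only the standard `cfg!` macro; the helper names `mlx_supported` and `backend_name` are hypothetical and not part of voice-dsp.

```rust
/// True when compiled for macOS on Apple Silicon (aarch64).
fn mlx_supported() -> bool {
    cfg!(all(target_os = "macos", target_arch = "aarch64"))
}

/// Hypothetical helper: reports which code path a caller would take.
fn backend_name() -> &'static str {
    if mlx_supported() {
        "mlx"
    } else {
        "unsupported"
    }
}
```

The same predicate can be used with `#[cfg(...)]` attributes to compile MLX-dependent modules only on supported targets.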

License

MIT

Dependencies

~4–7MB
~139K SLoC