Realtime speech-to-text with Voxtral Mini Realtime in MLX.

    pip install voxmlx

Transcribe audio from a file or stream from the microphone in real time.

Stream from microphone:

    voxmlx

Transcribe a file:

    voxmlx --audio audio.flac

Options:

| Flag | Description | Default |
|---|---|---|
| `--audio` | Path to audio file (omit to stream from mic) | None |
| `--model` | Model path or HuggingFace model ID | `mlx-community/Voxtral-Mini-4B-Realtime-6bit` |
| `--temp` | Sampling temperature (0 = greedy) | 0.0 |

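The flags above can also be assembled programmatically when calling the CLI from a script. The following is a minimal sketch, not part of voxmlx itself: the helper names `build_voxmlx_command` and `transcribe_with_cli` are hypothetical, and it assumes the `voxmlx` command is on `PATH` and writes the transcript to standard output.

```python
import subprocess
from typing import List, Optional


def build_voxmlx_command(
    audio: Optional[str] = None,
    model: Optional[str] = None,
    temp: Optional[float] = None,
) -> List[str]:
    """Build an argument list using only the flags documented in the table."""
    cmd = ["voxmlx"]
    if audio is not None:
        cmd += ["--audio", audio]  # omit to stream from the microphone
    if model is not None:
        cmd += ["--model", model]  # local path or HuggingFace model ID
    if temp is not None:
        cmd += ["--temp", str(temp)]  # 0 means greedy decoding
    return cmd


def transcribe_with_cli(audio: str, **options) -> str:
    """Run the CLI on a file and return its stdout.

    Assumption: the transcript is printed to stdout.
    """
    result = subprocess.run(
        build_voxmlx_command(audio, **options),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Flags left as `None` are omitted, so the CLI falls back to the defaults listed in the table.
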
Convert Voxtral weights to voxmlx/MLX format with optional quantization.

Basic conversion:

    voxmlx-convert --mlx-path voxtral-mlx

4-bit quantized conversion:

    voxmlx-convert -q --mlx-path voxtral-mlx-4bit

Convert and upload to HuggingFace:

    voxmlx-convert -q --mlx-path voxtral-mlx-4bit --upload-repo username/voxtral-mlx-4bit

Options:

| Flag | Description | Default |
|---|---|---|
| `--hf-path` | HuggingFace model ID or local path | `mistralai/Voxtral-Mini-4B-Realtime-2602` |
| `--mlx-path` | Output directory | `mlx_model` |
| `-q`, `--quantize` | Quantize the model | Off |
| `--group-size` | Quantization group size | 64 |
| `--bits` | Bits per weight | 4 |
| `--dtype` | Cast weights (float16, bfloat16, float32) | None |
| `--upload-repo` | HuggingFace repo to upload converted model | None |

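To get a feel for how `--bits` and `--group-size` interact, the sketch below estimates the storage cost of group-wise quantization. It assumes each group of weights carries one 16-bit scale and one 16-bit bias (32 bits of overhead per group), and that every parameter is quantized. Real checkpoints usually keep some layers in higher precision, so treat the result as a rough lower bound. The function names are illustrative only.

```python
def effective_bits_per_weight(
    bits: int = 4, group_size: int = 64, overhead_bits: int = 32
) -> float:
    """Bits stored per weight, including per-group scale and bias.

    Assumption: one 16-bit scale plus one 16-bit bias per group.
    """
    return bits + overhead_bits / group_size


def estimated_size_gb(num_params: float, bits: int = 4, group_size: int = 64) -> float:
    """Rough size in GB if every parameter were quantized."""
    return num_params * effective_bits_per_weight(bits, group_size) / 8 / 1e9


# With the defaults (4 bits, group size 64) each weight costs 4.5 bits,
# so a hypothetical 4-billion-parameter model comes to roughly 2.25 GB.
print(effective_bits_per_weight(4, 64))
print(estimated_size_gb(4e9))
```

A smaller group size tracks the weight distribution more closely but raises the per-weight overhead. Under the same assumption, a group size of 32 costs 5 bits per weight at `--bits 4`.
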
Use from Python:

    from voxmlx import transcribe

    text = transcribe("audio.flac")
    print(text)
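
The same call can be wrapped to transcribe a whole folder. This is a minimal sketch that assumes `transcribe` accepts a file path and returns the transcript as a string, as in the snippet above. The helper `transcribe_directory` and its `transcribe_fn` parameter are hypothetical; the parameter exists so the loop can be exercised with a stand-in function when no model is loaded.

```python
from pathlib import Path
from typing import Callable, Dict, Optional


def transcribe_directory(
    directory: str,
    pattern: str = "*.flac",
    transcribe_fn: Optional[Callable[[str], str]] = None,
) -> Dict[str, str]:
    """Transcribe every file matching `pattern`, keyed by file name."""
    if transcribe_fn is None:
        # Imported lazily so the helper can be defined without voxmlx installed.
        from voxmlx import transcribe as transcribe_fn

    results: Dict[str, str] = {}
    for path in sorted(Path(directory).glob(pattern)):
        results[path.name] = transcribe_fn(str(path))
    return results
```

Calling `transcribe_directory("recordings")` would then return a dictionary mapping each `.flac` file name in the (hypothetical) `recordings` folder to its transcript.
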