OminiX-MLX

High-performance ML inference on Apple Silicon: LLMs, ASR, TTS, and Image Generation in pure Rust.

Overview

OminiX-MLX is a comprehensive Rust ecosystem for running machine learning models on Apple Silicon using MLX. It provides:

mlx-rs: Safe Rust bindings to Apple's MLX framework
mlx-rs-core: Shared inference infrastructure (KV cache, RoPE, attention)
Model Crates: Dedicated crates for each model family

Built for production use with zero Python dependencies at inference time.

Features

Feature	Description
GPU Acceleration	Metal-optimized inference on M1/M2/M3/M4 chips
Unified Memory	Zero-copy data sharing between CPU and GPU
Lazy Evaluation	Automatic kernel fusion and memory optimization
Pure Rust	No Python runtime required for inference
Modular Design	Use only what you need

Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                              User Application                                │
│                    (OminiX-API / Custom Rust Application)                   │
└───────────────────────────────────┬─────────────────────────────────────────┘
                                    │
        ┌───────────────────────────┼───────────────────────────┐
        │                           │                           │
        ▼                           ▼                           ▼
┌───────────────┐         ┌─────────────────┐         ┌─────────────────┐
│  LLM / VLM    │         │   Audio Crates  │         │  Image Crates   │
├───────────────┤         ├─────────────────┤         ├─────────────────┤
│ qwen3-mlx     │         │ funasr-mlx      │         │ flux-klein-mlx  │
│ glm4-mlx      │         │ funasr-nano-mlx │         │ zimage-mlx      │
│ glm4-moe-mlx  │         │ funasr-qwen4b   │         │ qwen-image-mlx  │
│ glm-4.7-flash │         │ qwen3-asr-mlx   │         │                 │
│ mixtral-mlx   │         │ qwen3-tts-mlx   │         │                 │
│ mistral-mlx   │         │ gpt-sovits-mlx  │         │                 │
│ qwen3.5-35B   │         │ step-audio2-mlx │         │                 │
│ gemma4-mlx    │         │                 │         │                 │
│ moxin-vlm-mlx │         │                 │         │                 │
│ minicpm-sala   │         │                 │         │                 │
│ deepseek-ocr2 │         │                 │         │                 │
└───────┬───────┘         └────────┬────────┘         └────────┬────────┘
        │                          │                           │
        └──────────────────────────┼───────────────────────────┘
                                   │
                                   ▼
                    ┌──────────────────────────┐
                    │       mlx-rs-core        │
                    ├──────────────────────────┤
                    │ • KV Cache Management    │
                    │ • RoPE Embeddings        │
                    │ • Attention (SDPA)       │
                    │ • Audio Processing       │
                    │ • Metal Kernels          │
                    │ • Speculative Decoding   │
                    └────────────┬─────────────┘
                                 │
                                 ▼
                    ┌──────────────────────────┐
                    │         mlx-rs           │
                    ├──────────────────────────┤
                    │ • Safe Rust API          │
                    │ • Array Operations       │
                    │ • Neural Network Layers  │
                    │ • Transforms (eval, jit) │
                    │ • Random/Ops/Indexing    │
                    └────────────┬─────────────┘
                                 │
                                 ▼
                    ┌──────────────────────────┐
                    │         mlx-sys          │
                    ├──────────────────────────┤
                    │ • FFI Bindings (bindgen) │
                    │ • mlx-c Submodule        │
                    └────────────┬─────────────┘
                                 │
                                 ▼
                    ┌──────────────────────────┐
                    │      Apple MLX (C++)     │
                    ├──────────────────────────┤
                    │ • Metal GPU Backend      │
                    │ • Accelerate Framework   │
                    │ • Unified Memory         │
                    │ • Lazy Evaluation        │
                    └──────────────────────────┘

Data Flow

                         ┌─────────────────────────────────────┐
                         │            Input Data               │
                         │  (Text / Audio / Image)             │
                         └──────────────┬──────────────────────┘
                                        │
         ┌──────────────────────────────┼──────────────────────────────┐
         │                              │                              │
         ▼                              ▼                              ▼
┌─────────────────┐          ┌─────────────────┐          ┌─────────────────┐
│    Tokenizer    │          │  Audio Frontend │          │   VAE Encoder   │
│  (tokenizers)   │          │  (Mel/STFT)     │          │  (img→latent)   │
└────────┬────────┘          └────────┬────────┘          └────────┬────────┘
         │                            │                            │
         ▼                            ▼                            ▼
┌─────────────────┐          ┌─────────────────┐          ┌─────────────────┐
│   Transformer   │          │   Encoder       │          │   Transformer   │
│   (Attention +  │          │   (SAN-M /      │          │   (DiT /        │
│    MLP layers)  │          │    Whisper)     │          │    MMDiT)       │
└────────┬────────┘          └────────┬────────┘          └────────┬────────┘
         │                            │                            │
         ▼                            ▼                            ▼
┌─────────────────┐          ┌─────────────────┐          ┌─────────────────┐
│   LM Head       │          │   CTC / CIF     │          │   VAE Decoder   │
│  (logits→token) │          │   Decoder       │          │  (latent→img)   │
└────────┬────────┘          └────────┬────────┘          └────────┬────────┘
         │                            │                            │
         ▼                            ▼                            ▼
┌─────────────────┐          ┌─────────────────┐          ┌─────────────────┐
│  Detokenizer    │          │  Vocabulary     │          │   PNG/JPEG      │
│  (token→text)   │          │  (idx→text)     │          │   Encoder       │
└────────┬────────┘          └────────┬────────┘          └────────┬────────┘
         │                            │                            │
         ▼                            ▼                            ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                              Output Data                                     │
│                       (Generated Text / Transcript / Image)                  │
└─────────────────────────────────────────────────────────────────────────────┘

Crate Structure

OminiX-MLX/
├── mlx-rs/              # Core MLX Rust bindings
├── mlx-rs-core/         # Shared inference infrastructure
│
├── qwen3-mlx/           # Qwen2, Qwen3, Qwen3-MoE
├── glm4-mlx/            # GLM4
├── glm4-moe-mlx/        # GLM4-MoE (45 experts)
├── glm-4.7-flash-mlx/   # GLM-4.7-Flash (latest GLM)
├── mixtral-mlx/         # Mixtral 8x7B/8x22B
├── mistral-mlx/         # Mistral 7B
├── qwen3.5-35B-mlx/     # Qwen3.5-35B LLM
├── gemma4-mlx/          # Gemma 4 12B text inference (4-bit)
├── moxin-vlm-mlx/       # Moxin-7B VLM (vision-language)
├── MiniCPM-SALA-MLX/    # MiniCPM-SALA 9B (hybrid attention, 1M context)
├── deepseek-ocr2-mlx/   # DeepSeek-OCR-2 (document understanding)
│
├── qwen3-tts-mlx/       # Qwen3-TTS (9 speakers, 12 languages, voice cloning)
├── gpt-sovits-mlx/      # GPT-SoVITS voice cloning
├── step-audio2-mlx/     # Step-Audio-2 (speech generation)
├── funasr-mlx/          # FunASR Paraformer ASR
├── funasr-nano-mlx/     # FunASR-Nano (SenseVoice + Qwen)
├── funasr-qwen4b-mlx/   # FunASR-Qwen4B (speech-to-text with Qwen3)
├── qwen3-asr-mlx/       # Qwen3-ASR (30+ languages, 0.6B/1.7B)
│
├── ominix-api/          # Unified OpenAI-compatible API server (ASR + TTS + LLM + OCR)
│
├── flux-klein-mlx/      # FLUX.2-klein image generation
├── zimage-mlx/          # Z-Image generation
└── qwen-image-mlx/      # Qwen image generation

Supported Models

Language Models (LLMs)

Model	Crate	Sizes	Notes
Qwen2	`qwen3-mlx`	0.5B - 72B	Full range supported
Qwen3	`qwen3-mlx`	0.6B - 235B	Including MoE variants
Qwen3.5	`qwen3.5-35B-mlx`	35B	Latest Qwen generation
GLM-4	`glm4-mlx`	9B	Chat and base models
GLM-4-MoE	`glm4-moe-mlx`	9B	45 expert MoE
GLM-4.7-Flash	`glm-4.7-flash-mlx`	—	Latest GLM flash model
Mixtral	`mixtral-mlx`	8x7B, 8x22B	MoE architecture
Mistral	`mistral-mlx`	7B	Sliding window attention
Gemma 4	`gemma4-mlx`	12B	Text-only 4-bit inference
MiniCPM-SALA	`minicpm-sala-mlx`	9B	Hybrid attention (sparse + lightning), 1M context, OpenAI-compatible API

Vision-Language Models (VLMs)

Model	Crate	Sizes	Notes
Moxin-7B VLM	`moxin-vlm-mlx`	7B	DINOv2 + SigLIP + Mistral-7B, 8-bit quantization, 30 tok/s

Speech Recognition (ASR)

Model	Crate	Languages	Performance
Paraformer-large	`funasr-mlx`	Chinese, English	18x real-time
FunASR-Nano	`funasr-nano-mlx`	Chinese, English	SenseVoice + Qwen
FunASR-Qwen4B	`funasr-qwen4b-mlx`	Chinese, English	Qwen3-4B backbone, LoRA fine-tuned
Qwen3-ASR	`qwen3-asr-mlx`	Chinese, English, +28 more	30-50x real-time, 0.6B/1.7B

Text-to-Speech (TTS)

Model	Crate	Features	Performance
Qwen3-TTS	`qwen3-tts-mlx`	9 speakers, 12 languages, voice cloning, emotion control, streaming	2.3x real-time
GPT-SoVITS	`gpt-sovits-mlx`	Few-shot voice cloning	4x real-time
Step-Audio-2	`step-audio2-mlx`	Speech generation	—

OCR / Document Understanding

Model	Crate	Features	Performance
DeepSeek-OCR-2	`deepseek-ocr2-mlx`	Vision-language OCR, PDF support	—

Image Generation (Supported Models)

Model	Crate	Notes
FLUX.2-klein	`flux-klein-mlx`	Qwen3 text encoder
Z-Image	`zimage-mlx`	Fast generation

Quick Start

Prerequisites

macOS 14.0+ (Sonoma)
Apple Silicon (M1/M2/M3/M4)
Rust 1.82+
Xcode Command Line Tools

Build

# Clone repository
git clone https://github.com/OminiX-ai/OminiX-MLX.git
cd OminiX-MLX

# Build all crates
cargo build --release

# Build specific crate
cargo build --release -p qwen3-mlx

LLM Generation

# Download model
huggingface-cli download mlx-community/Qwen3-4B-bf16 --local-dir ./models/Qwen3-4B

# Run text generation
cargo run --release -p qwen3-mlx --example generate_qwen3 -- ./models/Qwen3-4B "Hello, how are you?"

# Run interactive chat
cargo run --release -p qwen3-mlx --example chat_qwen3 -- ./models/Qwen3-4B

use qwen3_mlx::{load_model, Generate, ConcatKeyValueCache};

let mut model = load_model("./models/Qwen3-4B")?;
let mut cache = Vec::new();

let generator = Generate::<ConcatKeyValueCache>::new(
    &mut model, &mut cache, 0.7, &prompt_tokens
);

for token in generator.take(100) {
    let token = token?;
    print!("{}", tokenizer.decode(&[token.item::<u32>()], true)?);
}

Speech Recognition

cd funasr-mlx

# Run transcription
cargo run --release --example transcribe -- \
    --model ./models/paraformer \
    --audio ./audio/test.wav

use funasr_mlx::{load_model, transcribe, Vocabulary};
use funasr_mlx::audio::{load_wav, resample};

// Load audio
let (samples, rate) = load_wav("audio.wav")?;
let samples = resample(&samples, rate, 16000);

// Load model and transcribe
let mut model = load_model("paraformer.safetensors")?;
let vocab = Vocabulary::load("tokens.txt")?;
let text = transcribe(&mut model, &samples, &vocab)?;

Voice Cloning

cd gpt-sovits-mlx/rust

cargo run --release --example voice_clone -- \
    --reference ./audio/reference.wav \
    --text "Hello, this is a voice clone."

use gpt_sovits_mlx::VoiceCloner;

let mut cloner = VoiceCloner::with_defaults()?;
cloner.set_reference_audio("reference.wav")?;

let audio = cloner.synthesize("Hello, world!")?;
cloner.save_wav(&audio, "output.wav")?;

Vision-Language Model

# Download 8-bit quantized Moxin-7B VLM
huggingface-cli download moxin-org/Moxin-7B-VLM-8bit-mlx --local-dir ./models/Moxin-7B-VLM-8bit-mlx

# Run VLM inference with 8-bit quantization
cargo run --release -p moxin-vlm-mlx --example generate -- \
    --model ./models/Moxin-7B-VLM-hf \
    --image ./photo.jpg \
    --prompt "What is in this image?" \
    --quantize 8

use moxin_vlm_mlx::{load_model, load_tokenizer, normalize_dino, normalize_siglip, Generate, KVCache};

let mut vlm = load_model("./models/Moxin-7B-VLM-hf")?;
let vlm = vlm.quantize(64, 8)?; // 8-bit quantization
let tokenizer = load_tokenizer("./models/Moxin-7B-VLM-hf")?;

// Preprocess image to [1, 224, 224, 3] and normalize
let dino_img = normalize_dino(&tensor)?;
let siglip_img = normalize_siglip(&tensor)?;

// Generate
let mut cache: Vec<KVCache> = Vec::new();
let generator = Generate::new(&mut vlm, &mut cache, 0.0, dino_img, siglip_img, input_ids);

for token in generator.take(256) {
    let token = token?;
    print!("{}", tokenizer.decode(&[token.item::<u32>()], true)?);
}

MiniCPM-SALA (Long-Context LLM)

# Download 8-bit quantized model
huggingface-cli download moxin-org/MiniCPM4-SALA-9B-8bit-mlx --local-dir ./models/MiniCPM-SALA-8bit

# Run text generation
cargo run --release -p minicpm-sala-mlx --example generate -- \
    ./models/MiniCPM-SALA-8bit "Explain the theory of relativity."

# Run interactive chat
cargo run --release -p minicpm-sala-mlx --example chat -- \
    ./models/MiniCPM-SALA-8bit --no-think

# Start OpenAI-compatible API server
cargo run --release -p minicpm-sala-mlx --example server -- \
    --model ./models/MiniCPM-SALA-8bit --port 8080 --no-think

API Endpoints:

Method	Endpoint	Description
POST	`/v1/chat/completions`	OpenAI-compatible chat completion
GET	`/v1/models`	List models with metadata (path, size, quantization, loaded status)
POST	`/v1/models/download`	Download a model from HuggingFace
DELETE	`/v1/models/{id}`	Delete a downloaded model
GET	`/health`	Health check

Example API calls:

# Chat completion
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minicpm-sala-9b",
    "messages": [{"role": "user", "content": "Hello!"}],
    "temperature": 0.7,
    "max_tokens": 256
  }'

# Download a model
curl -X POST http://localhost:8080/v1/models/download \
  -H "Content-Type: application/json" \
  -d '{"repo_id": "moxin-org/MiniCPM4-SALA-9B-8bit-mlx"}'

# List models
curl http://localhost:8080/v1/models

# Delete a model
curl -X DELETE http://localhost:8080/v1/models/MiniCPM4-SALA-9B-8bit-mlx

Qwen3-ASR (Speech Recognition)

# Download model (1.7B 8-bit recommended, 2.46 GB)
huggingface-cli download mlx-community/Qwen3-ASR-1.7B-8bit \
    --local-dir ~/.OminiX/models/qwen3-asr-1.7b

# Transcribe audio
cargo run --release -p qwen3-asr-mlx --example transcribe -- audio.wav

# Specify language
cargo run --release -p qwen3-asr-mlx --example transcribe -- audio.wav --language English

# Use 0.6B model (faster, 1.01 GB)
huggingface-cli download mlx-community/Qwen3-ASR-0.6B-8bit \
    --local-dir ~/.OminiX/models/qwen3-asr-0.6b
cargo run --release -p qwen3-asr-mlx --example transcribe -- \
    ~/.OminiX/models/qwen3-asr-0.6b audio.wav

OminiX-API (Unified API Server)

Single HTTP server exposing ASR, TTS, LLM, and OCR through OpenAI-compatible REST endpoints. See the ominix-api README for full documentation.

# Start with ASR + TTS
cargo run --release -p ominix-api -- \
    --asr-model ~/.OminiX/models/qwen3-asr-1.7b \
    --tts-model ~/.OminiX/models/Qwen3-TTS-12Hz-1.7B-CustomVoice-8bit \
    --port 8080

# Transcribe (OpenAI Whisper-compatible)
curl http://localhost:8080/v1/audio/transcriptions \
    -F file=@audio.wav -F language=Chinese

# Text-to-speech (preset speaker)
curl http://localhost:8080/v1/audio/speech \
    -H "Content-Type: application/json" \
    -d '{"input": "Hello, world!", "voice": "vivian", "language": "english"}' \
    --output output.wav

# Text-to-speech (emotion control)
curl http://localhost:8080/v1/audio/speech \
    -H "Content-Type: application/json" \
    -d '{
      "input": "我太开心了！",
      "voice": "serena",
      "language": "chinese",
      "prompt": "用兴奋激动的语气说话，充满热情和活力"
    }' --output excited.wav

# Voice cloning
curl http://localhost:8080/v1/audio/speech/clone \
    -H "Content-Type: application/json" \
    -d '{
      "input": "你好，很高兴认识你。",
      "reference_audio": "<base64-wav>",
      "language": "chinese"
    }' --output cloned.wav

API Endpoints:

Method	Endpoint	Feature	Description
POST	`/v1/audio/transcriptions`	`asr`	Transcribe audio (OpenAI Whisper-compatible)
POST	`/v1/audio/speech`	`tts`	Text-to-speech with preset speakers, emotion control, speed
POST	`/v1/audio/speech/clone`	`tts`	Voice cloning from reference audio
POST	`/v1/chat/completions`	`llm`	Chat completion (streaming SSE)
POST	`/v1/ocr`	`ocr`	Document OCR with vision-language model
GET	`/v1/models`	—	List all models with metadata
POST	`/v1/models/download`	—	Download model from HuggingFace
DELETE	`/v1/models/{id}`	—	Delete a model
GET	`/health`	—	Health check with version info

Qwen3-TTS (Text-to-Speech)

# Download model (8-bit quantized, ~1.8 GB)
huggingface-cli download mlx-community/Qwen3-TTS-12Hz-1.7B-CustomVoice-8bit \
    --local-dir ~/.OminiX/models/Qwen3-TTS-12Hz-1.7B-CustomVoice-8bit

# Basic synthesis
cargo run --release -p qwen3-tts-mlx --example synthesize -- \
    -m ~/.OminiX/models/Qwen3-TTS-12Hz-1.7B-CustomVoice-8bit \
    "Hello! Welcome to Qwen3 TTS." \
    -s vivian -l english -o output.wav

# Emotion/style control
cargo run --release -p qwen3-tts-mlx --example synthesize -- \
    -m ~/.OminiX/models/Qwen3-TTS-12Hz-1.7B-CustomVoice-8bit \
    "我太开心了！" -s serena -l chinese \
    --instruct "用兴奋激动的语气说话，充满热情和活力"

# Voice cloning (download Base model first)
huggingface-cli download Qwen/Qwen3-TTS-12Hz-1.7B-Base \
    --local-dir ~/.OminiX/models/Qwen3-TTS-12Hz-1.7B-Base
cargo run --release -p qwen3-tts-mlx --example synthesize -- \
    -m ~/.OminiX/models/Qwen3-TTS-12Hz-1.7B-Base \
    "你好，很高兴认识你。" -l chinese \
    --reference-audio ./reference.wav

# Streaming mode
cargo run --release -p qwen3-tts-mlx --example synthesize -- \
    -m ~/.OminiX/models/Qwen3-TTS-12Hz-1.7B-CustomVoice-8bit \
    "This demonstrates streaming output." --streaming

use qwen3_tts_mlx::{Synthesizer, SynthesizeOptions, save_wav, normalize_audio};

let mut synth = Synthesizer::load("./models/Qwen3-TTS-12Hz-1.7B-CustomVoice-8bit")?;

// Preset speaker
let opts = SynthesizeOptions {
    speaker: "vivian",
    language: "english",
    ..Default::default()
};
let samples = synth.synthesize("Hello, world!", &opts)?;
save_wav(&normalize_audio(&samples, 0.95), synth.sample_rate, "output.wav")?;

// Speaker + emotion instruction
let samples = synth.synthesize_with_speaker_instruct(
    "我太开心了！",
    "serena",
    "用兴奋激动的语气说话，充满热情和活力",
    "chinese",
    &opts,
)?;

See the qwen3-tts-mlx README for full API documentation, all synthesis modes, supported speakers/languages, and performance benchmarks.

Image Generation

# Download Z-Image model
huggingface-cli download uqer1244/MLX-z-image --local-dir ./models/zimage-turbo-mlx

# Generate image with Z-Image
cargo run --release -p zimage-mlx --example generate_zimage -- "a cat sitting on a couch"

# Download FLUX.2-klein model
huggingface-cli download black-forest-labs/FLUX.2-klein-4B --local-dir ./models/flux-klein

# Generate image with FLUX.2-klein
cargo run --release -p flux-klein-mlx --example generate_klein -- "a beautiful sunset over mountains"

Performance

Benchmarks on Apple M3 Max (128GB):

Task	Model	Performance	Memory
LLM	Qwen3-4B	45 tok/s	8GB
LLM	GLM4-9B-4bit	35 tok/s	6GB
LLM	Mixtral-8x7B-4bit	25 tok/s	26GB
LLM	MiniCPM-SALA-9B-8bit	28 tok/s	9.6GB
VLM	Moxin-7B-8bit	30 tok/s	10GB
ASR	Paraformer	18x real-time	500MB
ASR	Qwen3-ASR-1.7B-8bit	30x real-time	2.5GB
ASR	Qwen3-ASR-0.6B-8bit	50x real-time	1.0GB
TTS	Qwen3-TTS-1.7B-8bit	2.3x real-time	1.8GB
TTS	GPT-SoVITS	4x real-time	2GB
Image	Z-Image	~3s/image	8GB
Image	FLUX.2-klein	~5s/image	13GB

Documentation

Crate	README	Description
mlx-rs-core	README	Shared infrastructure
qwen3-mlx	README	Qwen model family
glm4-mlx	README	GLM4 models
glm4-moe-mlx	README	GLM4-MoE
mixtral-mlx	README	Mixtral MoE
mistral-mlx	README	Mistral 7B
gemma4-mlx	README	Gemma 4 12B text inference
moxin-vlm-mlx	README	Moxin-7B VLM
minicpm-sala-mlx	README	MiniCPM-SALA 9B (hybrid attention, 1M context)
ominix-api	README	Unified OpenAI-compatible API server (ASR + TTS + LLM + OCR)
qwen3-tts-mlx	README	Qwen3-TTS (9 speakers, 12 languages, voice cloning, emotion control)
qwen3-asr-mlx	README	Qwen3-ASR (30+ languages, 0.6B/1.7B)
funasr-mlx	README	Paraformer ASR
funasr-nano-mlx	README	FunASR-Nano
gpt-sovits-mlx	README	GPT-SoVITS voice cloning

Feature Flags

Flag	Description	Default
`metal`	Enable Metal GPU acceleration	On
`accelerate`	Use Accelerate framework	On

License

Dual-licensed under MIT and Apache 2.0.

Acknowledgments

Apple MLX Team for the MLX framework
oxideai for the original mlx-rs bindings

Name		Name	Last commit message	Last commit date
Latest commit History 530 Commits
.github		.github
deepseek-ocr2-mlx		deepseek-ocr2-mlx
docs		docs
flux-klein-mlx		flux-klein-mlx
funasr-mlx		funasr-mlx
funasr-nano-mlx		funasr-nano-mlx
funasr-qwen4b-mlx		funasr-qwen4b-mlx
gemma4-mlx		gemma4-mlx
glm-4.7-flash-mlx		glm-4.7-flash-mlx
glm4-mlx		glm4-mlx
glm4-moe-mlx		glm4-moe-mlx
gpt-sovits-mlx		gpt-sovits-mlx
minicpm-sala-mlx		minicpm-sala-mlx
mistral-mlx		mistral-mlx
mixtral-mlx		mixtral-mlx
mlx-lm-utils/tests/fixtures/qwen3		mlx-lm-utils/tests/fixtures/qwen3
mlx-rs-core		mlx-rs-core
mlx-rs-lm		mlx-rs-lm
mlx-rs		mlx-rs
moxin-vlm-mlx		moxin-vlm-mlx
qwen-image-mlx		qwen-image-mlx
qwen3-asr-mlx		qwen3-asr-mlx
qwen3-mlx		qwen3-mlx
qwen3-tts-core		qwen3-tts-core
qwen3-tts-mlx		qwen3-tts-mlx
qwen3-vl-mlx		qwen3-vl-mlx
qwen3.5-35B-mlx		qwen3.5-35B-mlx
step-audio2-mlx		step-audio2-mlx
xtask		xtask
zimage-mlx		zimage-mlx
.gitignore		.gitignore
.gitmodules		.gitmodules
CHANGELOG.md		CHANGELOG.md
Cargo.toml		Cargo.toml
LICENSE-APACHE		LICENSE-APACHE
LICENSE-MIT		LICENSE-MIT
OPTIMIZATION_DETAILS.md		OPTIMIZATION_DETAILS.md
README.md		README.md
RELEASING.md		RELEASING.md
cliff.toml		cliff.toml
release.toml		release.toml

Folders and files

Latest commit

History

Repository files navigation

OminiX-MLX

Overview

Features

Architecture

Data Flow

Crate Structure

Supported Models

Language Models (LLMs)

Vision-Language Models (VLMs)

Speech Recognition (ASR)

Text-to-Speech (TTS)

OCR / Document Understanding

Image Generation (Supported Models)

Quick Start

Prerequisites

Build

LLM Generation

Speech Recognition

Voice Cloning

Vision-Language Model

MiniCPM-SALA (Long-Context LLM)

Qwen3-ASR (Speech Recognition)

OminiX-API (Unified API Server)

Qwen3-TTS (Text-to-Speech)

Image Generation

Performance

Documentation

Feature Flags

License

Acknowledgments

About

Resources

License

Licenses found

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages