dax8it/unsloth-mlx


Unsloth-MLX

Fine-tune LLMs on your Mac with Apple Silicon
Prototype locally, scale to cloud. Same code, just change the import.


Quick Start · Training Methods · Examples · Status


Note

Why I Built This (A Personal Note)

I rely on Unsloth for my daily fine-tuning on cloud GPUs; it's the gold standard for me. But recently, I started working on a MacBook M4 and hit a friction point: I wanted to prototype locally on my Mac, then scale up to the cloud without rewriting my entire training script.

Since Unsloth relies on Triton (which Macs don't have, yet), I couldn't use it locally. I built unsloth-mlx to solve this specific "Context Switch" problem. It wraps Apple's native MLX framework in an Unsloth-compatible API.

The goal isn't to replace Unsloth or claim superior performance. The goal is code portability: allowing you to write FastLanguageModel code once on your Mac, test it, and then push that exact same script to a CUDA cluster. It solves a workflow problem, not just a hardware one.

This is an "unofficial" project built by a fan, for fans who happen to use Macs. It's helping me personally, and if it helps others like me, then I'll have my satisfaction.

Why Unsloth-MLX?

Bringing the Unsloth experience to Mac users via Apple's MLX framework.

  • 🚀 Fine-tune LLMs locally on your Mac (M1/M2/M3/M4/M5)
  • 💾 Leverage unified memory (up to 512GB on Mac Studio)
  • 🔄 Same API as Unsloth - your existing code just works!
  • 📦 Export anywhere - HuggingFace format, GGUF for Ollama/llama.cpp
# Unsloth (CUDA)                        # Unsloth-MLX (Apple Silicon)
from unsloth import FastLanguageModel   from unsloth_mlx import FastLanguageModel
from trl import SFTTrainer              from unsloth_mlx import SFTTrainer

# Rest of your code stays exactly the same!
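
If you want a single script that runs unchanged on both machines, the import swap can be automated. The sketch below is one possible approach, not part of either library: `pick_backend` and `load_fast_language_model` are hypothetical helper names, and the only check is whether the machine reports itself as an Apple Silicon Mac.

```python
import importlib
import platform


def pick_backend(system: str, machine: str) -> str:
    """Return the package name to import for the given platform."""
    # Apple Silicon Macs report system "Darwin" and machine "arm64".
    if system == "Darwin" and machine == "arm64":
        return "unsloth_mlx"
    # Everything else falls through to the CUDA package.
    return "unsloth"


def load_fast_language_model():
    """Import FastLanguageModel from whichever backend fits this machine."""
    name = pick_backend(platform.system(), platform.machine())
    module = importlib.import_module(name)  # raises ImportError if not installed
    return module.FastLanguageModel


print(pick_backend(platform.system(), platform.machine()))
```

Note that `SFTTrainer` comes from `trl` on the CUDA side and from `unsloth_mlx` on the Mac side, so a real script would need the same kind of switch for the trainer import.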

What This Is (and Isn't)

This is NOT a replacement for Unsloth or an attempt to compete with it. Unsloth is incredible - it's the gold standard for efficient LLM fine-tuning on CUDA.

This IS a bridge for Mac users who want to:

  • 🧪 Prototype locally - Experiment with fine-tuning before committing to cloud GPU costs
  • 📚 Learn & iterate - Develop your training pipeline with fast local feedback loops
  • 🔄 Then scale up - Move to cloud NVIDIA GPUs + original Unsloth for production training
Local Mac (Unsloth-MLX)     →     Cloud GPU (Unsloth)
   Prototype & experiment          Full-scale training
   Small datasets                  Large datasets
   Quick iterations                Production runs

Project Status

🚀 v0.3.5 - Merged model save + load_adapter fixed!

| Feature | Status | Notes |
|---|---|---|
| SFT Training | ✅ Stable | Native MLX training |
| Model Loading | ✅ Stable | Any HuggingFace model (quantized & non-quantized) |
| Save/Export | ✅ Stable | HF format, GGUF (see limitations) |
| DPO Training | ✅ Stable | Full DPO loss |
| ORPO Training | ✅ Stable | Full ORPO loss |
| GRPO Training | ✅ Stable | Multi-generation + reward |
| KTO/SimPO | ✅ Stable | Proper loss implementations |
| Chat Templates | ✅ Stable | 15 models (llama, gemma, qwen, phi, mistral) |
| Response-Only Training | ✅ Stable | train_on_responses_only() |
| Multi-turn Merging | ✅ NEW | to_sharegpt() + conversation_extension |
| Column Mapping | ✅ NEW | apply_column_mapping() auto-rename |
| Dataset Config | ✅ NEW | HFDatasetConfig structured loading |
| Vision Models | ⚠️ Beta | Via mlx-vlm |
| GUI Interface | ✅ New | Gradio-based web UI |
| PyPI Package | ✅ Available | uv pip install unsloth-mlx |

TODO / Roadmap

  • SFT: Show train/val loss + perplexity in the GUI during training (and save a small metrics log)
  • SFT: Add a small built-in prompt evaluation set runner (before/after fine-tune comparison)
  • RL: Add clearer dataset schema validation + sample preview per method (DPO/ORPO/GRPO/KTO/SimPO)
  • RL: Add built-in reward function templates/presets for GRPO (math, formatting, length, etc.)
  • RL: Add better progress reporting (loss components, reward stats, win-rate-like summaries)
  • RL: Improve checkpointing/resume support (continue training from an output folder)
  • Export: Add optional quantization + export presets (quality/speed) and better compatibility notes per model family
  • Packaging: Publish to PyPI and add a simple versioned release workflow

Installation

# Using uv (recommended - faster and more reliable)
uv pip install unsloth-mlx

# Or using pip
pip install unsloth-mlx

# From source (for development)
git clone https://github.com/ARahim3/unsloth-mlx.git
cd unsloth-mlx
uv pip install -e .

Quick Start

🎯 With GUI (Easiest)

Want to fine-tune without writing code? Use our Gradio-based GUI!

# Install GUI dependencies
pip install -e .

# Launch the GUI
python gui.py

# Open http://127.0.0.1:7860 in your browser

The GUI provides tabs for:

  • Loading models from HuggingFace
  • Chatting with models
  • Configuring LoRA adapters
  • SFT and RL training
  • Exporting models

See GUI_README.md for detailed instructions.

Export notes:

  • Browse… buttons are available in the Export tab to pick output locations without typing paths.
  • Save LoRA Adapters exports a small folder containing adapters.safetensors + adapter_config.json.
  • Save Merged Model produces a fused MLX model folder suitable for tools like LM Studio (MLX backend).
  • GGUF export is only supported by mlx_lm for model families: llama, mistral, mixtral. Some model types (e.g. lfm2) cannot be exported to GGUF via mlx_lm.
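
To check a model folder before attempting a GGUF export, you can compare its architecture against the families listed above. This is a minimal sketch, assuming the folder contains a HuggingFace-style `config.json` with a `model_type` field; `gguf_exportable` is a hypothetical helper, not a function from unsloth-mlx or mlx_lm.

```python
import json
import tempfile
from pathlib import Path

# Families listed in the export notes as exportable through mlx_lm.
GGUF_FAMILIES = {"llama", "mistral", "mixtral"}


def gguf_exportable(model_dir: str) -> bool:
    """Check config.json's model_type against the supported families."""
    config = json.loads((Path(model_dir) / "config.json").read_text())
    return config.get("model_type", "").lower() in GGUF_FAMILIES


# Demo with throwaway config files instead of real model folders.
with tempfile.TemporaryDirectory() as tmp:
    results = {}
    for family in ("llama", "lfm2"):
        folder = Path(tmp) / family
        folder.mkdir()
        (folder / "config.json").write_text(json.dumps({"model_type": family}))
        results[family] = gguf_exportable(str(folder))
    print(results)
```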

LEAP GGUF export (Liquid AI):

  • If you're using LEAP-supported model architectures (LFM2 / LFM2-VL / Qwen), you can create an iOS-ready .gguf using the LEAP GGUF Bundling section in the GUI.
  • Install the bundling CLI: pip install leap-bundle.
  • Authenticate once with leap-bundle login <api-key> (credentials are persisted to ~/.liquid-leap). After that you can leave the GUI API key field blank.
  • Recommended flow: Save Merged Model → LEAP: Validate Directory → Create Bundle Request → Download GGUF.
  • On iOS, you can load the downloaded .gguf locally with LEAP SDK via Leap.load(url:).

Dataset notes (SFT):

  • SFT training expects JSONL rows in the {"messages": [...]} chat format.
  • If your dataset is not already in messages format, use the GUI button Convert to messages JSONL in the SFT Training tab.
  • Supported input schemas for conversion:
    • messages: already-correct chat rows
    • conversations: ShareGPT-style (from/value) and variants
    • Alpaca-style: instruction + optional input + output
    • Prompt/completion: prompt + completion (or response)
  • Converted datasets are written to data/converted/ and the GUI auto-fills the dataset path to the converted file.
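
If you prefer to convert a dataset in a script rather than through the GUI, the mapping is simple to sketch. The code below is an illustration of the schemas listed above, not the GUI's actual converter: `to_messages` and `ROLE_MAP` are hypothetical names, and joining the Alpaca `instruction` and `input` with a blank line is an assumption.

```python
import json

# Map ShareGPT-style speaker names onto chat roles.
ROLE_MAP = {"human": "user", "user": "user", "gpt": "assistant",
            "assistant": "assistant", "system": "system"}


def to_messages(row: dict) -> dict:
    """Convert one dataset row into the {"messages": [...]} chat format."""
    if "messages" in row:  # already in the target format
        return {"messages": row["messages"]}
    if "conversations" in row:  # ShareGPT-style from/value turns
        return {"messages": [
            {"role": ROLE_MAP[turn["from"]], "content": turn["value"]}
            for turn in row["conversations"]
        ]}
    if "instruction" in row:  # Alpaca-style
        prompt = row["instruction"]
        if row.get("input"):
            prompt += "\n\n" + row["input"]
        answer = row["output"]
    elif "prompt" in row:  # prompt/completion (or response)
        prompt = row["prompt"]
        answer = row.get("completion", row.get("response"))
        if answer is None:
            raise ValueError("prompt row has no completion or response")
    else:
        raise ValueError(f"unrecognized schema: {sorted(row)}")
    return {"messages": [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": answer},
    ]}


rows = [
    {"instruction": "Add the numbers.", "input": "2 and 3", "output": "5"},
    {"conversations": [{"from": "human", "value": "Hi"},
                       {"from": "gpt", "value": "Hello!"}]},
    {"prompt": "Capital of France?", "response": "Paris"},
]
# One JSON object per line is the JSONL layout.
jsonl = "\n".join(json.dumps(to_messages(row)) for row in rows)
print(jsonl)
```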

💻 With Code

from unsloth_mlx import FastLanguageModel, SFTTrainer, SFTConfig
from datasets import load_dataset

# Load any HuggingFace model (1B model for quick start)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="mlx-community/Llama-3.2-1B-Instruct-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Add LoRA adapters
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_alpha=16,
)

# Load a dataset (or create your own)
dataset = load_dataset("yahma/alpaca-cleaned", split="train[:100]")

# Train with SFTTrainer (same API as TRL!)
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    tokenizer=tokenizer,
    args=SFTConfig(
        output_dir="outputs",
        per_device_train_batch_size=2,
        learning_rate=2e-4,
        max_steps=50,
    ),
)
trainer.train()

# Save (same API as Unsloth!)
model.save_pretrained("lora_model")  # Adapters only
model.save_pretrained_merged("merged", tokenizer)  # Full model
model.save_pretrained_gguf("model", tokenizer)  # GGUF (see note below)

Note

GGUF Export: Works with non-quantized base models. If using a 4-bit model (like above), see Known Limitations for workarounds.
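
To get a feel for what the LoRA settings in the script above (r=16 on the four attention projections) add to the model, here is a back-of-the-envelope count. It is a sketch under stated assumptions: the projection shapes and the layer count are assumed values for a Llama-3.2-1B-class model, so check them against your model's config before relying on the number. In the standard LoRA formulation each adapted weight of shape (d_out, d_in) gains r × (d_in + d_out) trainable parameters, and the update is scaled by lora_alpha / r. How unsloth-mlx maps lora_alpha onto MLX internally is not covered here.

```python
def lora_param_count(shapes: dict, r: int, num_layers: int) -> int:
    """Count trainable LoRA parameters: r * (d_in + d_out) per adapted matrix."""
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * num_layers


# Assumed (d_in, d_out) shapes per transformer layer; not read from a real model.
shapes = {
    "q_proj": (2048, 2048),
    "k_proj": (2048, 512),
    "v_proj": (2048, 512),
    "o_proj": (2048, 2048),
}

trainable = lora_param_count(shapes, r=16, num_layers=16)  # 16 layers is assumed
scale = 16 / 16  # lora_alpha / r in the standard LoRA formulation

print(f"trainable LoRA parameters: {trainable:,}")
print(f"LoRA scale: {scale}")
```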

Chat Templates & Response-Only Training

from unsloth_mlx import get_chat_template, train_on_responses_only

# Apply chat template (supports llama-3, gemma, qwen, phi, mistral, etc.)
tokenizer = get_chat_template(tokenizer, chat_template="llama-3")

# Or auto-detect from model name
tokenizer = get_chat_template(tokenizer, chat_template="auto")

# Train only on responses (not prompts) - more efficient!
trainer = train_on_responses_only(
    trainer,
    instruction_part="<|start_header_id|>user<|end_header_id|>\n\n",
    response_part="<|start_header_id|>assistant<|end_header_id|>\n\n",
)
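
Conceptually, response-only training masks the loss so that only assistant tokens contribute. The sketch below illustrates that idea on toy token IDs. It is not the implementation behind train_on_responses_only(), and `find_all` and `response_mask` are hypothetical helpers.

```python
def find_all(seq: list, pattern: list) -> list:
    """Return every index where pattern occurs as a contiguous run in seq."""
    n = len(pattern)
    return [i for i in range(len(seq) - n + 1) if seq[i:i + n] == pattern]


def response_mask(tokens: list, instruction_ids: list, response_ids: list) -> list:
    """Return 1 for tokens that belong to an assistant response, else 0."""
    mask = [0] * len(tokens)
    instruction_starts = find_all(tokens, instruction_ids)
    for start in find_all(tokens, response_ids):
        begin = start + len(response_ids)
        # The response runs until the next user turn, or the end of the sequence.
        later = [i for i in instruction_starts if i >= begin]
        end = later[0] if later else len(tokens)
        for i in range(begin, end):
            mask[i] = 1
    return mask


# Toy IDs: 1 stands for the user header, 2 for the assistant header,
# and the remaining numbers are ordinary content tokens.
tokens = [1, 10, 11, 2, 20, 21, 1, 12, 2, 22]
mask = response_mask(tokens, instruction_ids=[1], response_ids=[2])
print(mask)
```

In a real run the marker IDs would come from tokenizing the instruction_part and response_part strings, and the per-token loss would be multiplied by the mask before averaging.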

Supported Training Methods

| Method | Trainer | Implementation | Use Case |
|---|---|---|---|
| SFT | SFTTrainer | ✅ Native MLX | Instruction fine-tuning |
| DPO | DPOTrainer | ✅ Native MLX | Preference learning (proper log-prob loss) |
| ORPO | ORPOTrainer | ✅ Native MLX | Combined SFT + odds ratio preference |
| GRPO | GRPOTrainer | ✅ Native MLX | Reasoning with multi-generation (DeepSeek R1 style) |
| KTO | KTOTrainer | ✅ Native MLX | Kahneman-Tversky optimization |
| SimPO | SimPOTrainer | ✅ Native MLX | Simple preference optimization |
| VLM | VLMSFTTrainer | ⚠️ Beta | Vision-Language models |
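
For readers new to preference learning, the "log-prob loss" behind DPO can be written in a few lines. This is a sketch of the standard DPO objective for a single chosen/rejected pair, not the DPOTrainer code from this repo. The inputs are summed sequence log-probabilities under the trained policy and a frozen reference model, and `beta=0.1` is just an illustrative default.

```python
import math


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair: -log(sigmoid(beta * margin))."""
    # How much more the policy favors the chosen answer than the reference does.
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    x = beta * margin
    # -log(sigmoid(x)) equals log(1 + exp(-x)); this form avoids overflow.
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


# With identical policy and reference the margin is zero and the loss is log(2).
baseline = dpo_loss(-12.0, -12.0, -12.0, -12.0)
# A policy that prefers the chosen answer more than the reference gets a lower loss.
improved = dpo_loss(-10.0, -14.0, -12.0, -12.0)
print(baseline, improved)
```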

Examples

Check examples/ for working code:

  • Basic model loading and inference
  • Complete SFT fine-tuning pipeline
  • RL training methods (DPO, GRPO, ORPO)

Requirements

  • Hardware: Apple Silicon Mac (M1/M2/M3/M4/M5)
  • OS: macOS 13.0+ (15.0+ recommended for large models)
  • Memory: 16GB+ unified RAM (32GB+ for 7B+ models)
  • Python: 3.9+ (3.12 recommended)
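
The memory guidance above follows from simple arithmetic on the weights alone. The sketch below is a rough lower bound, assuming decimal gigabytes and ignoring quantization metadata, activations, optimizer state, and the KV cache, all of which add to the real footprint. `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for model weights only, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


fp16_7b = weight_memory_gb(7e9, 16)  # 7B parameters stored as 16-bit floats
q4_7b = weight_memory_gb(7e9, 4)     # the same model at 4 bits per weight

print(f"7B fp16 weights:  ~{fp16_7b:.1f} GB")
print(f"7B 4-bit weights: ~{q4_7b:.1f} GB")
```

At 16 bits the weights of a 7B model alone come to about 14 GB, which leaves little room on a 16GB machine once training overhead is added.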

Comparison with Unsloth

| Feature | Unsloth (CUDA) | Unsloth-MLX |
|---|---|---|
| Platform | NVIDIA GPUs | Apple Silicon |
| Backend | Triton Kernels | MLX Framework |
| Memory | VRAM (limited) | Unified (up to 512GB) |
| API | Original | 100% Compatible |
| Best For | Production training | Local dev, large models |

Known Limitations

GGUF Export from Quantized Models

The Issue: GGUF export (save_pretrained_gguf) doesn't work directly with quantized (4-bit) base models. This is a known limitation in mlx-lm, not unsloth-mlx.

What Works:

  • ✅ Training with quantized models (QLoRA) - works perfectly
  • ✅ Saving adapters (save_pretrained) - works
  • ✅ Saving merged model (save_pretrained_merged) - works
  • ✅ Inference with trained model - works
  • ❌ GGUF export from quantized base model - mlx-lm limitation

Workarounds:

  1. Use a non-quantized base model (recommended for GGUF export):

    # Use fp16 model instead of 4-bit
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name="mlx-community/Llama-3.2-1B-Instruct",  # NOT -4bit
        max_seq_length=2048,
        load_in_4bit=False,  # Train in fp16
    )
    # Train normally, then export
    model.save_pretrained_gguf("model", tokenizer)  # Works!
  2. Dequantize during export (results in large fp16 file):

    model.save_pretrained_gguf("model", tokenizer, dequantize=True)
    # Then re-quantize with llama.cpp:
    # ./llama-quantize model.gguf model-q4_k_m.gguf Q4_K_M
  3. Skip GGUF, use MLX format: If you only need the model for MLX/Python inference, just use save_pretrained_merged() - no GGUF needed.

Related Issues:

Contributing

Contributions welcome! Areas that need help:

  • Custom MLX kernels for even faster training
  • More comprehensive test coverage
  • Documentation and examples
  • Testing on different M-series chips (M1, M2, M3, M4, M5)
  • VLM training improvements

License

Apache 2.0 - See LICENSE file.

Acknowledgments

  • Unsloth - The original, incredible CUDA library
  • MLX - Apple's ML framework
  • MLX-LM - LLM utilities for MLX
  • MLX-VLM - Vision model support

Community project, not affiliated with Unsloth AI or Apple.
⭐ Star this repo if you find it useful!
