# pmetal
Powdered Metal — High-performance LLM fine-tuning framework for Apple Silicon, written in Rust.
This is the umbrella crate that re-exports all PMetal sub-crates behind feature flags. Add a single dependency to access the full framework:
```toml
[dependencies]
pmetal = "0.3"                                    # default features
# or, to enable everything:
pmetal = { version = "0.3", features = ["full"] }
```
## Quick Start
### Fine-tune a model
```rust
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let result = pmetal::easy::finetune("Qwen/Qwen3-0.6B", "data.jsonl")
        .lora(16, 32.0)
        .epochs(3)
        .learning_rate(2e-4)
        .output("./output")
        .run()
        .await?;
    println!("Final loss: {:.4}", result.final_loss);
    Ok(())
}
```
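The builder reads training examples from `data.jsonl`. This README does not show the schema that `pmetal-data` expects, so the following is only an illustrative sketch of a prompt/completion-style JSONL file; the field names here are assumptions, not the crate's documented format:

```jsonl
{"prompt": "What is 2+2?", "completion": "4"}
{"prompt": "Name the largest planet.", "completion": "Jupiter"}
```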
### Run inference
```rust
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let result = pmetal::easy::infer("Qwen/Qwen3-0.6B")
        .lora("./output/lora_weights.safetensors")
        .temperature(0.7)
        .max_tokens(256)
        .generate("What is 2+2?")
        .await?;
    println!("{}", result.text);
    Ok(())
}
```
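Because the fine-tuning example writes its adapter under `./output` and the inference example loads `./output/lora_weights.safetensors`, the two snippets compose. Here is a minimal end-to-end sketch using only the builder calls shown above (it assumes the weight filename from the inference example matches what `finetune` writes):

```rust
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Fine-tune and write the LoRA adapter under ./output ...
    let training = pmetal::easy::finetune("Qwen/Qwen3-0.6B", "data.jsonl")
        .lora(16, 32.0)
        .epochs(3)
        .output("./output")
        .run()
        .await?;
    println!("Final loss: {:.4}", training.final_loss);

    // ... then immediately load that adapter for inference.
    let answer = pmetal::easy::infer("Qwen/Qwen3-0.6B")
        .lora("./output/lora_weights.safetensors")
        .temperature(0.7)
        .max_tokens(256)
        .generate("What is 2+2?")
        .await?;
    println!("{}", answer.text);
    Ok(())
}
```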
### Query device info
```rust
fn main() {
    println!("{}", pmetal::version::device_info());
}
```
## Feature Flags
| Feature | Crate | Default | Description |
|---|---|---|---|
| `core` | `pmetal-core` | yes | Foundation types, configs, traits |
| `gguf` | `pmetal-gguf` | yes | GGUF format with imatrix quantization |
| `metal` | `pmetal-metal` | yes | Custom Metal GPU kernels + ANE runtime |
| `hub` | `pmetal-hub` | yes | HuggingFace Hub integration |
| `mlx` | `pmetal-mlx` | yes | MLX backend (KV cache, RoPE, ops) |
| `models` | `pmetal-models` | yes | LLM architectures (Llama, Qwen, DeepSeek, ...) |
| `lora` | `pmetal-lora` | yes | LoRA/QLoRA training |
| `trainer` | `pmetal-trainer` | yes | Training loops (SFT, DPO, SimPO, ORPO, KTO, GRPO, DAPO, RLKD, Embedding, PPO, GSPO, Online DPO, Diffusion) — enables `data` + `distill` |
| `easy` | (multiple) | yes | High-level builder API — enables `trainer` + `hub` + `data` |
| `ane` | `pmetal-metal` | yes | Apple Neural Engine direct programming |
| `data` | `pmetal-data` | yes* | Dataset loading and preprocessing (*enabled via `easy`/`trainer`) |
| `distill` | `pmetal-distill` | yes* | Knowledge distillation incl. TAID (*enabled via `trainer`) |
| `merge` | `pmetal-merge` | no | Model merging (14 strategies: Linear, SLERP, TIES, DARE, DELLA, ModelStock, etc.) |
| `vocoder` | `pmetal-vocoder` | no | BigVGAN neural vocoder |
| `distributed` | `pmetal-distributed` | no | Distributed training (mDNS, Ring All-Reduce) |
| `mhc` | `pmetal-mhc` | no | Manifold-Constrained Hyper-Connections |
| `lora-metal-fused` | — | no | Fused Metal kernels for ~2x LoRA speedup |
| `full` | all of the above | no | Everything |
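These features follow standard Cargo semantics, so the table translates directly into `Cargo.toml` entries. A sketch, assuming no feature wiring beyond what the table lists:

```toml
[dependencies]
# Defaults plus the non-default model-merging and distributed-training crates:
pmetal = { version = "0.3", features = ["merge", "distributed"] }

# Or opt out of the defaults and pick a minimal set (whether this particular
# subset builds standalone is an assumption):
# pmetal = { version = "0.3", default-features = false, features = ["core", "mlx", "models"] }
```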
## Hardware Support
PMetal auto-detects Apple Silicon capabilities and tunes kernel parameters per device:
- M1–M5 families (Base, Pro, Max, Ultra)
- NAX (Neural Accelerators in GPU) on M5/Apple10
- ANE (Apple Neural Engine) with CPU RMSNorm workaround for fp16 stability
- UltraFusion multi-die topology detection
- Tier-based tuning: FlashAttention block sizes, GEMM tile sizes, threadgroup sizes, batch multipliers
## Examples
```sh
# Device info
cargo run -p pmetal --example device_info

# Fine-tuning (easy API)
cargo run -p pmetal --example finetune_easy --features easy -- \
    --model Qwen/Qwen3-0.6B --dataset data.jsonl

# Inference (easy API)
cargo run -p pmetal --example inference_easy --features easy -- \
    --model Qwen/Qwen3-0.6B --prompt "What is 2+2?"

# Manual fine-tuning (lower-level control)
cargo run -p pmetal --example finetune_manual --features data,lora,trainer
```
## Re-exports
All sub-crates are available as modules:
```rust
use pmetal::core;       // pmetal-core
use pmetal::metal;      // pmetal-metal
use pmetal::mlx;        // pmetal-mlx
use pmetal::models;     // pmetal-models
use pmetal::lora;       // pmetal-lora
use pmetal::trainer;    // pmetal-trainer
use pmetal::hub;        // pmetal-hub
use pmetal::gguf;       // pmetal-gguf
use pmetal::prelude::*; // commonly used types from all crates
```
## License
Licensed under either of MIT or Apache-2.0.