# pmetal
Powdered Metal — High-performance LLM fine-tuning framework for Apple Silicon, written in Rust.
This is the umbrella crate that re-exports all PMetal sub-crates behind feature flags. Add a single dependency to access the full framework:
```toml
[dependencies]
pmetal = "0.3"                                    # default features
# or, to enable everything:
pmetal = { version = "0.3", features = ["full"] }
```
## Quick Start
### Fine-tune a model
```rust
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let result = pmetal::easy::finetune("Qwen/Qwen3-0.6B", "data.jsonl")
        .lora(16, 32.0)
        .epochs(3)
        .learning_rate(2e-4)
        .output("./output")
        .run()
        .await?;
    println!("Final loss: {:.4}", result.final_loss);
    Ok(())
}
```
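The builder reads training examples from `data.jsonl`. This README does not show the schema that `pmetal-data` expects, so the following is only an illustrative sketch of a prompt/completion-style JSONL file; the field names here are assumptions, not the crate's documented format:

```jsonl
{"prompt": "What is 2+2?", "completion": "4"}
{"prompt": "Name the largest planet.", "completion": "Jupiter"}
```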
### Run inference
```rust
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let result = pmetal::easy::infer("Qwen/Qwen3-0.6B")
        .lora("./output/lora_weights.safetensors")
        .temperature(0.7)
        .max_tokens(256)
        .generate("What is 2+2?")
        .await?;
    println!("{}", result.text);
    Ok(())
}
```
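Because the fine-tuning example writes its adapter under `./output` and the inference example loads `./output/lora_weights.safetensors`, the two snippets compose. Here is a minimal end-to-end sketch using only the builder calls shown above (it assumes the weight filename from the inference example matches what `finetune` writes):

```rust
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Fine-tune and write the LoRA adapter under ./output ...
    let training = pmetal::easy::finetune("Qwen/Qwen3-0.6B", "data.jsonl")
        .lora(16, 32.0)
        .epochs(3)
        .output("./output")
        .run()
        .await?;
    println!("Final loss: {:.4}", training.final_loss);

    // ... then immediately load that adapter for inference.
    let answer = pmetal::easy::infer("Qwen/Qwen3-0.6B")
        .lora("./output/lora_weights.safetensors")
        .temperature(0.7)
        .max_tokens(256)
        .generate("What is 2+2?")
        .await?;
    println!("{}", answer.text);
    Ok(())
}
```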
### Query device info
```rust
fn main() {
    println!("{}", pmetal::version::device_info());
}
```
## Feature Flags
| Feature | Crate | Default | Description |
|---|---|---|---|
| `core` | `pmetal-core` | yes | Foundation types, configs, traits |
| `gguf` | `pmetal-gguf` | yes | GGUF format with imatrix quantization |
| `metal` | `pmetal-metal` | yes | Custom Metal GPU kernels + ANE runtime |
| `hub` | `pmetal-hub` | yes | HuggingFace Hub integration |
| `mlx` | `pmetal-mlx` | yes | MLX backend (KV cache, RoPE, ops) |
| `models` | `pmetal-models` | yes | LLM architectures (Llama, Qwen, DeepSeek, ...) |
| `lora` | `pmetal-lora` | yes | LoRA/QLoRA training |
| `trainer` | `pmetal-trainer` | yes | Training loops (SFT, DPO, SimPO, ORPO, KTO, GRPO, DAPO, RLKD, Embedding, PPO, GSPO, Online DPO, Diffusion) — enables `data` + `distill` |
| `easy` | (multiple) | yes | High-level builder API — enables `trainer` + `hub` + `data` |
| `ane` | `pmetal-metal` | yes | Apple Neural Engine direct programming |
| `data` | `pmetal-data` | yes* | Dataset loading and preprocessing (*enabled via `easy`/`trainer`) |
| `distill` | `pmetal-distill` | yes* | Knowledge distillation incl. TAID (*enabled via `trainer`) |
| `merge` | `pmetal-merge` | no | Model merging (14 strategies: Linear, SLERP, TIES, DARE, DELLA, ModelStock, etc.) |
| `vocoder` | `pmetal-vocoder` | no | BigVGAN neural vocoder |
| `distributed` | `pmetal-distributed` | no | Distributed training (mDNS, Ring All-Reduce) |
| `mhc` | `pmetal-mhc` | no | Manifold-Constrained Hyper-Connections |
| `lora-metal-fused` | — | no | Fused Metal kernels for ~2x LoRA speedup |
| `full` | all of the above | no | Everything |
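These features follow standard Cargo semantics, so the table translates directly into `Cargo.toml` entries. A sketch, assuming no feature wiring beyond what the table lists:

```toml
[dependencies]
# Defaults plus the non-default model-merging and distributed-training crates:
pmetal = { version = "0.3", features = ["merge", "distributed"] }

# Or opt out of the defaults and pick a minimal set (whether this particular
# subset builds standalone is an assumption):
# pmetal = { version = "0.3", default-features = false, features = ["core", "mlx", "models"] }
```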
## Hardware Support
PMetal auto-detects Apple Silicon capabilities and tunes kernel parameters per device:
- M1–M5 families (Base, Pro, Max, Ultra)
- NAX (Neural Accelerators in GPU) on M5/Apple10
- ANE (Apple Neural Engine) with CPU RMSNorm workaround for fp16 stability
- UltraFusion multi-die topology detection
- Tier-based tuning: FlashAttention block sizes, GEMM tile sizes, threadgroup sizes, batch multipliers
## Examples
```sh
# Device info
cargo run -p pmetal --example device_info

# Fine-tuning (easy API)
cargo run -p pmetal --example finetune_easy --features easy -- \
    --model Qwen/Qwen3-0.6B --dataset data.jsonl

# Inference (easy API)
cargo run -p pmetal --example inference_easy --features easy -- \
    --model Qwen/Qwen3-0.6B --prompt "What is 2+2?"

# Manual fine-tuning (lower-level control)
cargo run -p pmetal --example finetune_manual --features data,lora,trainer
```
## Re-exports
All sub-crates are available as modules:
```rust
use pmetal::core;       // pmetal-core
use pmetal::metal;      // pmetal-metal
use pmetal::mlx;        // pmetal-mlx
use pmetal::models;     // pmetal-models
use pmetal::lora;       // pmetal-lora
use pmetal::trainer;    // pmetal-trainer
use pmetal::hub;        // pmetal-hub
use pmetal::gguf;       // pmetal-gguf
use pmetal::prelude::*; // commonly used types from all crates
```
## License
Licensed under either of MIT or Apache-2.0.