16 releases

Uses new Rust 2024

0.3.12	Mar 21, 2026
0.3.11	Mar 20, 2026
0.2.1	Mar 10, 2026
0.1.2	Mar 2, 2026
0.1.0	Feb 27, 2026

MIT/Apache

pmetal-cli

Command-line interface for the PMetal framework.

Overview

This crate provides the `pmetal` command-line tool for training LLMs and running inference on Apple Silicon.

Installation

cargo install --path crates/pmetal-cli

Or build the binary without installing it:

cargo build --release -p pmetal-cli
./target/release/pmetal --help

# Explicitly enable ANE and dashboard support (both are on by default)
cargo build --release -p pmetal-cli --features "ane dashboard"

Commands

`train`

Fine-tune a model with LoRA or QLoRA:

pmetal train \
  --model Qwen/Qwen3-0.6B-Base \
  --dataset train.jsonl \
  --output ./output \
  --lora-r 16 \
  --batch-size 4 \
  --learning-rate 2e-4

Training Options

Option	Description	Default
`--model`	Model ID or path	Required
`--dataset`	Training data (JSONL)	Required
`--output`	Output directory	`./output`
`--lora-r`	LoRA rank	16
`--lora-alpha`	LoRA alpha	32.0
`--batch-size`	Micro-batch size	1
`--gradient-accumulation-steps`	Gradient accumulation steps	4
`--learning-rate`	Learning rate	2e-4
`--epochs`	Training epochs	1
`--max-seq-len`	Maximum sequence length	0 (auto)
`--no-flash-attention`	Disable FlashAttention	false
`--no-sequence-packing`	Disable sequence packing	false
`--no-gradient-checkpointing`	Disable gradient checkpointing (uses more memory)	false
`--quantization`	QLoRA method (nf4, fp4, int8)	none
`--no-ane`	Disable Apple Neural Engine	false
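
Two of these options combine in ways the table does not spell out. The following is a minimal sketch, assuming PMetal follows the usual conventions: the effective batch size is the micro-batch size times the accumulation steps, and the LoRA update is scaled by alpha / r. `TrainOptions` is a hypothetical struct used only for illustration; it is not a type from the PMetal crates.

```rust
/// Hypothetical mirror of a few `pmetal train` options (not PMetal's real types).
struct TrainOptions {
    batch_size: usize,
    gradient_accumulation_steps: usize,
    lora_r: usize,
    lora_alpha: f32,
}

impl TrainOptions {
    /// Number of sequences that contribute to one optimizer step.
    fn effective_batch_size(&self) -> usize {
        self.batch_size * self.gradient_accumulation_steps
    }

    /// Conventional LoRA scaling factor, alpha / r.
    /// Assumption: PMetal uses this standard definition.
    fn lora_scale(&self) -> f32 {
        self.lora_alpha / self.lora_r as f32
    }
}

fn main() {
    // Defaults taken from the options table.
    let opts = TrainOptions {
        batch_size: 1,
        gradient_accumulation_steps: 4,
        lora_r: 16,
        lora_alpha: 32.0,
    };
    println!("effective batch size: {}", opts.effective_batch_size());
    println!("LoRA scale: {}", opts.lora_scale());
}
```

Under these assumptions the defaults give an effective batch size of 4 and a LoRA scale of 2. Passing `--batch-size 4` while leaving the accumulation default unchanged would raise the effective batch size to 16.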

`infer`

Run inference with optional LoRA adapter:

pmetal infer \
  --model Qwen/Qwen3-0.6B-Base \
  --lora ./output/lora_weights.safetensors \
  --prompt "Does absolute truth exist?" \
  --chat \
  --show-thinking

Inference Options

Option	Description	Default
`--model`	Model ID or path	Required
`--lora`	LoRA adapter path	None
`--prompt`	Input prompt	Required
`--max-tokens`	Maximum tokens to generate	256
`--temperature`	Sampling temperature	Model default
`--top-k`	Top-k sampling	Model default
`--top-p`	Nucleus sampling	Model default
`--min-p`	Min-p dynamic sampling	Model default
`--chat`	Apply chat template	false
`--show-thinking`	Show reasoning content	false
`--fp8`	Use FP8 weights	false
`--no-ane`	Disable ANE inference	false
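
The `--top-p` and `--min-p` options name well-known filtering schemes. The sketch below shows their textbook definitions over a probability vector: min-p keeps tokens whose probability is at least `min_p` times the largest probability, and top-p keeps the smallest set of most-probable tokens whose cumulative probability reaches `top_p`. The function names are hypothetical, and nothing here is taken from PMetal's implementation.

```rust
/// Indices kept by min-p: probability >= min_p * (largest probability).
fn min_p_keep(probs: &[f32], min_p: f32) -> Vec<usize> {
    let max = probs.iter().cloned().fold(0.0_f32, f32::max);
    probs
        .iter()
        .enumerate()
        .filter(|&(_, &p)| p >= min_p * max)
        .map(|(i, _)| i)
        .collect()
}

/// Indices kept by top-p (nucleus) filtering, most probable first.
fn top_p_keep(probs: &[f32], top_p: f32) -> Vec<usize> {
    let mut order: Vec<usize> = (0..probs.len()).collect();
    // Sort token indices by descending probability.
    order.sort_by(|&a, &b| probs[b].total_cmp(&probs[a]));

    let mut kept = Vec::new();
    let mut cumulative = 0.0_f32;
    for i in order {
        kept.push(i);
        cumulative += probs[i];
        if cumulative >= top_p {
            break; // The nucleus is complete.
        }
    }
    kept
}

fn main() {
    let probs = [0.5_f32, 0.3, 0.15, 0.05];
    println!("min-p 0.2 keeps {:?}", min_p_keep(&probs, 0.2));
    println!("top-p 0.7 keeps {:?}", top_p_keep(&probs, 0.7));
}
```

Min-p adapts to the model's confidence: when one token dominates, the threshold rises and fewer candidates survive, whereas top-p always keeps a fixed share of probability mass.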

`dashboard`

Real-time TUI dashboard for monitoring training progress (requires the `dashboard` feature):

pmetal dashboard --metrics-file ./output/metrics.jsonl

`bench`

Benchmark training performance:

pmetal bench \
  --model Qwen/Qwen3-0.6B-Base \
  --batch-size 4 \
  --seq-len 512

Two related commands are available: `bench-ffi` for overhead analysis and `bench-gen` for generation-loop profiling.

`distill`

Knowledge distillation from teacher to student model:

pmetal distill \
  --teacher Qwen/Qwen3-4B \
  --student unsloth/Qwen3.5-0.8B-Base \
  --dataset train.jsonl \
  --output ./output/distilled \
  --method online \
  --loss-type kl_divergence \
  --temperature 2.0

Cross-vocabulary distillation is supported: the teacher and student can have different vocabulary sizes.

Distillation Options

Option	Description	Default
`--teacher`	Teacher model ID	Required
`--student`	Student model ID	Required
`--dataset`	Training data (JSONL)	Required
`--method`	Distillation method (online, offline, progressive)	online
`--loss-type`	Loss function (kl_divergence, jensen_shannon, soft_cross_entropy)	kl_divergence
`--temperature`	Softmax temperature	2.0
`--alpha`	Balance between hard-label and soft-label loss	0.5
`--rationale`	Reasoning-aware distillation	false
`--lora-r`	Student LoRA rank	16
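
To illustrate what `--loss-type kl_divergence` and `--temperature` conventionally mean, here is a minimal sketch of a temperature-softened KL divergence between teacher and student logits for a single token position. It assumes both models share one vocabulary (so it does not cover the cross-vocabulary case), applies the customary temperature-squared scaling, and leaves out the `--alpha` mixing with the hard-label loss. The function names are hypothetical, and PMetal's actual loss may differ.

```rust
/// Softmax of `logits / temperature`, stabilised by subtracting the maximum.
fn softmax_t(logits: &[f32], temperature: f32) -> Vec<f32> {
    let scaled: Vec<f32> = logits.iter().map(|l| l / temperature).collect();
    let max = scaled.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scaled.iter().map(|s| (s - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

/// KL(teacher || student) over temperature-softened distributions,
/// multiplied by temperature^2 (a common convention, assumed here).
fn kl_distill_loss(teacher: &[f32], student: &[f32], temperature: f32) -> f32 {
    let p = softmax_t(teacher, temperature);
    let q = softmax_t(student, temperature);
    let kl: f32 = p
        .iter()
        .zip(q.iter())
        .map(|(pi, qi)| if *pi > 0.0 { pi * (pi / qi).ln() } else { 0.0 })
        .sum();
    kl * temperature * temperature
}

fn main() {
    let teacher = [2.0_f32, 1.0, 0.1];
    let student = [1.5_f32, 1.2, 0.3];
    // 2.0 matches the documented default for --temperature.
    println!("loss = {}", kl_distill_loss(&teacher, &student, 2.0));
}
```

A higher temperature flattens both distributions, so the student also learns from the teacher's relative ranking of unlikely tokens rather than only from its top choice.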

`dataset`

Dataset utilities for preparing and analyzing training data:

pmetal dataset analyze --path train.jsonl
pmetal dataset validate --path train.jsonl --model Qwen/Qwen3-0.6B
pmetal dataset prepare TeichAI/dataset-id --output-dir ./data --model Qwen/Qwen3-0.6B

`grpo`

Group Relative Policy Optimization for reasoning models:

pmetal grpo \
  --model unsloth/Qwen3-0.6B-Base \
  --dataset problems.jsonl \
  --output ./output/grpo

Pass `--dapo` to enable DAPO (Decoupled Clip and Dynamic Sampling Policy Optimization).
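
The "group relative" part of GRPO refers to how advantages are computed: several completions are sampled for the same prompt, and each reward is normalised against the mean and standard deviation of its group. The sketch below shows that normalisation as it is usually defined. The function name and the epsilon value are assumptions for illustration, not PMetal's code.

```rust
/// Group-relative advantages: (reward - group mean) / (group std + eps).
/// `rewards` holds one scalar reward per completion sampled for one prompt.
fn group_relative_advantages(rewards: &[f32]) -> Vec<f32> {
    let n = rewards.len() as f32;
    let mean = rewards.iter().sum::<f32>() / n;
    let variance = rewards.iter().map(|r| (r - mean).powi(2)).sum::<f32>() / n;
    let std = variance.sqrt();
    // A small epsilon avoids division by zero when all rewards are equal.
    rewards.iter().map(|r| (r - mean) / (std + 1e-6)).collect()
}

fn main() {
    // Four completions for one prompt: two judged correct, two incorrect.
    let rewards = [1.0_f32, 0.0, 0.0, 1.0];
    println!("{:?}", group_relative_advantages(&rewards));
}
```

Because the baseline comes from the group itself, no separate value model is needed. A group whose completions all receive the same reward yields zero advantages and therefore no learning signal.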

Other Commands

Command	Description
`download`	Download a model from Hugging Face
`memory`	Show memory usage and capacity
`quantize`	Quantize a model to GGUF (Dynamic 2.0)
`ollama`	Export a trained model for Ollama
`init`	Generate a sample config file

Environment Variables

Variable	Description
`HF_TOKEN`	Hugging Face API token
`RUST_LOG`	Log level (info, debug, trace)

License

MIT OR Apache-2.0
