srijitiyer/alloy

alloy

A fast CLI for LLM model merging, format conversion, and diffing. Written in Rust. Reads safetensors natively. Compatible with mergekit configs.

Website · GitHub

Install

cargo install --path .

Run this from a clone of the repository. Requires a recent stable Rust toolchain.

Usage

# Merge two models
alloy merge config.yaml --output ./merged

# Compare two models tensor-by-tensor
alloy diff ./model_a ./model_b

# Convert dtype (e.g. FP32 to FP16)
alloy convert ./model --output ./model_f16 --dtype f16

# Inspect model metadata
alloy info ./model
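
For context on what inspecting metadata involves: a safetensors file begins with an 8-byte little-endian header length, followed by that many bytes of JSON describing each tensor's dtype, shape, and byte offsets. The sketch below reads that header using only the standard library. It illustrates the file format and is not alloy's actual code; the function name and the size limit are assumptions made for this example.

```rust
use std::io::{self, Read};

/// Reads the JSON header of a safetensors stream.
/// Layout: u64 little-endian header length N, then N bytes of UTF-8 JSON,
/// then the raw tensor data (not read here).
/// NOTE: hypothetical helper for illustration, not part of alloy's API.
pub fn read_safetensors_header<R: Read>(mut reader: R) -> io::Result<String> {
    let mut len_bytes = [0u8; 8];
    reader.read_exact(&mut len_bytes)?;
    let header_len = u64::from_le_bytes(len_bytes);

    // Reject absurd lengths from corrupt files. The limit is an arbitrary
    // choice for this sketch.
    const MAX_HEADER_LEN: u64 = 100 * 1024 * 1024;
    if header_len > MAX_HEADER_LEN {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "header too large"));
    }

    let mut header = vec![0u8; header_len as usize];
    reader.read_exact(&mut header)?;
    String::from_utf8(header).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```

Passing a `std::fs::File` opened on a `.safetensors` file returns the header as a JSON string, which any JSON parser can then decode.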

Merge methods

| Method | Description |
| --- | --- |
| linear | Weighted average of N models |
| slerp | Spherical interpolation between 2 models |
| nuslerp | Multi-model SLERP via sequential pairwise interpolation |
| task_arithmetic | Base + scaled sum of task vectors |
| ties | Trim, elect sign, disjoint merge |
| dare_linear | Random dropout + rescaled linear merge |
| dare_ties | DARE dropout + TIES sign election |
| della_linear | Magnitude-aware dropout + linear merge |
| della | Magnitude-aware dropout + TIES sign election |
| passthrough | Concatenate layer ranges from different models |
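
To make three of these descriptions concrete, here is a minimal scalar sketch of linear, slerp, and ties operating on flat `f32` slices. It is written from the one-line descriptions above and the standard definitions of these methods, not from alloy's source. The function signatures, the weight normalization in `linear`, the lerp fallback in `slerp`, and the tie-breaking rule in `ties` are all assumptions of this example, and the per-model weights that TIES normally applies are omitted.

```rust
/// linear: weighted average of N tensors (weights normalized by their sum,
/// a choice made for this sketch).
pub fn linear(tensors: &[&[f32]], weights: &[f32]) -> Vec<f32> {
    assert_eq!(tensors.len(), weights.len());
    let total: f32 = weights.iter().sum();
    let n = tensors[0].len();
    let mut out = vec![0.0f32; n];
    for (t, &w) in tensors.iter().zip(weights) {
        assert_eq!(t.len(), n);
        for (o, &x) in out.iter_mut().zip(t.iter()) {
            *o += w / total * x;
        }
    }
    out
}

/// slerp: spherical interpolation between two tensors treated as vectors.
pub fn slerp(a: &[f32], b: &[f32], t: f32) -> Vec<f32> {
    assert_eq!(a.len(), b.len());
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    let theta = (dot / (norm_a * norm_b)).clamp(-1.0, 1.0).acos();
    // Zero-length or (anti)parallel vectors: fall back to plain lerp.
    let (wa, wb) = if !theta.is_finite() || theta.sin().abs() < 1e-6 {
        (1.0 - t, t)
    } else {
        let s = theta.sin();
        (((1.0 - t) * theta).sin() / s, (t * theta).sin() / s)
    };
    a.iter().zip(b).map(|(x, y)| wa * x + wb * y).collect()
}

/// ties: trim task vectors, elect a sign per parameter, then average only
/// the values that agree with the elected sign (disjoint merge).
pub fn ties(base: &[f32], models: &[&[f32]], density: f32) -> Vec<f32> {
    let n = base.len();
    // 1. Trim: keep the top `density` fraction of each task vector by magnitude.
    let trimmed: Vec<Vec<f32>> = models
        .iter()
        .map(|m| {
            let mut delta: Vec<f32> = m.iter().zip(base).map(|(x, b)| x - b).collect();
            let keep = ((n as f32) * density).ceil() as usize;
            let mut mags: Vec<f32> = delta.iter().map(|d| d.abs()).collect();
            mags.sort_by(|x, y| y.total_cmp(x)); // descending
            let threshold = if keep == 0 { f32::INFINITY } else { mags[keep.min(n) - 1] };
            for d in delta.iter_mut() {
                if d.abs() < threshold {
                    *d = 0.0;
                }
            }
            delta
        })
        .collect();
    (0..n)
        .map(|i| {
            // 2. Elect sign from the summed trimmed values (a zero sum counts as positive).
            let sign = trimmed.iter().map(|d| d[i]).sum::<f32>().signum();
            // 3. Disjoint merge: mean of the nonzero values that match the sign.
            let (acc, count) = trimmed
                .iter()
                .map(|d| d[i])
                .filter(|&v| v != 0.0 && v.signum() == sign)
                .fold((0.0f32, 0u32), |(a, c), v| (a + v, c + 1));
            base[i] + if count > 0 { acc / count as f32 } else { 0.0 }
        })
        .collect()
}
```

For instance, with a zero base, two models `[1.0, 0.1, -2.0, 0.0]` and `[3.0, -0.1, 1.0, 0.0]`, and `density = 0.5`, the small `±0.1` entries are trimmed away, the first parameter averages to `2.0`, and the third keeps only `-2.0` because the negative sign wins the election.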

Config format

alloy reads mergekit-compatible YAML configs:

merge_method: slerp
base_model: mistralai/Mistral-7B-v0.1
models:
  - model: mistralai/Mistral-7B-v0.1
  - model: teknium/OpenHermes-2.5-Mistral-7B
parameters:
  t: 0.5
dtype: float16

Hugging Face model IDs work directly: models are downloaded to ~/.cache/huggingface/hub/. Set the HF_TOKEN environment variable to access gated models (Llama, Gemma, etc.). PyTorch .bin files are auto-converted on first use.
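
To find a downloaded model on disk, it helps to know the naming convention used by the Hugging Face hub cache, where a model ID `org/name` is stored in a folder called `models--org--name`. The sketch below applies that convention. Whether alloy lays out its downloads in exactly this way is an assumption, and `hub_cache_dir` is a hypothetical helper, not part of alloy.

```rust
use std::path::PathBuf;

/// Maps a model ID such as "org/name" to its folder under a hub cache root,
/// following the `models--org--name` convention of the Hugging Face hub cache.
/// NOTE: hypothetical helper; assumes alloy follows the same layout.
pub fn hub_cache_dir(cache_root: &str, model_id: &str) -> PathBuf {
    let folder = format!("models--{}", model_id.replace('/', "--"));
    PathBuf::from(cache_root).join(folder)
}
```

With a cache root of `/home/user/.cache/huggingface/hub`, the model ID `mistralai/Mistral-7B-v0.1` from the config above maps to `/home/user/.cache/huggingface/hub/models--mistralai--Mistral-7B-v0.1`.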

Benchmarks

Measured on real models on an Azure Standard_E48s_v5 instance (48 cores, 384 GB RAM). All times are wall-clock, measured with hyperfine (3 runs with warmup).

7B - Mistral-7B-v0.1 (BF16, 14.48 GB)

| Method | alloy | mergekit | alloy vs. mergekit |
| --- | --- | --- | --- |
| linear | 7.3 s | 15.4 s | 2.1x faster |
| slerp | 9.5 s | 12.9 s | 1.4x faster |
| ties | 23.3 s | 13.1 s | 1.8x slower |
| dare_ties | 38.0 s | 15.3 s | 2.5x slower |

14B - Qwen2.5-14B (BF16, 29.54 GB)

| Method | alloy | mergekit | alloy vs. mergekit |
| --- | --- | --- | --- |
| linear | 14.1 s | 25.8 s | 1.8x faster |
| slerp | 18.5 s | 22.1 s | 1.2x faster |
| ties | 44.9 s | 21.9 s | 2.0x slower |
| dare_ties | 75.7 s | 27.3 s | 2.8x slower |

alloy uses fused SIMD kernels (AVX2/NEON) that read BF16 directly from memory-mapped safetensors files and compute in f32 registers, avoiding intermediate allocations. The IO-bound methods (linear, slerp) are consistently faster than mergekit. The compute-heavy methods (ties, dare_ties) are still slower, because mergekit runs them on PyTorch's optimized C++ tensor kernels.
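
The idea of reading BF16, computing in f32, and writing BF16 in a single pass can be illustrated with a scalar sketch. BF16 is the upper 16 bits of an IEEE-754 f32, so widening is a 16-bit shift and narrowing is a rounded truncation. The code below is not alloy's kernel: it uses no explicit SIMD intrinsics or memory mapping, and the function names and the round-to-nearest-even choice are assumptions of this example.

```rust
/// Widen BF16 bits to f32: BF16 is the top 16 bits of the f32 encoding.
#[inline]
pub fn bf16_to_f32(bits: u16) -> f32 {
    f32::from_bits((bits as u32) << 16)
}

/// Narrow f32 to BF16 bits with round-to-nearest-even.
#[inline]
pub fn f32_to_bf16(x: f32) -> u16 {
    let bits = x.to_bits();
    if x.is_nan() {
        // Force a mantissa bit so truncation cannot turn NaN into infinity.
        return ((bits >> 16) as u16) | 0x0040;
    }
    // Add 0x7FFF, plus 1 when the kept low bit is odd, so ties round to even.
    let rounding_bias = 0x7FFF + ((bits >> 16) & 1);
    (bits.wrapping_add(rounding_bias) >> 16) as u16
}

/// Fused two-model linear merge over raw BF16 bits: each element is widened,
/// combined in f32, and narrowed in one pass, with no intermediate f32 buffer.
/// NOTE: hypothetical scalar stand-in for a SIMD kernel.
pub fn linear_merge_bf16(a: &[u16], b: &[u16], wa: f32, wb: f32, out: &mut [u16]) {
    assert!(a.len() == b.len() && a.len() == out.len());
    for ((o, &x), &y) in out.iter_mut().zip(a).zip(b) {
        *o = f32_to_bf16(wa * bf16_to_f32(x) + wb * bf16_to_f32(y));
    }
}
```

Merging the BF16 values `[1.0, 2.0]` and `[3.0, 4.0]` with weights `0.5` and `0.5` writes the BF16 encodings of `2.0` and `3.0` into `out`. A real kernel would process many lanes per instruction, but the data flow is the same.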

See alloy.how for the full technical writeup including architecture, algorithm breakdowns, and future work.

License

MIT
