Releases: oumi-ai/oumi
v0.8
Oumi v0.8 Release Notes
This release lands the new oumi deploy CLI for shipping models to dedicated inference endpoints, an oumi-mcp server for MCP-capable assistants, batch-API parity across hosted providers, and a major dependency push to Transformers v5 / TRL 0.30+ / vLLM 0.20+.
Highlights
oumi deploy — new CLI for dedicated inference endpoints
A first-class deployment CLI that uploads a model and stands up a dedicated endpoint on a managed inference provider. Fireworks.ai lands first, Parasail ships in this release, and the architecture (base_client.py, typed exception hierarchy) is built so additional providers can plug in.
What it does in one go: validates the model directory → uploads weights (full model or LoRA adapter) → creates a dedicated endpoint with the requested hardware → polls until live → optionally fires test prompts.
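The "polls until live" step of that pipeline can be pictured with a small sketch. This is not the oumi deploy implementation: wait_until_live, get_state, and the state strings "READY" and "FAILED" are hypothetical names chosen for illustration, and the fake provider below only exists to make the example runnable.

```python
import time
from typing import Callable


def wait_until_live(
    get_state: Callable[[], str],
    timeout_s: float = 600.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_state() until it reports a terminal state or time runs out."""
    deadline = time.monotonic() + timeout_s
    while True:
        state = get_state()
        # Hypothetical terminal states; a real provider defines its own.
        if state in ("READY", "FAILED"):
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"endpoint still {state!r} after {timeout_s}s")
        sleep(interval_s)


# Fake provider that becomes ready on the third poll (no real sleeping).
states = iter(["CREATING", "CREATING", "READY"])
final_state = wait_until_live(lambda: next(states), sleep=lambda _: None)
print(final_state)
```

Injecting the sleep function keeps the loop easy to test; the --wait and --watch flags in the commands that follow are the supported way to get this behavior from the CLI.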
# Single-command deploy from a YAML config
oumi deploy up --config configs/examples/deploy/fireworks_deploy.yaml
# Or assemble the deploy on the CLI
oumi deploy up \
--model-path /path/to/my-finetuned-model/ \
--provider fireworks \
--hardware nvidia_h100_80gb \
--gpu-count 2 \
--min-replicas 1 \
--max-replicas 4
# Lifecycle commands
oumi deploy upload --model-path ... --provider fireworks --wait
oumi deploy create-endpoint --model-id ... --provider fireworks --wait
oumi deploy status --endpoint-id ep-123 --provider fireworks --watch
oumi deploy list --provider fireworks
oumi deploy list-models --provider fireworks
oumi deploy list-hardware --provider fireworks
Example fireworks_deploy.yaml:
model_source: /path/to/my-finetuned-model/
provider: fireworks
model_name: my-finetuned-model-v1
model_type: full # or "adapter" for LoRA + base_model: ...
hardware:
  accelerator: nvidia_h100_80gb
  count: 2
autoscaling:
  min_replicas: 1
  max_replicas: 4
test_prompts:
  - "Hello, how are you?"
oumi-mcp: Model Context Protocol server
Oumi now ships an MCP server so MCP-capable assistants (Claude Desktop, Claude Code, Cursor, …) can browse the ~500 bundled YAML configs, launch and monitor training / eval / inference jobs locally or on cloud, and read built-in workflow guidance — all from chat.
pip install "oumi[mcp]"
# Two equivalent entry points, both stdio:
oumi-mcp
python -m oumi.mcp
Wire it up in your client (Claude Code shown):
claude mcp add oumi oumi-mcp
Or in claude_desktop_config.json / Cursor / ~/.claude.json:
{
"mcpServers": {
"oumi": { "command": "oumi-mcp" }
}
}
Built-in prompts cover get-started, train, infer, eval, synth, analyze, post-training, cloud-launch, and an end-to-end mle_workflow. Full guide: docs/user_guides/mcp.md.
Batch API support across hosted inference providers
Hosted batch-inference parity. oumi infer and the engines now expose batch endpoints for Anthropic, Fireworks, and Together, plus job control (cancel, partial retry) and progress tracking.
from oumi.core.configs import InferenceConfig
from oumi.inference import AnthropicInferenceEngine
config = InferenceConfig.from_yaml("infer.yaml")
engine = AnthropicInferenceEngine(model_params=config.model)
# Submit, poll, fetch results — engine handles the batch lifecycle
results = engine.infer_batch(input=conversations, inference_config=config)
# Inside an inference YAML, opt in to the batch API
remote_params:
  use_batch_api: true
  batch_completion_window: "24h"
RPM / TPM rate limiting on RemoteInferenceEngine
Sliding-window rate limiting baked into every remote engine so you can pin API budgets without reaching for an external proxy. Tracks RPM, input TPM, and output TPM independently from each provider response. politeness_policy is now deprecated in favor of requests_per_minute.
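For readers unfamiliar with the technique, here is a minimal sketch of a sliding-window limiter. It is an illustration under stated assumptions, not the RemoteInferenceEngine code: the class name, method names, and the explicit now argument are hypothetical. The cost parameter is what lets one structure count either requests (cost 1) or tokens (cost = token count).

```python
from collections import deque


class SlidingWindowLimiter:
    """Allows at most `limit` units within any trailing `window_s` seconds."""

    def __init__(self, limit: int, window_s: float = 60.0) -> None:
        self.limit = limit
        self.window_s = window_s
        self._events: deque[tuple[float, int]] = deque()  # (timestamp, cost)
        self._used = 0

    def _evict(self, now: float) -> None:
        # Drop events that have aged out of the window and refund their cost.
        while self._events and now - self._events[0][0] >= self.window_s:
            _, cost = self._events.popleft()
            self._used -= cost

    def try_acquire(self, now: float, cost: int = 1) -> bool:
        """Record the event and return True if it fits in the budget."""
        self._evict(now)
        if self._used + cost > self.limit:
            return False
        self._events.append((now, cost))
        self._used += cost
        return True


# Two requests per minute: the third is rejected, the fourth fits again
# once the first two have left the 60-second window.
rpm = SlidingWindowLimiter(limit=2)
decisions = [rpm.try_acquire(now=t) for t in (0.0, 1.0, 2.0, 61.0)]
print(decisions)  # [True, True, False, True]
```

Running one limiter per budget (requests, input tokens, output tokens) mirrors the three independent settings in the config that follows.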
# infer.yaml
model:
  model_name: claude-opus-4-7
engine: ANTHROPIC
remote_params:
  num_workers: 16
  requests_per_minute: 4000
  input_tokens_per_minute: 400_000
  output_tokens_per_minute: 80_000
Cerebras inference engine
New engine targeting Cerebras-hosted models, registered with the standard InferenceEngineType factory.
model:
  model_name: llama-3.3-70b
engine: CEREBRAS
remote_params:
  api_key_env_varname: CEREBRAS_API_KEY
from oumi.inference import CerebrasInferenceEngine
engine = CerebrasInferenceEngine(model_params=...)
Related work:
- Cerebras Inference Support (#2231)
Multi-turn conversation synthesis
oumi synth learned to chain conversation synthesizers into multi-turn dialogues — useful for distilling assistant/customer-support style training data.
oumi synth -c oumi://configs/examples/synthesis/multiturn_conversation_synth.yaml
The bundled config generates 5 customer-support conversations with structured action blocks (CLARIFY, LOOKUP_ORDER, INITIATE_RETURN, …); use it as a starting point for your own scenarios. See docs/user_guides/synth.md for the synthesizer composition model.
Related work:
- Feature/multiturn synth (#2172)
- Add token usage accumulation to AttributeSynthesizer (#2201)
- Treat empty arrays in synthesis config as unset (#2246)
Judge framework upgrades
The Judge gained batch inference, token-usage accounting, and the ability to take pre-built Conversation objects (handy when you already constructed multi-turn context elsewhere).
from oumi.judge import judge_dataset
results = judge_dataset(
    judge_config="oumi://configs/judges/helpfulness.yaml",
    dataset=[
        {"question": "What is 2+2?", "answer": "4"},
        {"question": "How to cook?", "answer": "I don't know"},
    ],
    output_file="judgments.jsonl",
)
for r in results:
    print(r.field_values, r.field_scores)
Inference engine quality of life
Smaller but high-leverage improvements that show up everywhere remote inference is used:
- list_models() API on every engine: discover what a provider exposes from code (#2333).
- finish_reason is now surfaced on conversation outputs across engines (#2249).
- Quota errors are a typed, catchable error class instead of opaque HTTP failures (#2222).
- Transient HTTP 400s are retried with backoff (#2357).
- api_input is attached to APIStatusError for easier debugging (#2355).
- Empty Anthropic responses raise an explicit error (#2346); reasoning-model content: null is handled (#2354).
- Anthropic prompt caching enabled; cache token usage reported for Anthropic and Together (#2324, #2245).
- vLLM engine accepts arbitrary kwargs through vllm_config_overrides for late-breaking server flags (#2314).
- Step is now included in metrics-logger callback (#2244).
- Temperature constraints handled for all OpenAI reasoning models (#2212).
- Lambda inference engine deprecated (#2332).
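"Retried with backoff" in the list above refers to a common pattern, sketched here in isolation. This is a generic illustration, not the engine's retry code: TransientError, retry_with_backoff, and the delay values are hypothetical, and the jitter strategy is one reasonable choice among several.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Stand-in for a retryable failure (hypothetical, not an Oumi class)."""


def retry_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 4,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call`, retrying transient failures with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except TransientError:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error.
            # Delay ceiling doubles each attempt; jitter spreads out retries.
            ceiling = base_delay_s * 2 ** (attempt - 1)
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")  # The loop always returns or raises.


attempts = {"n": 0}


def flaky() -> str:
    # Fails twice, then succeeds on the third call.
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TransientError("transient failure")
    return "ok"


result = retry_with_backoff(flaky, sleep=lambda _: None)
print(result, attempts["n"])  # ok 3
```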
engine.list_models(chat_only=True) # ['accounts/.../models/llama-v3p1-70b-instruct', ...]
result = engine.infer(...)
result[0].messages[-1].finish_reason # 'stop' | 'length' | 'tool_calls' | ...
Config + CLI error handling
Configuration errors used to surface as raw OmegaConf stack traces. They now flow through a typed exception hierarchy with explicit messages:
- OumiConfigParsingError covers all OmegaConf exceptions (#2323).
- Config-specific exception hierarchy + CLI error handling (#2319).
- DatasetParams.finalize_and_validate checks that dataset_path exists (#2320).
from oumi.core.configs import TrainingConfig
from oumi.exceptions import OumiConfigParsingError
try:
    config = TrainingConfig.from_yaml("train.yaml")
except OumiConfigParsingError as e:
    # Clean, user-facing error, not an OmegaConf traceback
    print(e)
oumi launcher cost & schedule fields
The launcher base cluster now exposes start_at, end_at, and cost_per_hour — useful for time-boxed runs and pricing-aware cluster selection on SkyPilot clouds.
from oumi import launcher
cluster, job = launcher.up(job_config, cluster_name="train-run")
print(cluster.start_at, cluster.end_at, cluster.cost_per_hour)
New & updated model configs
- Qwen3.5 0.8B — full-finetune, LoRA, and inference (HF + vLLM) recipes (#2285, #2305).
- Qwen3-VL 2B / 4B / 8B / 30B-A3B vision-language configs and MoE support (#2286, #2288, #2102).
- GPT-OSS 120B LoRA multi-GPU training config (#2186).
- Qwen3 MoE (235B, 30B-A3B, 80B-A3B-Instruct) LoRA training configs (#2211).
- Llama 4 Scout LoRA config refresh (#2184).
oumi train -c configs/recipes/qwen3_5/sft/0.8b_lora/train.yaml
oumi infer -c configs/recipes/qwen3_5/inference/0.8b_vllm_infer.yaml
oumi train -c configs/recipes/gpt_oss/sft/120b_lora_multi_gpu_train.yaml
oumi train -c configs/recipes/qwen3/sft/235b_lora/train.yaml
Tool-use training: correct tool-result masking
DataCollatorForCompletionOnlyLM was extended to correctly mask tool-result tokens during completion-only training, so SFT on tool-use traces actually trains the assistant turn and not the (already-known) tool output (#2369).
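The effect of the masking can be shown with a toy example. This is a simplified sketch, not the collator's code: it assumes each token already carries a role tag, whereas a real collator has to locate turn boundaries in the tokenized chat template. The value -100 is the label index that PyTorch's cross-entropy loss ignores by default.

```python
IGNORE_INDEX = -100  # Label value skipped by PyTorch's cross-entropy loss.


def mask_non_assistant(input_ids: list[int], roles: list[str]) -> list[int]:
    """Keep labels only for assistant tokens; mask user and tool tokens."""
    if len(input_ids) != len(roles):
        raise ValueError("input_ids and roles must have the same length")
    return [
        token if role == "assistant" else IGNORE_INDEX
        for token, role in zip(input_ids, roles)
    ]


# Hypothetical token ids for: user prompt, assistant tool call,
# tool result, assistant final answer.
input_ids = [11, 12, 21, 22, 31, 32, 41]
roles = ["user", "user", "assistant", "assistant", "tool", "tool", "assistant"]
labels = mask_non_assistant(input_ids, roles)
print(labels)  # [-100, -100, 21, 22, -100, -100, 41]
```

Without the tool-role mask, positions 31 and 32 would contribute to the loss even though the model never has to generate them.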
Dependency upgrades
The big one: Transformers v5. Several small breakages were absorbed in this release so users don't have to.
- Transformers → >=4.57,<5.7 (#2317, #2344, #2394)
- TRL → >=0.24,<1.4 (#2298, #2362, #2403)
- vLLM → >=0.14,<0.21, including 0.12 support (#2275, #2295, #2352, #2402)
- veRL → >=0.5,<0.8 (#2297, #2316, #2322)
- PEFT → >=0.17,<0.20 (#2379)
- Pydantic → >=2.11,<2.14 (#2381)
- SkyPilot → >=0.11.1,<0.13 (#1675e8e6, #2175, #2191, #2349)
- datasets, torchvision, torch, wandb, liger-kernel, kernels, typer, **uvicor...
v0.7
Oumi 0.7 Release
✨ Highlights
This release brings major platform upgrades (Python 3.14, PyTorch 2.9), new inference engines, rule-based evaluation judges, and significant CLI/documentation improvements.
🚀 New Features
Inference
- Fireworks inference engine - New backend for Fireworks AI (#2158)
# Fireworks example (set FIREWORKS_API_KEY env var)
model:
  model_name: "accounts/fireworks/models/llama4-maverick-instruct-basic"
engine: FIREWORKS
oumi infer -i -c configs/recipes/llama4/inference/maverick_instruct_fireworks_infer.yaml
- OpenRouter inference engine - New backend for OpenRouter (#2168)
# OpenRouter example (set OPENROUTER_API_KEY env var)
model:
  model_name: "anthropic/claude-sonnet-4.5"
engine: OPENROUTER
# Use via CLI
oumi infer -i -c configs/apis/openrouter/infer_claude_4_5_sonnet.yaml
- Loading spinner - Visual feedback during inference operations (#2085)
- Pre-trained custom model support - Load your own pre-trained models (#2044)
Evaluation
- Rule-based judges - Deterministic evaluation judges with CLI integration and examples (#2119, #2171)
# configs/projects/judges/rule_based/regex_match_phone.yaml
judge_params:
  prompt_template: "{response}"
rule_judge_params:
  rule_type: "regex"
  input_fields:
    - "response"
  rule_config:
    pattern: "\\d{3}-\\d{4}"
    input_field: "response"
    match_mode: "search"
    inverse: false
oumi judge dataset -c regex-match-phone --input data/judge_input.jsonl
Training
- Metrics logging callback - Log training metrics to disk (#2140)
- Per-reward function configuration - New reward_function_kwargs support (#2143)
trainer_type: TRL_GRPO
reward_functions:
  - rubric_reward
  - gsm8k
reward_function_kwargs:
  rubric_reward:
    judge_panel_path: "configs/projects/judges/rubric_panel.yaml"
  gsm8k:
    strict: true
Data & Synthesis
- XLSX and DOCX support - New formats for synthesis and datasets (#2148)
- Few-shot sampling - Sample few-shot examples from sources during synthesis (#2151)
- Batch AttributeSynthesizer - Batch processing support (#2181)
- RaR datasets - New datasets and base rubric dataset classes (#2144)
Infrastructure
- Nebius cloud provider - New cloud option (#2179)
- Kubernetes Skypilot support - Added k8s dependency (#2124)
- ARM Docker support - Enabled ARM builds with useful utilities (#2141)
- One-line installer - New install.sh script (#2155)
# Basic installation
curl -LsSf https://oumi.ai/install.sh | bash
# With GPU support
curl -LsSf https://oumi.ai/install.sh | bash -s -- --gpu
# With specific Python version
curl -LsSf https://oumi.ai/install.sh | bash -s -- --python 3.12
- Telemetry - Optional usage analytics via PostHog (#2145)
📈 Improvements
Performance
- Lazy CLI imports - Faster startup times (#2110)
CLI
- List aliases, auto-complete, help, and common args improvements (#2122)
- Judge command UX improvements (#2129)
- Version and system info utilities (#2142)
oumi env # Show Oumi version, Python version, installed packages, GPU info
Documentation
- Complete docs refresh with new custom theme (#2133, #2167)
- Added CLI reference sections for analyze, tune, and quantize (#2126)
- Updated installation instructions (#2169)
Configs
- Added Gemma-2-IT chat template and example config (#2159)
- Updated Gemma3-4B-IT SFT training config (#2156)
⚠️ Breaking Changes
- Dropped Python 3.9 support - Minimum supported version is now Python 3.10 (#2107)
- Deprecated alpaca_eval integration (#2108)
- Deprecated protobuf conversation definitions (#2127)
🐛 Bug Fixes
- Fixed synthesis rounding errors (#2104)
- Fixed logging of distributed training CLI commands (#2165)
- Fixed FSDP transformer_wrap_class parsing for fully qualified names (#2164)
- Fixed deprecated torch_dtype usage (#2123)
- Fixed Oumi Tour notebook output (#2157)
- Cleaned up errant print statements (#2121)
📦 Dependency Updates
- PyTorch 2.9 and Python 3.14 support (#2109)
- Updated: peft, uvicorn, bitsandbytes, click, pillow, typer, torchao, pycares, wandb
👋 New Contributors
Welcome to our new contributors!
- @lrobledo (#2105)
- @RajdeepKushwaha5 (#2085)
- @ritankarsaha (#2044)
- @brian-nguyen (#2157)
- @lefft (#2156)
Full Changelog: v0.6.0...v0.7
v0.6.0
Oumi v0.6.0 Changelog
We’re excited to announce Oumi v0.6.0! This release brings Python 3.13 support, a powerful new CLI for dataset analysis, the TRL GOLD trainer for preference learning, and first-class Kubernetes deployment support.
Highlights
Python 3.13 Support
Oumi now officially supports Python 3.13, letting you take advantage of the latest Python performance improvements and features.
(#2092)
New oumi analyze CLI Command
Understanding your training data just got easier. The new oumi analyze command lets you inspect and analyze datasets directly from the command line—no code required.
# Analyze a local dataset
oumi analyze -c configs/examples/analyze/analyze.yaml
# Export results in different formats
oumi analyze -c configs/examples/analyze/analyze.yaml --format parquet --output ./my_results
Create a simple config to analyze any HuggingFace dataset:
# hf_analyze.yaml
dataset_name: argilla/databricks-dolly-15k-curated-en
split: train
sample_count: 1000
analyzers:
  - id: length
Check out the analyze documentation for more details.
(#2069, #2071)
TRL GOLD Trainer
We’ve added support for the GOLD (Generalized Online Learning from Demonstrations) trainer from TRL. GOLD is an online preference learning algorithm that improves upon DPO by generating responses on-the-fly during training, leading to better alignment with less distribution shift.
# Run GOLD training with the example config
oumi train -c configs/examples/gold/train.yaml
Or configure it in your own training config:
training:
  trainer_type: "TRL_GOLD"
  gold:
    teacher_model_name_or_path: "HuggingFaceTB/SmolLM2-360M-Instruct"
    temperature: 0.9
    max_completion_length: 512
    lmbda: 0.5 # 50% on-policy, 50% off-policy
This requires TRL 0.26+, which is now the default.
(#2095, #2097)
Code Evaluation Judges
New LLM-as-judge evaluators specifically designed for assessing code quality. These judges can evaluate generated code for correctness, style, security, and other software engineering best practices—perfect for evaluating coding assistants and code generation models.
Thanks to @N-45div for this contribution!
(#2087)
Kubernetes Deployment
You can now deploy Oumi training jobs on Kubernetes clusters.
Option 1: Using SkyPilot (new in this release)
# k8s_job.yaml
name: my-training-job
resources:
  cloud: k8s
  accelerators: "A100:1"
run: |
  oumi train -c configs/recipes/llama3_1/sft/8b_lora/train.yaml
oumi launch up -c k8s_job.yaml --cluster my-k8s-cluster
Option 2: Direct kubectl deployment
For existing K8s clusters, you can deploy Oumi directly using kubectl. See the Kubernetes deployment guide for detailed instructions including platform-specific examples for EKS, GKE, and AKS.
Thanks to @min-oumi!
(#2054, #2068)
Custom Master Port for Distributed Training
Running multiple distributed training jobs on the same node? You can now specify a custom master port to avoid conflicts.
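One way to pick a port that will not collide is to let the operating system choose one. The helper below is a generic sketch with a hypothetical name (find_free_port); it does not show how Oumi exposes the setting, which these notes do not specify. In plain torch.distributed setups the chosen value is typically passed through the MASTER_PORT environment variable.

```python
import socket


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))  # Port 0 means "assign any free port".
        return sock.getsockname()[1]


port = find_free_port()
print(port)
```

The socket is closed before the port is used, so another process could in principle grab it in between; for a handful of jobs on one node this is usually an acceptable risk.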
ARM Docker Images for Mac
Apple Silicon users rejoice! We now publish ARM64 Docker images, so you can run Oumi containers natively on M1/M2/M3 Macs without emulation overhead.
(#2049)
Bug Fixes
- Fix Docker release action (#2023)
- Fix length analyzer column naming and add comprehensive message summary tests (#2057)
- Fix "too many files open" error when processing large datasets (#2060)
- Fix lm_eval multi-GPU integration for distributed evaluation (#2064)
- Fix mutable default argument in conversation handling (#2048)
Documentation
- Add news item on OpenEnv notebook (#2022)
- Add docs for missing inference params and how to serve LoRA adapters (#2047)
- Add local Docker guide (#2058)
Deprecations
- Cambrian model: The experimental Cambrian model has been deprecated (#2034)
- target_col: Removed deprecated target_col field mentions (#2056)
Dependencies
- TRL upgraded to 0.26 (#2097)
- datasets library upgraded (#2091)
- wandb >=0.21,<0.24 (#2032)
- safetensors >=0.6,<0.8 (#2031)
- bitsandbytes >=0.47,<0.49 (#2038)
- torchao >=0.12,<0.15 (#2079)
- deepspeed >=0.17.0,<0.19.0 (#2080)
- pydantic >=2.11,<2.13 (#2081)
- skypilot >=0.10.2,<0.12 (#2089)
- torchdata is now optional (#2066)
New Contributors
- @monnetb made their first contribution in #2021
- @dependabot[bot] made their first contribution in #2029
- @min-oumi made their first contribution in #2054
- @N-45div made their first contribution in #2087
Full Changelog: v0.5.0...v0.6.0
v0.5.0
Oumi v0.5.0 Release Notes
We're excited to announce Oumi v0.5.0, featuring hyperparameter tuning capabilities, expanded inference options, and enhanced launcher functionality.
🚀 Major Features
Data Synthesis Module
- Introducing oumi synth - a powerful data synthesis module for automatically generating high-quality training datasets using LLMs (#1965)
- Template-based Generation: Control attributes like difficulty, style, and domain for diverse dataset creation
- Domain-specific Datasets: Generate data for specialized fields (legal, medical, technical, etc.)
- Data Augmentation: Expand existing small datasets by generating variations
- Multiple Formats: Support for instruction-following, QA, and conversational datasets
Hyperparameter Tuning Module
- Introducing oumi tune - a new hyperparameter search and optimization module for efficient model tuning (#1998, #1991). Thank you @gbladislau-aumo!
Inference & Training Enhancements
- Bedrock Integration: Added AWS Bedrock Inference Engine support for scalable model deployment (#1983) - Thank you @aniruddh-alt!
- GKD Trainer Support: New Generalized Knowledge Distillation trainer for model compression workflows (#2000)
- OpenEnv RL Training: Demo notebook showcasing reinforcement learning training with reward visualization (#1996, #2012)
HPC & Launcher Improvements
- NERSC Perlmutter Support: Oumi launcher now supports the NERSC Perlmutter HPC cluster (#1959)
- Enhanced Logging: Added job log trailing and dedicated logs command for better debugging (#1951, #1964)
- Lazy Cloud Initialization: Improved launcher startup performance (#1985)
✨ Improvements
Model Configuration
- Added Qwen3 VL 4B model configurations (#1992, #1993)
- Exposed chat_template_kwargs parameter in ModelParams for fine-grained control (#1997)
Developer Experience
- Updated BaseConfig to support non-primitive field types (#1684)
- Optional stdout_file parameter in SLURM client (#1974)
🐛 Bug Fixes
- Fixed NaN values in dataset analyzer for single-conversation datasets (#1961)
- Resolved SLURM environment variable issues (PMI_RANK → SLURM_PROCID) (#2010) (Thank you @AliliRayane!)
- Fixed non-primitive field saving in base config (#2005)
- Updated uv pip install commands to include --system flag (#1979)
- Unique inference scratch filenames via hashing (#1986)
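The idea behind hashed scratch filenames in the last fix can be sketched as follows. This is an illustration only: the function name, the fields that feed the hash, and the infer_ prefix are assumptions, not what Oumi actually hashes.

```python
import hashlib
import json


def scratch_filename(model_name: str, generation: dict, input_path: str) -> str:
    """Derive a deterministic filename from the settings that define a run."""
    # sort_keys makes the serialization (and so the hash) order-independent.
    payload = json.dumps(
        {"model": model_name, "generation": generation, "input": input_path},
        sort_keys=True,
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]
    return f"infer_{digest}.jsonl"


first = scratch_filename("my-model", {"temperature": 0.0}, "data/in.jsonl")
second = scratch_filename("my-model", {"temperature": 0.7}, "data/in.jsonl")
repeat = scratch_filename("my-model", {"temperature": 0.0}, "data/in.jsonl")
print(first != second, first == repeat)  # True True
```

Runs with different settings get different scratch files, while rerunning the same job maps back to the same file, which is what makes resuming from partial results possible.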
📦 Dependency Updates
- Upgraded transformers: 4.56 → 4.57 (#1966, #1990)
- Upgraded TRL: 0.24.0 → 0.25 (#1995, #2011)
- Pinned uvicorn version for SkyPilot compatibility (#1978)
🎉 New Contributors
Welcome to our new contributors!
📖 Full Changelog
For a complete list of changes, see the full changelog
v0.4.2
Release Notes - v0.4.2
🚀 New Features
- Model Support: Added support for Qwen3-VL ([#1992](#1992))
- HPC Cluster Support: Added support for NERSC Perlmutter HPC cluster in Oumi launcher ([#1959](#1959))
- Enhanced Logging:
🐛 Bug Fixes
- Fixed Sky Pilot unit tests ([#1967](#1967))
- Fixed GPU test issues ([#1970](#1970))
- Pinned uvicorn version to resolve SkyPilot compatibility ([#1978](#1978))
- Updated inference to always hash for unique scratch filenames ([#1986](#1986))
- Improved error handling for document processing issues ([#1989](#1989))
🔧 Improvements
- Performance: Lazy initialization of clouds in Oumi launcher for faster startup ([#1985](#1985))
- Code Quality:
📚 Documentation
- Updated README with latest information ([#1968](#1968))
- Added synthesis documentation and example configurations ([#1965](#1965))
Full Changelog: v0.4.0...v0.4.2
v0.4.1
What's Changed
- Fix NaN values in dataset analyzer statistics for single conversations by @ryan-arman in #1961
- [tiny] Add __init__.py to some test dirs by @wizeng23 in #1963
- Add The Ability To Trail Logs For Launcher Jobs by @rlehman221 in #1951
- Move compute statistics to analysis_utils by @ryan-arman in #1962
- Fixed Sky Pilot Unit Tests Failing by @rlehman221 in #1967
- extract conversation_turns from conversation_level_summary to top level by @ryan-arman in #1969
- Update README.md by @wizeng23 in #1968
- [tiny] Upgrade transformers to 4.56 by @wizeng23 in #1966
- Quick fix for our GPU tests by @taenin in #1970
- Support NERSC Perlmutter HPC cluster in Oumi launcher by @wizeng23 in #1959
- Added Launcher Logs Command by @rlehman221 in #1964
- Make stdout_file optional in slurm client by @rlehman221 in #1974
- Add synth documentation and example configs by @jgreer013 in #1965
- Add cuda launch blocking arg to e2e tests for debugging by @jgreer013 in #1976
- Pin uvicorn version to fix skypilot by @jgreer013 in #1978
- Update inference to always hash for unique inference scratch filenames by @jgreer013 in #1986
- Lazy init clouds in oumi launcher by @oelachqar in #1985
- Unblock oumi dataset support in synthesis by @jgreer013 in #1988
Full Changelog: v0.4.0...v0.4.1
v0.4.0
Oumi v0.4 Changelog
✨ gpt-oss Training and Inference
OpenAI released two highly-anticipated open-weight models in August, gpt-oss-20b and gpt-oss-120b. They’re mixture-of-experts (MoE) reasoning models with strong tool-use performance, and are optimized with native 4-bit quantization for memory-efficient training and inference. You can now run training and inference on these models in Oumi!
Usage Example:
# Train gpt-oss-20b with LoRA on a single GPU
oumi train -c oumi://configs/recipes/gpt_oss/sft/20b_lora_single_gpu_train.yaml
# Run local inference on gpt-oss-120b using vLLM
oumi infer -i -c oumi://configs/recipes/gpt_oss/inference/120b_vllm_infer.yaml
⚡ DeepSpeed Support
DeepSpeed is a powerful and configurable optimization library that allows you to train large models efficiently using techniques like distributed training and memory optimization. Oumi now supports DeepSpeed in addition to PyTorch’s native Fully Sharded Data Parallel (FSDP) for distributed training!
Usage Example:
# Train Llama 3.1 8B using DeepSpeed’s ZeRO-3 optimization strategy
oumi train -c oumi://configs/examples/deepspeed/llama3_1_8b_deepspeed_z3_train.yaml
# Combine DeepSpeed with YARN RoPE scaling to enable training on longer contexts!
# Train Qwen2.5 7B with 128k token context length using YARN and DeepSpeed
oumi train -c oumi://configs/projects/limo/qwen2.5_7b_fft_yarn_deepspeed.yaml
🗄️ CLI Tool for Hugging Face Cache Management
When using datasets and models from Hugging Face Hub, over time it becomes hard to track what’s been cached, how much space it’s taking up, etc. In #1897, @aniruddh-alt added an oumi cache utility to the Oumi CLI. This lets you view, add to, and delete from the Hugging Face Hub local cache, in addition to getting more information about entries in the cache.
Usage Example:
# View what’s in the cache
oumi cache ls
# Filter for items containing the substring “llama”, and sort by name
oumi cache ls -f "*llama*" --sort name
# Download a model to cache
oumi cache get Qwen/Qwen3-0.6B
# View information about the cached model
oumi cache card Qwen/Qwen3-0.6B
# Remove a model from cache
oumi cache rm Qwen/Qwen3-0.6B
🎯 Vision DPO and KTO Support
We have added support for two new training methods: Direct Preference Optimization (DPO) on Vision-Language Models and Kahneman-Tversky Optimization (KTO). Special thanks to @efsiatras for implementing KTO support in #1538!
Usage Example:
# Vision DPO on Qwen2.5-VL 3B
oumi train -c oumi://configs/recipes/vision/qwen2_5_vl_3b/dpo/train.yaml
# KTO on Phi-3
oumi train -c oumi://configs/recipes/phi3/kto/train.yaml
🛠️ Developer Experience
- Upgrade several package dependencies to latest versions
- Additional GGUF, MacOS LlamaCPP, and remote frontier model inference configs by @penfever in #1923 and #1947
- Add Pre-Populated GitHub Issue Link On Failures by @rlehman221 in #1936
- Add Verbose Flag to Oumi Train by @rlehman221 in #1940
- Enable users to log data samples during training for debugging by @shanghongsim in #1943
New Contributors
- @efsiatras made their first contribution in #1538
- @rlehman221 made their first contribution in #1936
All Contributors
@aniruddh-alt, @efsiatras, @jgreer013, @kaisopos, @oelachqar, @penfever, @rlehman221, @ryan-arman, @shanghongsim, @stefanwebb, @taenin, @wizeng23
Full Changelog: v0.3.0...v0.4.0
v0.3.0
Oumi v0.3 Changelog
🔧 Model Quantization (NEW)
Quantization is a crucially important family of methods for reducing model size, for example, prior to deployment. Oumi now supports applying Activation-aware Weight Quantization (AWQ) to all models. See how in our notebook.
Usage Example:
# Quick start - quantize TinyLlama to 4-bit
oumi quantize --method awq_q4_0 --model "TinyLlama/TinyLlama-1.1B-Chat-v1.0" --output quantized_model
# With configuration file
oumi quantize --config quantization_config.yaml
⚖️ Judge API V2 (MAJOR UPDATE)
LLM-as-a-Judge is a method for using foundation models to reliably evaluate other foundation models. We’ve overhauled Oumi’s LLM-as-Judge interface for ease-of-use and flexibility. Check out our notebook here.
Usage Example:
from oumi.judges.simple_judge import SimpleJudge
# Built-in truthfulness judge
simple_judge = SimpleJudge(judge_config="oumi://configs/projects/judges/generic/truthfulness.yaml")
dataset = [{"request": "What is the capital of France?", "response": "Rome"}]
outputs = simple_judge.judge(dataset)
🎯 Adaptive Inference (NEW)
💪 Adaptive Inference, as we term it, refers to new features in Oumi for resuming training (or any task) when a job has crashed, as well as optimizing inference parallelization to maximize bandwidth. Learn more in our notebook.
🛠️ Developer Experience
- Updated contributing guidelines
- Enhanced documentation
- Tutorial notebook fixes
- Improved error handling and testing
- MLflow integration improvements
- Multi-node verl Slurm job support
- Rich logging handler option
New Contributors
Full Changelog: v0.2.1...v0.3.0
v0.2.1
What's Changed
- Set infer_online and infer_from_file to private by @jgreer013 in #1745
- Update launch.md by @shanghongsim in #1781
- Add adaptive semaphore to enable future adaptive throughput scenarios by @jgreer013 in #1780
- Fix a pyright regression by @taenin in #1783
- Judge API V2 | Fix judge config from repo path by @kaisopos in #1782
- Add permutable attributes and combination sampling for data synthesis by @jgreer013 in #1773
- Removed collator in finetuning tutorial notebook by @shanghongsim in #1788
- Update our contributing guidelines. by @taenin in #1789
- Add adaptive concurrency controller in preparation for adaptive inference by @jgreer013 in #1784
- Fixed issue with final conversations not consistently being saved by @jgreer013 in #1795
- Add support for ingesting datasets for synthesis by @jgreer013 in #1790
- Add support for adaptive inference by @jgreer013 in #1791
- Add support for Example Sources in Synthesis by @jgreer013 in #1797
- Webinar announcement and other news by @stefanwebb in #1800
- Added utm_source parameters by @stefanwebb in #1802
- Add code to handle document ingestion by @jgreer013 in #1796
- Add code for handling basic document segmentation by @jgreer013 in #1803
- Update mflow support in oumi trainer by @oelachqar in #1804
- Add multi-node verl SLURM job by @wizeng23 in #1798
- Fixed various tutorial notebooks by @shanghongsim in #1792
- Add parameter logging to oumi trainer by @oelachqar in #1807
- Judge API V2 | Enable prompt variable replacement by YAML by @kaisopos in #1805
- [tiny] Update train config comment header by @wizeng23 in #1809
- Add experimental option to use the rich logging handler by @oelachqar in #1810
New Contributors
- @shanghongsim made their first contribution in #1781
Full Changelog: v0.2.0...v0.2.1
v0.2.0
Highlights
GRPO support for trl and verl trainers
Oumi now supports GRPO training for both the trl and verl libraries! This allows you to run GRPO training with no/low code using Oumi's configs. You can also benefit from other features of the Oumi platform, such as custom evaluation and launching remote jobs.
Running GRPO training in Oumi is as simple as:
- Create a reward function, and register it to Oumi's reward function registry using @register("<my_reward_fn>", RegistryType.REWARD_FUNCTION).
- Create a dataset class to process your HF dataset into the format needed for your target framework, and register it to Oumi's dataset registry using @register_dataset("@hf-org-name/my-dataset-name").
- Create an Oumi training config with your model, dataset, reward function, and hyperparameters. For specific details on setting up the config for GRPO, see our documentation.
- Launch the training job locally using the oumi train CLI, or launch a remote job using the oumi launch CLI.
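As a rough picture of the first step, a reward function can be as small as the sketch below. It follows the general shape of TRL-style reward functions (a batch of completions in, one float per completion out) and assumes plain-string completions and an answer column passed as a keyword argument; both are assumptions about your dataset format. The function name is hypothetical, and the Oumi @register decorator from the list above is left out so the block runs without Oumi installed.

```python
import re

_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def last_number_reward(
    completions: list[str], answer: list[str], **kwargs
) -> list[float]:
    """Score 1.0 when the last number in a completion equals the reference."""
    rewards = []
    for text, expected in zip(completions, answer):
        numbers = _NUMBER.findall(text)
        rewards.append(1.0 if numbers and numbers[-1] == expected else 0.0)
    return rewards


scores = last_number_reward(
    completions=["2 + 2 = 4", "The answer is 5", "no idea"],
    answer=["4", "4", "4"],
)
print(scores)  # [1.0, 0.0, 0.0]
```

Exact string comparison is deliberately strict here; a production reward would usually normalize numbers (for example "4.0" versus "4") before comparing.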
For an end-to-end example using Oumi + trl, check out our notebook walkthrough. For verl, check out our multi-modal Geometry3K config. Finally, check out our blog post for more information.
Models built with Oumi: HallOumi and CoALM
We’re proud to announce the release of two models built with Oumi: HallOumi and CoALM! Both of these were trained on Oumi, and we provide recipes to reproduce their training from scratch.
- 🧀 HallOumi: A truly open-source claim verification (hallucination detection) model developed by Oumi, outperforming Claude Sonnet, OpenAI o1, DeepSeek R1, Llama 405B, and Gemini Pro at only 8B parameters. Check out the Oumi recipe to train the model here.
- 🤖 CoALM: Conversational Agentic Language Model (CoALM) is a unified approach that integrates both conversational and agentic capabilities. It includes an instruction tuning dataset and three trained models (8B, 70B, 405B). The project was a partnership between the ConvAI Lab at UIUC and Oumi, and the paper was accepted to ACL. Check out the Oumi recipes to train the models here.
New model support: Llama 4, Qwen3, Falcon H1, and more
We’ve added support for many recent models to Oumi, with tested recipes that work out-of-the-box!
- Vision Language Models
- Text-to-text LLMs
Support for Slurm and Frontier clusters
At Oumi, we want to unify and simplify the processes for running jobs on remote clusters. We have now added support for launching jobs on Slurm clusters, and on Frontier, a supercomputer at the Oak Ridge Leadership Computing Facility.
What's Changed
- [bugfix] Allow prerelease when building docker image by @oelachqar in #1753
- Update link to Oumi banner image in README by @wizeng23 in #1752
- docs: add a badge and link to the social network Twitter by @Radovenchyk in #1751
- Support OLCF (Oak Ridge Leadership Computing Facility) Frontier HPC cluster in Oumi launcher by @nikg4 in #1721
- Judge API V2 | Core Functionality by @kaisopos in #1717
- Update oumi distributed torchrun to fallback to oumi train -c cfg.yaml .... on a single-node with 1 GPU by @nikg4 in #1755
- deps: Upgrade verl to 0.4.0 by @wizeng23 in #1749
- add DCVLR logo to readme by @penfever in #1754
- Judge API V2 | Few-Shots by @kaisopos in #1746
- Update infer.md to fix a broken link by @ryan-arman in #1756
- Judge API V2 | minor nit by @kaisopos in #1757
- [Evaluation] Disabling flaky MMMU test by @kaisopos in #1758
- Automatically tail SkyPilot logs by @wizeng23 in #1761
- Enable vLLM for trl GRPO jobs by @wizeng23 in #1760
- Judge API V2 | Implement CLI by @kaisopos in #1759
- Updates to Oumi news for May, June by @stefanwebb in #1763
- Additional news items by @stefanwebb in #1764
- Judge API V2 | Support for built-in judges by @kaisopos in #1762
- [bug] safetensors v0.6.0rc0 is causing a regression, prevent upgrading by @oelachqar in #1772
- [verl] Support resuming from checkpoint by @wizeng23 in #1766
- Upgrade accelerate and peft by @wizeng23 in #1774
- [tiny] Pin flash-attn version by @wizeng23 in #1775
- Pin the version of lm_eval to prevent a breaking change in the 4.9 release by @taenin in #1777
- Update inference to resume from temporary result file when possible by @jgreer013 in #1734
- [tiny] Fix gradient checkpointing for Oumi trainer by @wizeng23 in #1778
- [tiny] Remove use_liger argument by @wizeng23 in #1779
- Judge API V2 | Merge Judge and Inference configs by @kaisopos in #1776
Full Changelog: v0.1.14...v0.2.0