Foundation machine learning interatomic potentials (MLIPs), trained on extensive databases containing millions of density functional theory (DFT) calculations, have revolutionized molecular and materials modeling, but existing benchmarks suffer from data leakage, limited transferability, and an over-reliance on error-based metrics tied to specific DFT references.
We introduce MLIP Arena, a unified benchmark platform for evaluating foundation MLIP performance beyond conventional error metrics. It focuses on revealing the physical soundness learned by MLIPs and assessing their utilitarian performance, agnostic to the underlying model architecture and training dataset.
By moving beyond static DFT references and revealing the important failure modes of current foundation MLIPs in real-world settings, MLIP Arena provides a reproducible framework to guide the next-generation MLIP development toward improved predictive accuracy and runtime efficiency while maintaining physical consistency.
MLIP Arena leverages the modern Pythonic workflow orchestrator Prefect to enable advanced task/flow chaining and caching.
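As a rough illustration of what this buys (a sketch with placeholder names, not an Arena API; Prefect 2.x caching shown), task results can be cached on their inputs and chained inside a flow:

from datetime import timedelta

from prefect import flow, task
from prefect.tasks import task_input_hash

@task(cache_key_fn=task_input_hash, cache_expiration=timedelta(days=7))
def expensive_step(x: int) -> int:
    # Stand-in for an expensive calculation; repeated calls with
    # identical inputs reuse the cached result instead of recomputing.
    return x * x

@flow
def pipeline(xs: list[int]) -> list[int]:
    # Task calls chain naturally: downstream tasks consume upstream results.
    return [expensive_step(x) for x in xs]

pipeline([1, 2, 3])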
Note
Contributions of new tasks through PRs are very welcome! If you're interested in joining the effort, please reach out to Yuan at cyrusyc@berkeley.edu. See the project page for some outstanding tasks, or propose new feature requests in Discussions.
- [April 8, 2025] MLIP Arena is accepted as an ICLR AI4Mat Spotlight! Huge thanks to all co-authors for their contributions!
pip install mlip-arena
Caution
We strongly recommend a clean build in a new virtual environment due to compatibility issues between multiple popular MLIPs. We provide a single installation script using uv for minimal package conflicts and fast installation!
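For example, a fresh environment can be created with uv before running the install script below (a sketch; the Python version is only an example):

# create and activate a clean virtual environment
uv venv .venv --python 3.11
source .venv/bin/activate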
Caution
To automatically download the fairchem OMat24 checkpoint, please make sure you have been granted download access to their HuggingFace model repo (not the dataset repo), and log in locally on your machine through huggingface-cli login (see HF hub authentication).
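For example:

# paste a read-access token from your HF account settings when prompted
huggingface-cli login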
Linux
# (Optional) Install uv, way faster than pip, why not? :)
curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env
git clone https://github.com/atomind-ai/mlip-arena.git
cd mlip-arena
# One script uv pip installation
bash scripts/install.sh
Tip
Sometimes installing all compiled models consumes all the available local storage. The optional pip flag --no-cache could be used, and uv cache clean will be helpful too.
Mac
# (Optional) Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env
# One script uv pip installation
bash scripts/install-macosx.sh
Arena provides a unified interface to run all the compiled MLIPs. This can be achieved simply by looping through MLIPEnum:
from mlip_arena.models import MLIPEnum
from mlip_arena.tasks import MD
from mlip_arena.tasks.utils import get_calculator

from ase import units
from ase.build import bulk

atoms = bulk("Cu", "fcc", a=3.6) * (5, 5, 5)

results = []
for model in MLIPEnum:
    result = MD(
        atoms=atoms,
        calculator=get_calculator(
            model,
            calculator_kwargs=dict(),  # passed into the calculator
            dispersion=True,
            dispersion_kwargs=dict(
                damping='bj', xc='pbe', cutoff=40.0 * units.Bohr
            ),  # passed into TorchDFTD3Calculator
        ),  # compatible with custom ASE Calculator
        ensemble="nve",  # "nvt" and "npt" also available
        dynamics="velocityverlet",  # compatible with any ASE Dynamics object and its class name
        total_time=1e3,  # 1 ps = 1e3 fs
        time_step=2,  # fs
    )
    results.append(result)
To run multiple benchmarks in parallel, append .submit to the task function and wrap all the tasks into a flow to dispatch them to workers for concurrent execution. See the Prefect docs on tasks and flows for more details.
...

from prefect import flow

@flow
def run_all_tasks():
    futures = []
    for model in MLIPEnum:
        future = MD.submit(
            atoms=atoms,
            ...
        )
        futures.append(future)
    return [f.result(raise_on_failure=False) for f in futures]
For a more practical example using HPC resources, please refer to the MD stability benchmark.
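As a rough sketch of the pattern (assuming the prefect-dask and dask-jobqueue packages; all SLURM settings below are illustrative placeholders, not Arena defaults), the flow above can be bound to a Dask task runner that provisions workers through SLURM:

from prefect import flow
from prefect_dask import DaskTaskRunner

# Hypothetical cluster settings; adjust to your HPC system.
runner = DaskTaskRunner(
    cluster_class="dask_jobqueue.SLURMCluster",
    cluster_kwargs=dict(
        cores=8,
        memory="32GB",
        walltime="01:00:00",
    ),
    adapt_kwargs=dict(minimum=1, maximum=4),  # autoscale the number of Dask workers
)

@flow(task_runner=runner)
def run_all_tasks():
    ...  # submit MD and other tasks with .submit as shown above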
The implemented tasks are available under mlip_arena.tasks.<module>.run, or via from mlip_arena.tasks import * for convenient imports (the latter currently doesn't work if phonopy is not installed).
- OPT: Structure optimization (see the sketch after this list)
- EOS: Equation of state (energy-volume scan)
- MD: Molecular dynamics with flexible dynamics (NVE, NVT, NPT) and temperature/pressure scheduling (annealing, shearing, etc.)
- PHONON: Phonon calculation driven by phonopy
- NEB: Nudged elastic band
- NEB_FROM_ENDPOINTS: Nudged elastic band with convenient image interpolation (linear or IDPP)
- ELASTICITY: Elastic tensor calculation
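These tasks follow the same calling pattern as MD above. Below is a minimal sketch using OPT; the optimizer keyword is an assumption about the task signature, so check the task module for the exact arguments:

from mlip_arena.models import MLIPEnum
from mlip_arena.tasks import OPT
from mlip_arena.tasks.utils import get_calculator

from ase.build import bulk

atoms = bulk("Cu", "fcc", a=3.6) * (2, 2, 2)

# Relax the structure with every registered model, mirroring the MD loop above.
for model in MLIPEnum:
    result = OPT(
        atoms=atoms.copy(),
        calculator=get_calculator(model),
        optimizer="FIRE",  # assumed keyword; see the OPT task signature
    )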
Instructions for individual benchmarks are provided in the README of each corresponding folder under /benchmark.
PRs are welcome. Please clone the repo and submit your changes as a PR.
To make changes to the Hugging Face space, fetch large files from git lfs first and run streamlit:
git lfs fetch --all
git lfs pull
streamlit run serve/app.py
Note
Please reuse, extend, or chain the general tasks defined above and add a new folder and script under /benchmark.
If you have pretrained MLIP models that you would like to contribute to the MLIP Arena and benchmark in real time, there are two ways:
- Implement a new ASE Calculator class in mlip_arena/models/externals.
- Name your class after your awesome model and add the same name to the registry with metadata.
Caution
Remove unnecessary outputs under the results class attribute to avoid errors in MD simulations. Please refer to CHGNet as an example.
- Inherit the Hugging Face ModelHubMixin class in your awesome model class definition. We recommend PyTorchModelHubMixin.
- Create a new Hugging Face Model repository and upload the model file using the push_to_hub function.
- Follow the template to code the I/O interface for your model here.
- Update the model registry with metadata.
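As a rough illustration of the mixin approach (MyMLIP and its internals are placeholders, not an Arena API), inheriting PyTorchModelHubMixin gives your model push_to_hub and from_pretrained:

import torch
from huggingface_hub import PyTorchModelHubMixin

class MyMLIP(torch.nn.Module, PyTorchModelHubMixin):
    # Placeholder architecture; replace with your actual model.
    def __init__(self, hidden_dim: int = 64):
        super().__init__()
        self.layer = torch.nn.Linear(3, hidden_dim)

    def forward(self, x):
        return self.layer(x)

model = MyMLIP()
model.push_to_hub("your-username/your-awesome-mlip")  # upload weights and config
reloaded = MyMLIP.from_pretrained("your-username/your-awesome-mlip")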
If you find the work useful, please consider citing the following:
@inproceedings{
chiang2025mlip,
title={{MLIP} Arena: Advancing Fairness and Transparency in Machine Learning Interatomic Potentials through an Open and Accessible Benchmark Platform},
author={Yuan Chiang and Tobias Kreiman and Elizabeth Weaver and Ishan Amin and Matthew Kuner and Christine Zhang and Aaron Kaplan and Daryl Chrzan and Samuel M Blau and Aditi S. Krishnapriyan and Mark Asta},
booktitle={AI for Accelerated Materials Design - ICLR 2025},
year={2025},
url={https://openreview.net/forum?id=ysKfIavYQE}
}