Hanz Cuevas Velasquez1*, Anastasios Yiannakidis1*, Soyong Shin2, Giorgio Becherini1, Markus Höschle1, Joachim Tesch1, Taylor Obersat1, Tsvetelina Alexiadis1, Eni Halilaj2, Michael J. Black1
1Max Planck Institute for Intelligent Systems, Tübingen; 2Carnegie Mellon University
*Equal contribution
[CVPR 2026 Oral] | arXiv | Project Page | Datasets
- [2026-06] 🎉 MAMMA is being presented at CVPR 2026
- [2026-06] Code released (inference + training)
git clone https://github.com/cuevhv/mamma.git
cd mamma

Full env + CUDA + weights setup: docs/INSTALL.md.

micromamba activate mamma # or: conda activate mamma
python -m inference doctor # verify env vars + weight paths

The pipeline is zero-config when weights live under data/.

Bundled 4-cam example, ~56 MB:

bash data/download_example.sh # fetches videos to data/mamma_example/
python -m inference run \
--cfg configs/examples/presets/quick.yaml \
--footage data/mamma_example \
--seq_name pushing_and_lifting_from_ground \
--calib configs/examples/calib/iphones_outdoors.yaml \
--out-tag demo -v

Outputs land under output/ma_*/demo/mamma_example/….
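To see what a run produced, the `output/ma_*/<tag>/` pattern can be expanded with a glob. The following is a minimal sketch, assuming each pipeline step writes into its own `ma_*` directory (inferred from the path pattern, not confirmed by the docs); `find_run_dirs` is a hypothetical helper and not part of MAMMA.

```python
from pathlib import Path


def find_run_dirs(output_root, tag, dataset):
    """Return sorted directories matching <output_root>/ma_*/<tag>/<dataset>.

    Hypothetical helper: the layout is assumed from the documented
    output/ma_*/<tag>/ pattern, not read from the MAMMA source.
    """
    root = Path(output_root)
    return sorted(p for p in root.glob(f"ma_*/{tag}/{dataset}") if p.is_dir())


if __name__ == "__main__":
    # Values taken from the demo command: --out-tag demo, footage mamma_example.
    for run_dir in find_run_dirs("output", "demo", "mamma_example"):
        print(run_dir)
```

Run from the repository root after the demo finishes, this prints one directory per step that wrote results; an empty result suggests the run did not complete or used a different tag.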
Prefer a browser UI? Run bash gui/scripts/dev.sh, open http://localhost:3000, and click Run demo. It runs the same pipeline with a friendlier interface.
ma_cap → ma_masks → ma_2d → ma_3d → ma_vis
| Step | What it does |
|---|---|
| `ma_cap` | Loads multi-view capture |
| `ma_masks` | Per-person segmentation (SAM + YOLO) |
| `ma_2d` | 2D landmark detection (MammaNet) |
| `ma_3d` | Multi-view SMPL-X optimization |
| `ma_vis` | Per-camera overlays + interactive scene |
Entry point: python -m inference run (source: inference/cli/run.py).
| Argument | What it is |
|---|---|
| `--cfg` / `--preset` | Pipeline-configuration YAML: declares which steps run and their hyperparameters. Capture-independent. (what a preset is + how to modify one) |
| `--footage` | Dataset root containing sequence subdirs (use with `--seq_name` + `--calib`). (layout reference) |
| `--seq_name` | One sequence subdirectory name under `--footage` to process (one run = one sequence). |
| `--calib` | Calibration file (`.yaml` / `.xcp` / OpenCV `.json`); applies to every sequence under `--footage`. (format reference) |
| `--capture` | Advanced: capture JSON pointing at footage, calibration, sequences, and camera names; used to iterate over many sequences in one invocation. (schema reference) |
| `--out-tag` | Output sub-directory tag under `output/ma_*/<tag>/` (default: `local`). |
| `-v` | Verbose runner logs. |
Three things are needed:
- A calibration file (how to make one)
- A folder with your sequence (how to set it up)
- A preset. Use a shipped one: `configs/examples/presets/quick.yaml` (~5 min smoke test) or `configs/examples/presets/full.yaml` (full-frame). See `docs/CONFIGS.md` to modify or author your own.
Then:
python -m inference run \
--cfg <path/to/preset>.yaml \
--footage <path/to/footage> \
--seq_name <seq_name> \
--calib <path/to/calib>.yaml \
--out-tag run01 -v

Alternative: iterate over many sequences in one invocation. A capture JSON enumerates sequences, cameras, and the calibration in one file; the runner walks them automatically:

python -m inference run \
--cfg <path/to/preset>.yaml \
--capture <path/to/capture>.json \
--out-tag run01 -v

Browser UI for submitting and inspecting runs. It uses the same mamma Python env.

gui/scripts/dev.sh # dev: Flask :8000 + Vite :3000 (auto-reload)
gui/scripts/prod.sh # prod: single Flask process on :8000

Setup and deployment: gui/README.md.
The paper's released captures, evaluation data, and synthetic training data live on the MAMMA project page and require a free account.
- Register at https://mamma.is.tue.mpg.de/ and confirm your email.
- Either use the GUI's Pipeline assets panel (sign in once, click to download), or run the per-dataset shell scripts under data/:

  bash data/download_mamma_dance.sh --bachata --meta --pred --videos_crf24

Five dataset families ship: dance, multi-person, iPhone, eval, and synthetic. Per-dataset sizes, video encodings, and the full script flag surface live in docs/DATASETS.md.
Just running on your own footage? You don't need any of this — see Run the pipeline above.
.
├── inference/       runner, step builders, doctor CLI
├── capture/         ma_cap step
├── segmentation/    ma_masks step
├── landmarks/       ma_2d step
├── optimization/    ma_3d step
├── visualization/   ma_vis step
├── configs/         presets + capture manifests
├── data/            body models + weights + datasets (gitignored)
├── output/          run outputs (gitignored)
├── gui/             browser UI (Flask + React)
└── scripts/         smoke tests + utilities
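
The layout above pairs each pipeline step with one top-level package. As a small sketch of that mapping, the snippet below checks that a checkout contains the package for every step. The step-to-directory pairs are copied from the tree; `STEP_PACKAGES` and `missing_step_packages` are hypothetical names and not part of MAMMA.

```python
from pathlib import Path

# Step name -> top-level package, copied from the layout above, in pipeline order.
STEP_PACKAGES = {
    "ma_cap": "capture",
    "ma_masks": "segmentation",
    "ma_2d": "landmarks",
    "ma_3d": "optimization",
    "ma_vis": "visualization",
}


def missing_step_packages(repo_root):
    """Return the step names whose package directory is absent under repo_root."""
    root = Path(repo_root)
    return [step for step, pkg in STEP_PACKAGES.items()
            if not (root / pkg).is_dir()]


if __name__ == "__main__":
    # An empty list means every step's package directory is present.
    print(missing_step_packages("."))
```
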
- Release the evaluation scripts (2D landmark + benchmark evaluation) and the processed evaluation datasets.
@inproceedings{cuevas2026mamma,
title = {{MAMMA}: {Markerless Accurate Multi-person Motion Acquisition}},
author = {Cuevas Velasquez, Hanz and Yiannakidis, Anastasios and Shin, Soyong and Becherini, Giorgio and H{\"o}schle, Markus and Tesch, Joachim and Obersat, Taylor and Alexiadis, Tsvetelina and Halilaj, Eni and Black, Michael J.},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2026}
}
MAMMA builds on a number of open-source models, datasets, and tools. We thank their authors for releasing their work openly.
- SAM 2 and SAM 3: Meta FAIR; segmentation backbones.
- YOLOv12: Ultralytics; person detection.
- SMPL-X: MPI-IS; expressive body model.
- ViTPose: pretrained human-pose backbone.
- HRNet: alternative pose-estimation backbone.
- CameraHMR: MPI-IS; landmark architecture baseline.
- BEDLAM: MPI-IS; synthetic dataset of humans in motion.
- Rerun: interactive 3D scene viewer.
- Detectron2: Meta FAIR; vision research library.
- PyTorch Lightning, Hydra, and WebDataset: training infrastructure.
For non-commercial scientific research purposes; see LICENSE.
Questions, bug reports, or other inquiries: mamma@tue.mpg.de.