Official repository for the paper:
"LEGATO: Large-Scale End-to-End Generalizable Approach to Typeset OMR"
conda create -n legato python=3.12
conda activate legato
pip install -r requirements.txt
⚠️ Tested with CUDA 12.4.
We release two LEGATO checkpoints on HuggingFace:
| Model | Link |
|---|---|
| legato-small | guangyangmusic/legato-small |
| legato | guangyangmusic/legato |
🔹 Recommended: Use legato (full model). The small variant is mainly for baseline comparisons and is less efficient.
To transcribe a single image:
PYTHONPATH=. python scripts/inference.py \
--model_path guangyangmusic/legato \
--image_path path/to/image.png
To transcribe every image in a folder:
PYTHONPATH=. python scripts/inference.py \
--model_path guangyangmusic/legato \
--image_path path/to/image_folder/
🖼️ The image folder should contain only .jpg, .jpeg, or .png files.
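A quick pre-flight check can catch stray files before a long inference run. The sketch below is not part of the repository: `find_unsupported_files` is a hypothetical helper, and matching extensions case-insensitively is an assumption about how the script treats names such as `page.PNG`.

```python
from pathlib import Path

# Extensions the note above lists as accepted for folder inference.
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def find_unsupported_files(folder):
    """Return the sorted names of entries that are not .jpg/.jpeg/.png files."""
    unsupported = []
    for entry in Path(folder).iterdir():
        # Sub-directories and files with any other extension are both reported.
        if not entry.is_file() or entry.suffix.lower() not in ALLOWED_EXTENSIONS:
            unsupported.append(entry.name)
    return sorted(unsupported)


if __name__ == "__main__":
    import sys

    # Usage: python check_folder.py path/to/image_folder/
    leftovers = find_unsupported_files(sys.argv[1])
    if leftovers:
        print("Unsupported entries:", ", ".join(leftovers))
    else:
        print("Folder contains only supported image files.")
```

An empty result means the folder should be safe to pass as `--image_path`.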
To run on a dataset instead, set --image_path to the folder containing the dataset. Ensure the dataset has a column named image that holds the score images.
Use the --fp16 flag to enable half-precision inference. This reduces memory usage but may affect performance.
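As a rough illustration of where the memory saving comes from: weights stored in half precision take 2 bytes per parameter instead of 4. The helper and the parameter count below are hypothetical, and the estimate covers model weights only, not activations or other buffers.

```python
# Bytes needed to store one parameter in each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def weight_memory_gib(num_params, dtype):
    """Estimate the memory taken by model weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


if __name__ == "__main__":
    # Hypothetical parameter count, chosen only to make the numbers concrete.
    num_params = 1_000_000_000
    for dtype in ("fp32", "fp16"):
        print(f"{dtype}: {weight_memory_gib(num_params, dtype):.2f} GiB")
```

Whatever the real parameter count is, the fp16 estimate is exactly half the fp32 one.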
We use legato in examples below. Refer to the Checkpoints section for pretrained models.
The commands in this section have only been tested on a single node with multiple GPUs.
Use the provided script:
bash scripts/train_legato.sh
Or customize:
PYTHONPATH=. accelerate launch --config_file configs/zero2.yaml \
scripts/train.py \
--model_config guangyangmusic/legato \
--dataset_path datasets/PDMX-Synth \
--output_dir outputs/legato \
--remove_unused_columns False \
--do_train --do_eval \
--metric_for_best_model eval_SER --greater_is_better False \
--save_steps 5000 --eval_steps 5000 \
--num_train_epochs 10 \
--learning_rate 3e-4 \
--per_device_train_batch_size 2
Refer to the TrainingArguments docs for more options.
LEGATO uses DeepSpeed ZeRO-2 by default. You can modify or provide your own Accelerate config (configs/*.yaml).
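When adapting the command to a different number of GPUs, it helps to keep the global batch size in mind. The sketch below assumes plain data-parallel training, where the global batch is the per-device batch times the number of GPUs times the gradient accumulation steps. The GPU count and dataset size are hypothetical, and the step count is only an approximation of what the trainer reports.

```python
import math


def effective_batch_size(per_device_batch, num_gpus, grad_accum_steps=1):
    """Samples consumed per optimizer step under data parallelism."""
    return per_device_batch * num_gpus * grad_accum_steps


def optimizer_steps_per_epoch(num_samples, per_device_batch, num_gpus, grad_accum_steps=1):
    """Approximate optimizer steps per epoch, rounding the last partial batch up."""
    batch = effective_batch_size(per_device_batch, num_gpus, grad_accum_steps)
    return math.ceil(num_samples / batch)


if __name__ == "__main__":
    # per_device_batch=2 matches the command above; the other values are hypothetical.
    print(effective_batch_size(per_device_batch=2, num_gpus=8))
    print(optimizer_steps_per_epoch(num_samples=100_000, per_device_batch=2, num_gpus=8))
```

Comparing the steps per epoch with `--save_steps` and `--eval_steps` gives a feel for how often checkpoints and evaluations will occur.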
Validate the best checkpoint:
PYTHONPATH=. accelerate launch --config_file configs/inference.yaml \
scripts/train.py configs/legato-eval.json
You can also:
- Evaluate specific local checkpoints
- Evaluate from HuggingFace
- Evaluate all checkpoints
See scripts/eval_legato.sh for examples.
To predict using multiple GPUs:
accelerate launch --config_file configs/inference.yaml \
scripts/train.py configs/legato-predict.json
Output is saved as test_predictions.json in the output directory.
If transcription is present in the dataset, error metrics will be computed automatically.
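For intuition about what an error rate such as SER measures, here is a minimal sketch of an edit-distance-based rate: the number of insertions, deletions, and substitutions needed to turn the prediction into the reference, divided by the reference length. This is not the repository's implementation. Whitespace tokenization and the function names are assumptions, and the scripts in this repository may tokenize and normalize ABC differently.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two sequences, using a rolling row."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_item in enumerate(reference, start=1):
        current = [i]
        for j, hyp_item in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_item == hyp_item else 1
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + substitution_cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def error_rate(reference_text, predicted_text):
    """Edit distance over whitespace tokens, normalized by reference length."""
    reference_tokens = reference_text.split()
    predicted_tokens = predicted_text.split()
    if not reference_tokens:
        raise ValueError("reference must contain at least one token")
    return edit_distance(reference_tokens, predicted_tokens) / len(reference_tokens)


if __name__ == "__main__":
    # One substituted token out of four reference tokens.
    print(error_rate("C D E F", "C D G F"))
```

Because the rate is normalized by the reference length, it can exceed 1.0 when the prediction is much longer than the reference.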
To evaluate the ABC transcription accuracy of your model predictions, use the provided script:
PYTHONPATH=. python scripts/compute_ER.py \
--prediction_file path/to/test_predictions.json \
--ground_truth datasets/PDMX-Synth
Convert ABC predictions to MusicXML using:
DISPLAY=:0 python utils/convert.py --input_file xxx_abc.json
Requirements:
- MuseScore executable at software/mscore
- GUI-enabled environment (DISPLAY=:0)
- Depends on utils/abc2xml.py
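The requirements above can be verified before starting a conversion. The following is a minimal sketch, not a script shipped with the repository: `check_conversion_prerequisites` is a hypothetical helper, and its default paths simply mirror the locations listed above.

```python
import os


def check_conversion_prerequisites(
    mscore_path="software/mscore",
    abc2xml_path="utils/abc2xml.py",
    env=None,
):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    env = os.environ if env is None else env
    problems = []
    # MuseScore must exist and be executable by the current user.
    if not (os.path.isfile(mscore_path) and os.access(mscore_path, os.X_OK)):
        problems.append(f"MuseScore executable not found or not executable: {mscore_path}")
    # The converter depends on the abc2xml helper script.
    if not os.path.isfile(abc2xml_path):
        problems.append(f"Missing dependency: {abc2xml_path}")
    # A GUI-enabled environment is signalled by a non-empty DISPLAY variable.
    if not env.get("DISPLAY"):
        problems.append("DISPLAY is not set (for example DISPLAY=:0)")
    return problems


if __name__ == "__main__":
    for problem in check_conversion_prerequisites():
        print("WARNING:", problem)
```

Run it from the repository root so that the relative default paths resolve correctly.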
Compute Tree Edit Distance with <note> flattening (TEDn):
PYTHONPATH=. python scripts/compute_TEDn.py \
--prediction_file xxx_xml.json \
--ground_truth path/to/dataset \
--num_workers 4
The dataset must contain a musicxml column.
Parts of utils/TEDn_eval are adapted from OLiMPiC, licensed under the MIT License.
Compute TEDn scores only for samples that successfully convert from ABC to MusicXML:
PYTHONPATH=. python scripts/compute_TEDn_convert.py \
--tedn_file xxx_ted_scores.json \
--fail_mask xxx_fail_mask.json
This tool filters TEDn scores using a boolean mask indicating which samples failed conversion to MusicXML.
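The filtering idea can be sketched in a few lines. This is an illustration rather than the script's actual code. It assumes both JSON files hold flat lists of equal length and that `true` in the mask marks a failed conversion. The real file layout may differ, and the function names are hypothetical.

```python
import json


def filter_scores(scores, fail_mask):
    """Keep only the scores whose sample did not fail conversion."""
    if len(scores) != len(fail_mask):
        raise ValueError("scores and fail_mask must have the same length")
    return [score for score, failed in zip(scores, fail_mask) if not failed]


def mean_of_kept(scores, fail_mask):
    """Average score over successfully converted samples, or None if none remain."""
    kept = filter_scores(scores, fail_mask)
    return sum(kept) / len(kept) if kept else None


def load_json(path):
    """Read a JSON file; the flat-list layout is an assumption of this sketch."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


if __name__ == "__main__":
    import sys

    # Usage: python filter_tedn.py xxx_ted_scores.json xxx_fail_mask.json
    print(mean_of_kept(load_json(sys.argv[1]), load_json(sys.argv[2])))
```

Reporting the number of kept samples alongside the mean helps readers judge how many failures were excluded.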
Compute Optical Music Recognition Normalized Edit Distance (OMR-NED) using the musicdiff library:
PYTHONPATH=. python scripts/compute_OMR-NED.py \
--prediction_file xxx_xml.json \
--ground_truth path/to/dataset
This evaluation:
- Creates temporary folders for predictions and ground truth MusicXML files
- Runs musicdiff evaluation in --ml_training_evaluation mode
- Provides detailed error analysis and normalized edit distance metrics
- Saves results to an output folder with comprehensive evaluation reports
Requirements:
- musicdiff library installed (pip install musicdiff)
- Dataset must contain a musicxml column
@misc{yang2025legatolargescaleendtoendgeneralizable,
title={LEGATO: Large-scale End-to-end Generalizable Approach to Typeset OMR},
author={Guang Yang and Victoria Ebert and Nazif Tamer and Brian Siyuan Zheng and Luiza Pozzobon and Noah A. Smith},
year={2025},
eprint={2506.19065},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2506.19065},
}