LEGATO: Large-Scale End-to-End Generalizable Approach to Typeset OMR

Official repository for the paper:
"LEGATO: Large-Scale End-to-End Generalizable Approach to Typeset OMR"

🛠️ Setup Instructions

1. Create Environment

conda create -n legato python=3.12
conda activate legato

2. Install Dependencies

pip install -r requirements.txt

⚠️ Tested with CUDA 12.4.

📦 Pretrained Checkpoints

We release two LEGATO checkpoints on HuggingFace:

Model	Link
legato-small	guangyangmusic/legato-small
legato	guangyangmusic/legato

🔹 Recommended: Use legato (full model). The small variant is mainly for baseline comparisons and is less efficient.

🔍 Inference

Run Inference on a Single Image

PYTHONPATH=. python scripts/inference.py \
    --model_path guangyangmusic/legato \
    --image_path path/to/image.png

Batch Inference on a Folder

PYTHONPATH=. python scripts/inference.py \
    --model_path guangyangmusic/legato \
    --image_path path/to/image_folder/

🖼️ Image folder should contain only .jpg, .jpeg, or .png files.
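
Since the folder should hold only supported image files, it can help to check it before launching batch inference. The following is a minimal sketch, not part of the repository: the helper name unsupported_files is hypothetical, and matching extensions case-insensitively is an assumption about how the inference script treats names such as page.PNG.

```python
from pathlib import Path

# Extensions listed above as supported for batch inference.
SUPPORTED = {".jpg", ".jpeg", ".png"}


def unsupported_files(folder):
    """Return the names of files in `folder` whose extension is not supported.

    Hypothetical helper; extensions are compared case-insensitively,
    which is an assumption about the inference script's behaviour.
    """
    return sorted(
        p.name
        for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() not in SUPPORTED
    )
```

Calling unsupported_files("path/to/image_folder/") returns an empty list when the folder is ready; otherwise it lists the files to move or delete first.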

Inference from datasets.Dataset

Set --image_path to the folder containing the dataset. The dataset must have a column named image that holds the score images.

Half Precision Inference

Pass the --fp16 flag to enable half-precision inference. This reduces memory usage but may affect transcription accuracy.

🎯 Training & Validation

The examples below use the legato checkpoint. See the Pretrained Checkpoints section for the available models.

The commands in this section have only been tested on a single node with multiple GPUs.

🔥 Training

Use the provided script:

bash scripts/train_legato.sh

Or customize:

PYTHONPATH=. accelerate launch --config_file configs/zero2.yaml \
    scripts/train.py \
    --model_config guangyangmusic/legato \
    --dataset_path datasets/PDMX-Synth \
    --output_dir outputs/legato \
    --remove_unused_columns False \
    --do_train --do_eval \
    --metric_for_best_model eval_SER --greater_is_better False \
    --save_steps 5000 --eval_steps 5000 \
    --num_train_epochs 10 \
    --learning_rate 3e-4 \
    --per_device_train_batch_size 2

Refer to TrainingArguments docs for more options.

LEGATO uses DeepSpeed ZeRO-2 by default. You can modify or provide your own Accelerate config (configs/*.yaml).
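
The custom command sets a per-device batch size of 2, so the number of samples per optimizer step depends on how many GPUs you launch on. As a rough sketch of that arithmetic (the GPU count below is a hypothetical value, and gradient accumulation is assumed to be left at 1 because the command does not set it):

```python
def effective_batch_size(per_device, num_gpus, grad_accum_steps=1):
    """Samples consumed per optimizer step across all data-parallel devices."""
    return per_device * num_gpus * grad_accum_steps


# per_device=2 comes from the command above; num_gpus=8 is a hypothetical setup.
print(effective_batch_size(per_device=2, num_gpus=8))  # 16
```

If you train on a different number of GPUs, the learning rate in the command may need retuning to match the new effective batch size.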

✅ Validation

Validate the best checkpoint:

PYTHONPATH=. accelerate launch --config_file configs/inference.yaml \
    scripts/train.py configs/legato-eval.json

You can also:

Evaluate specific local checkpoints
Evaluate from HuggingFace
Evaluate all checkpoints

See scripts/eval_legato.sh for examples.

🔮 Prediction

To predict using multiple GPUs:

accelerate launch --config_file configs/inference.yaml \
    scripts/train.py configs/legato-predict.json

🔄 Output saved as test_predictions.json in the output directory.

📉 If transcription is present in the dataset, error metrics will be computed automatically.

🔁 MusicXML Conversion & Evaluation

📏 ABC Error Rate Evaluation

To evaluate the ABC transcription accuracy of your model predictions, use the provided script:

PYTHONPATH=. python scripts/compute_ER.py \
    --prediction_file path/to/test_predictions.json \
    --ground_truth datasets/PDMX-Synth
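
For intuition, an edit-distance-based error rate can be sketched as below. This is an illustrative sketch under stated assumptions, not the logic of scripts/compute_ER.py: it assumes the metric is the total Levenshtein distance divided by the total reference length, and it treats each sequence element (here, a character) as one symbol. The repository's tokenization and normalization may differ.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference symbol
                curr[j - 1] + 1,     # insert a hypothesis symbol
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def error_rate(references, hypotheses):
    """Total edit distance divided by total reference length."""
    total_dist = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_len = sum(len(r) for r in references)
    return total_dist / total_len if total_len else 0.0
```

For example, error_rate(["CDEF"], ["CDEG"]) gives 0.25: one substitution over four reference symbols.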

🎼 ABC to MusicXML Conversion

Convert ABC predictions to MusicXML using:

DISPLAY=:0 python utils/convert.py --input_file xxx_abc.json

Requirements:

MuseScore executable at software/mscore
GUI-enabled environment (DISPLAY=:0)
Depends on utils/abc2xml.py

🌲 TEDn Evaluation

Compute Tree Edit Distance with <note> flattening (TEDn):

PYTHONPATH=. python scripts/compute_TEDn.py \
    --prediction_file xxx_xml.json \
    --ground_truth path/to/dataset \
    --num_workers 4

Dataset must contain a musicxml column.

Parts of utils/TEDn_eval are adapted from OLiMPiC, licensed under the MIT License.

🌲 TEDn Convert Evaluation

Compute TEDn scores only for samples that successfully convert from ABC to MusicXML:

PYTHONPATH=. python scripts/compute_TEDn_convert.py \
    --tedn_file xxx_ted_scores.json \
    --fail_mask xxx_fail_mask.json

This tool filters TEDn scores using a boolean mask indicating which samples failed ABC-to-MusicXML conversion.
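
The filtering step can be pictured with the short sketch below. It is an illustration under assumptions, not the code in scripts/compute_TEDn_convert.py: both files are assumed to hold flat JSON lists aligned by sample index, with true in the mask meaning the conversion failed. The function names are hypothetical.

```python
import json


def filter_scores(scores, fail_mask):
    """Keep the scores whose mask entry is False (conversion succeeded)."""
    if len(scores) != len(fail_mask):
        raise ValueError("scores and fail_mask must have the same length")
    return [s for s, failed in zip(scores, fail_mask) if not failed]


def mean_converted_score(tedn_path, mask_path):
    """Average TEDn over converted samples, or None if none converted.

    Assumes both files contain flat JSON lists (hypothetical layout).
    """
    with open(tedn_path) as f:
        scores = json.load(f)
    with open(mask_path) as f:
        fail_mask = json.load(f)
    kept = filter_scores(scores, fail_mask)
    return sum(kept) / len(kept) if kept else None
```

Reporting the score on converted samples alongside the failure count keeps the two effects separate: a model cannot look better simply because its worst outputs failed to convert.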

🎵 OMR-NED Evaluation

Compute Optical Music Recognition Normalized Edit Distance (OMR-NED) using the musicdiff library:

PYTHONPATH=. python scripts/compute_OMR-NED.py \
    --prediction_file xxx_xml.json \
    --ground_truth path/to/dataset

This evaluation:

Creates temporary folders for predictions and ground truth MusicXML files
Runs musicdiff evaluation with --ml_training_evaluation mode
Provides detailed error analysis and normalized edit distance metrics
Saves results to an output folder with comprehensive evaluation reports

Requirements:

musicdiff library installed (pip install musicdiff)
Dataset must contain a musicxml column

📄 Citation

@misc{yang2025legatolargescaleendtoendgeneralizable,
      title={LEGATO: Large-scale End-to-end Generalizable Approach to Typeset OMR}, 
      author={Guang Yang and Victoria Ebert and Nazif Tamer and Brian Siyuan Zheng and Luiza Pozzobon and Noah A. Smith},
      year={2025},
      eprint={2506.19065},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2506.19065}, 
}
