guang-yng/legato

LEGATO: Large-Scale End-to-End Generalizable Approach to Typeset OMR

Python 3.12 | PyTorch 2.6.0 | License: MIT

Official repository for the paper:
"LEGATO: Large-Scale End-to-End Generalizable Approach to Typeset OMR"

🛠️ Setup Instructions

1. Create Environment

conda create -n legato python=3.12
conda activate legato

2. Install Dependencies

pip install -r requirements.txt

⚠️ Tested with CUDA 12.4.

📦 Pretrained Checkpoints

We release two LEGATO checkpoints on HuggingFace:

| Model | Link |
| --- | --- |
| legato-small | guangyangmusic/legato-small |
| legato | guangyangmusic/legato |

🔹 Recommended: Use legato (full model). The small variant is mainly for baseline comparisons and is less efficient.

🔍 Inference

Run Inference on a Single Image

PYTHONPATH=. python scripts/inference.py \
    --model_path guangyangmusic/legato \
    --image_path path/to/image.png

Batch Inference on a Folder

PYTHONPATH=. python scripts/inference.py \
    --model_path guangyangmusic/legato \
    --image_path path/to/image_folder/

🖼️ The image folder should contain only .jpg, .jpeg, or .png files.
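Before launching batch inference, it can help to check the folder for stray files. The following is a minimal sketch and not part of the repository: `find_unexpected_files` is a hypothetical helper, and matching extensions case-insensitively is an assumption about how the inference script treats file names.

```python
from pathlib import Path

# Extensions listed in the note above; case-insensitive matching is an assumption.
ALLOWED_SUFFIXES = {".jpg", ".jpeg", ".png"}


def find_unexpected_files(folder):
    """Return the names of entries in `folder` that are not .jpg/.jpeg/.png files.

    Subdirectories are reported as unexpected too, since the folder is
    expected to contain only image files.
    """
    unexpected = []
    for entry in sorted(Path(folder).iterdir()):
        if not entry.is_file() or entry.suffix.lower() not in ALLOWED_SUFFIXES:
            unexpected.append(entry.name)
    return unexpected
```

An empty return value means the folder matches the requirement; otherwise the listed entries can be moved out before running the command.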

Inference from datasets.Dataset

Set --image_path to the folder containing the dataset. The dataset must have a column named image that holds the score images.

Half Precision Inference

Use the --fp16 flag to enable half-precision inference. This reduces memory usage but may affect transcription quality.

🎯 Training & Validation

We use legato in the examples below. Refer to the Pretrained Checkpoints section for the available models.

The commands in this section have only been tested on a single node with multiple GPUs.

🔥 Training

Use the provided script:

bash scripts/train_legato.sh

Or customize:

PYTHONPATH=. accelerate launch --config_file configs/zero2.yaml \
    scripts/train.py \
    --model_config guangyangmusic/legato \
    --dataset_path datasets/PDMX-Synth \
    --output_dir outputs/legato \
    --remove_unused_columns False \
    --do_train --do_eval \
    --metric_for_best_model eval_SER --greater_is_better False \
    --save_steps 5000 --eval_steps 5000 \
    --num_train_epochs 10 \
    --learning_rate 3e-4 \
    --per_device_train_batch_size 2

Refer to the Hugging Face TrainingArguments documentation for more options.

LEGATO uses DeepSpeed ZeRO-2 by default. You can modify or provide your own Accelerate config (configs/*.yaml).
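Because --per_device_train_batch_size is a per-GPU value, the number of samples per optimizer step depends on how many GPUs are used. The sketch below shows the usual arithmetic for data-parallel training; the GPU count of 8 is a hypothetical example, and `effective_batch_size` is an illustrative helper rather than something the repository provides.

```python
def effective_batch_size(per_device_batch_size, num_gpus, gradient_accumulation_steps=1):
    """Samples consumed per optimizer step in data-parallel training."""
    if min(per_device_batch_size, num_gpus, gradient_accumulation_steps) < 1:
        raise ValueError("all arguments must be positive integers")
    return per_device_batch_size * num_gpus * gradient_accumulation_steps


# --per_device_train_batch_size 2 on a hypothetical node with 8 GPUs
print(effective_batch_size(2, num_gpus=8))  # 16
```

If the GPU count changes, adjusting the per-device batch size or the accumulation steps keeps this product constant.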

✅ Validation

Validate the best checkpoint:

PYTHONPATH=. accelerate launch --config_file configs/inference.yaml \
    scripts/train.py configs/legato-eval.json

You can also:

  • Evaluate specific local checkpoints
  • Evaluate from HuggingFace
  • Evaluate all checkpoints

See scripts/eval_legato.sh for examples.

🔮 Prediction

To predict using multiple GPUs:

accelerate launch --config_file configs/inference.yaml \
    scripts/train.py configs/legato-predict.json

🔄 Output is saved as test_predictions.json in the output directory.

📉 If transcription is present in the dataset, error metrics will be computed automatically.

MusicXML Conversion & Evaluation

ABC Error Rate Evaluation

To evaluate the ABC transcription accuracy of your model predictions, use the provided script:

PYTHONPATH=. python scripts/compute_ER.py \
    --prediction_file path/to/test_predictions.json \
    --ground_truth datasets/PDMX-Synth
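For intuition, an error rate of this kind is commonly defined as the edit distance between prediction and reference divided by the reference length. The sketch below illustrates that general definition only; it is not the code in scripts/compute_ER.py, and both the helper names and the corpus-level aggregation (total edits over total reference length) are assumptions.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(references, hypotheses):
    """Total edits divided by total reference length over a set of samples."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_len = sum(len(r) for r in references)
    if total_len == 0:
        raise ValueError("references are empty")
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    return total_edits / total_len
```

Passing strings gives a character-level rate; passing lists of tokens gives a token-level rate with the same code.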

🎼 ABC to MusicXML Conversion

Convert ABC predictions to MusicXML using:

DISPLAY=:0 python utils/convert.py --input_file xxx_abc.json

Requirements:

  • MuseScore executable at software/mscore
  • GUI-enabled environment (DISPLAY=:0)
  • Depends on utils/abc2xml.py
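The requirements above can be checked up front before starting a long conversion run. This is a minimal sketch, assuming the paths are resolved relative to the repository root; `check_conversion_requirements` is a hypothetical helper and is not shipped with the repository.

```python
import os
from pathlib import Path


def check_conversion_requirements(repo_root=".", environ=None):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    env = os.environ if environ is None else environ
    root = Path(repo_root)
    problems = []

    # MuseScore executable expected at software/mscore
    mscore = root / "software" / "mscore"
    if not mscore.is_file():
        problems.append(f"MuseScore executable not found at {mscore}")
    elif not os.access(mscore, os.X_OK):
        problems.append(f"{mscore} exists but is not executable")

    # A GUI-enabled environment is signalled here by a non-empty DISPLAY variable
    if not env.get("DISPLAY"):
        problems.append("DISPLAY is not set (for example DISPLAY=:0)")

    # The converter depends on utils/abc2xml.py
    if not (root / "utils" / "abc2xml.py").is_file():
        problems.append("utils/abc2xml.py is missing")

    return problems
```

Note that a set DISPLAY variable does not guarantee that a display server is actually running; the check only catches the most common misconfiguration.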

🌲 TEDn Evaluation

Compute Tree Edit Distance with <note> flattening (TEDn):

PYTHONPATH=. python scripts/compute_TEDn.py \
    --prediction_file xxx_xml.json \
    --ground_truth path/to/dataset \
    --num_workers 4

Dataset must contain a musicxml column.

Parts of utils/TEDn_eval are adapted from OLiMPiC, licensed under the MIT License.

🌲 TEDn Convert Evaluation

Compute TEDn scores only for samples that successfully convert from ABC to MusicXML:

PYTHONPATH=. python scripts/compute_TEDn_convert.py \
    --tedn_file xxx_ted_scores.json \
    --fail_mask xxx_fail_mask.json

This tool filters TEDn scores using a boolean mask indicating which samples failed ABC-to-MusicXML conversion.
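The filtering step can be pictured as follows. This is a minimal sketch, assuming both JSON files decode to lists aligned by sample index and that True in the mask marks a failed conversion; the actual file layout used by scripts/compute_TEDn_convert.py may differ, and both helper names are hypothetical.

```python
def filter_converted(scores, fail_mask):
    """Keep only the scores of samples whose conversion did not fail."""
    if len(scores) != len(fail_mask):
        raise ValueError("scores and fail_mask must have the same length")
    return [score for score, failed in zip(scores, fail_mask) if not failed]


def mean_or_none(values):
    """Average of the values, or None when nothing survived the filter."""
    return sum(values) / len(values) if values else None
```

For example, `filter_converted([0.1, 0.5, 0.3], [False, True, False])` keeps the first and third scores, and the reported average is then taken over those two samples only.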

🎵 OMR-NED Evaluation

Compute Optical Music Recognition Normalized Edit Distance (OMR-NED) using the musicdiff library:

PYTHONPATH=. python scripts/compute_OMR-NED.py \
    --prediction_file xxx_xml.json \
    --ground_truth path/to/dataset

This evaluation:

  • Creates temporary folders for predictions and ground truth MusicXML files
  • Runs musicdiff evaluation with --ml_training_evaluation mode
  • Provides detailed error analysis and normalized edit distance metrics
  • Saves results to an output folder with comprehensive evaluation reports

Requirements:

  • musicdiff library installed (pip install musicdiff)
  • Dataset must contain a musicxml column
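The first step listed above, writing predictions and ground truth into temporary folders, can be sketched as below. The folder names, the file naming scheme, and the `write_musicxml_pairs` helper are all assumptions for illustration; the repository script may organize its temporary files differently.

```python
import tempfile
from pathlib import Path


def write_musicxml_pairs(predictions, ground_truths, base_dir=None):
    """Write aligned MusicXML strings into two sibling folders under a fresh temp dir.

    Files with the same name in both folders belong to the same sample.
    Returns the (prediction folder, ground truth folder) paths.
    """
    if len(predictions) != len(ground_truths):
        raise ValueError("predictions and ground_truths must have the same length")
    base = Path(tempfile.mkdtemp(dir=base_dir))
    pred_dir, gt_dir = base / "pred", base / "gt"
    pred_dir.mkdir()
    gt_dir.mkdir()
    for idx, (pred, gt) in enumerate(zip(predictions, ground_truths)):
        name = f"{idx:06d}.musicxml"  # hypothetical naming scheme
        (pred_dir / name).write_text(pred, encoding="utf-8")
        (gt_dir / name).write_text(gt, encoding="utf-8")
    return pred_dir, gt_dir
```

The caller is responsible for deleting the temporary directory once the evaluation has finished.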

📄 Citation

@misc{yang2025legatolargescaleendtoendgeneralizable,
      title={LEGATO: Large-scale End-to-end Generalizable Approach to Typeset OMR}, 
      author={Guang Yang and Victoria Ebert and Nazif Tamer and Brian Siyuan Zheng and Luiza Pozzobon and Noah A. Smith},
      year={2025},
      eprint={2506.19065},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2506.19065}, 
}
