This is the repository for the fast and reliable Music Symbol detector with Deep Learning, based on the Tensorflow Object Detection API.
If you want to try out the full-page detection on your own images, you can try it online in the DIVAServices Spotlight.
The scientific reasoning can be found in this scientific article. The detailed results for various combinations of object-detector, feature-extractor, etc. can be found in this spreadsheet.
If you are interested in previous work, presented at DAS 2018 on cropped images like these, please refer to the corresponding release.
Original Image | Detection results as training progresses |
---|---|
This repository contains several scripts that can be used independently of each other. Before running them, make sure that you have the necessary requirements installed.
- Python 3.6
- Tensorflow 1.8.0 (or optionally tensorflow-gpu 1.8.0)
- pycocotools (more info)
  - On Linux, run
    pip install "git+https://github.com/waleedka/cocoapi.git#egg=pycocotools&subdirectory=PythonAPI"
  - On Windows, run
    pip install git+https://github.com/philferriere/cocoapi.git#egg=pycocotools^&subdirectory=PythonAPI
- Some libraries, as specified in requirements.txt
cd research
protoc object_detection/protos/*.proto --python_out=.
Run DownloadAndBuildProtocolBuffers.ps1 to automate this step, or manually build the protobufs by first installing protocol buffers and then running:
cd research
protoc object_detection/protos/*.proto --python_out=.
Note that you have to use version 3.4.0 because of a bug in 3.5.0 and 3.5.1.
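Since the protoc version matters here, it can help to check it before compiling. The following is a minimal sketch, assuming that `protoc --version` prints a line such as `libprotoc 3.4.0`; the helper names are hypothetical and not part of this repository.

```python
import re
import subprocess

REQUIRED_PROTOC_VERSION = (3, 4, 0)  # the version recommended in the note above


def parse_protoc_version(version_output):
    """Extract (major, minor, patch) from text such as 'libprotoc 3.4.0'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("No version number found in: %r" % version_output)
    return tuple(int(part) for part in match.groups())


def installed_protoc_version():
    """Ask the protoc binary found on the PATH for its version."""
    output = subprocess.check_output(["protoc", "--version"])
    return parse_protoc_version(output.decode("utf-8"))


def is_recommended_version(version):
    """True only for the exact version that is known to work."""
    return tuple(version) == REQUIRED_PROTOC_VERSION
```

Calling `is_recommended_version(installed_protoc_version())` before running protoc gives an early warning if 3.5.0 or 3.5.1 is installed.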
Run PrepareDatasetsForTensorflow.ps1 to automate this step on Windows, or manually prepare the datasets with the following steps (on Linux).
Run the following scripts to reproduce the dataset locally:
# cd into MusicObjectDetector folder
python download_muscima_dataset.py
python prepare_muscima_annotations.py
python dataset_splitter.py --source_directory=data/muscima_pp_cropped_images_with_stafflines --destination_directory=data/training_validation_test_with_stafflines
These scripts will download the datasets automatically, prepare the annotations and split the images into three reproducible parts for training, validation and test.
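A split is reproducible when it depends only on the set of input files and a fixed seed. The following is a minimal sketch of that idea, not the actual logic of dataset_splitter.py (which may, for instance, split by writer); the function name, the seed and the fractions are assumptions chosen for illustration.

```python
import random


def split_reproducibly(file_names, seed=0, validation_fraction=0.1, test_fraction=0.1):
    """Shuffle with a fixed seed and cut into training, validation and test lists."""
    shuffled = sorted(file_names)          # sort first so the input order does not matter
    random.Random(seed).shuffle(shuffled)  # private RNG, the global random state is untouched
    n_test = int(len(shuffled) * test_fraction)
    n_validation = int(len(shuffled) * validation_fraction)
    test = shuffled[:n_test]
    validation = shuffled[n_test:n_test + n_validation]
    training = shuffled[n_test + n_validation:]
    return training, validation, test
```

Running the function twice on the same file names, in any order, yields the same three lists.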
Now you can create the Tensorflow Records that are required for actually running the training.
python create_muscima_tf_record.py --data_dir=data/training_validation_test_with_stafflines --set=training --annotations_dir=Annotations --output_path=data/all_classes_with_staff_lines_writer_independent_split/training.record --label_map_path=mapping_all_classes.txt
python create_muscima_tf_record.py --data_dir=data/training_validation_test_with_stafflines --set=validation --annotations_dir=Annotations --output_path=data/all_classes_with_staff_lines_writer_independent_split/validation.record --label_map_path=mapping_all_classes.txt
python create_muscima_tf_record.py --data_dir=data/training_validation_test_with_stafflines --set=test --annotations_dir=Annotations --output_path=data/all_classes_with_staff_lines_writer_independent_split/test.record --label_map_path=mapping_all_classes.txt
By providing a different mapping, e.g. mapping_71_classes.txt, you can reduce the set of classes that you want to be able to detect.
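The three record-creation calls differ only in the name of the subset, so they can be generated in a loop. This is a small sketch that rebuilds the commands listed above; the function name is hypothetical, and the default arguments are simply the values used in this README.

```python
def build_record_commands(
    data_dir="data/training_validation_test_with_stafflines",
    output_dir="data/all_classes_with_staff_lines_writer_independent_split",
    label_map="mapping_all_classes.txt",
):
    """Return one argument list per subset for create_muscima_tf_record.py."""
    commands = []
    for subset in ("training", "validation", "test"):
        commands.append([
            "python", "create_muscima_tf_record.py",
            "--data_dir=" + data_dir,
            "--set=" + subset,
            "--annotations_dir=Annotations",
            "--output_path={0}/{1}.record".format(output_dir, subset),
            "--label_map_path=" + label_map,
        ])
    return commands
```

Each returned list can be passed to `subprocess.check_call`. Passing `label_map="mapping_71_classes.txt"` switches to the reduced set of classes; in that case you will probably also want a different `output_dir`.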
There are two ways of making sure that the Python scripts discover the correct binaries:
To permanently link the source code of the project so that Python is able to find it, you can install the two packages in editable mode by running:
# From tensorflow/models/research/
pip install -e .
cd slim
# From inside tensorflow/models/research/slim
pip install -e .
Alternatively, make sure you have all required folders appended to the Python path. This can be done temporarily inside a shell, before calling any training scripts, with the following commands:
For Linux:
# From tensorflow/models/research/
export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim
For Windows (Powershell):
$pathToGitRoot = "[GIT_ROOT]"
$pathToSourceRoot = "$($pathToGitRoot)/object_detection"
$env:PYTHONPATH = "$($pathToGitRoot);$($pathToSourceRoot);$($pathToGitRoot)/slim"
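The same effect can be achieved from inside Python for the current interpreter only. The sketch below mirrors the Linux variant (the research folder plus its slim subfolder); the function names are hypothetical, and the path you pass in is assumed to point to tensorflow/models/research.

```python
import os
import sys


def research_paths(research_dir):
    """Return the two folders that must be importable: research/ and research/slim."""
    research_dir = os.path.abspath(research_dir)
    return [research_dir, os.path.join(research_dir, "slim")]


def add_to_sys_path(research_dir):
    """Prepend both folders to sys.path, keeping research/ in front of slim/."""
    for path in reversed(research_paths(research_dir)):
        if path not in sys.path:
            sys.path.insert(0, path)
```

After calling `add_to_sys_path(...)`, an `import object_detection` should succeed if the folder is correct. Unlike the shell commands, this does not affect child processes.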
Before running the training, you need to adapt the paths to your system:
- in the configuration that you want to run, e.g. configurations/faster_rcnn_inception_resnet_v2_atrous_muscima_pretrained_reduced_classes.config
- in the PowerShell scripts in the training_scripts folder, if you use them.
Run the actual training script by using the pre-defined PowerShell scripts in the training_scripts folder, or by directly calling
# Start the training
python [GIT_ROOT]/research/object_detection/train.py --logtostderr --pipeline_config_path="[GIT_ROOT]/MusicObjectDetector/configurations/[SELECTED_CONFIG].config" --train_dir="[GIT_ROOT]/MusicObjectDetector/data/checkpoints-[SELECTED_CONFIG]-train"
# Start the validation
python [GIT_ROOT]/research/object_detection/eval.py --logtostderr --pipeline_config_path="[GIT_ROOT]/MusicObjectDetector/configurations/[SELECTED_CONFIG].config" --checkpoint_dir="[GIT_ROOT]/MusicObjectDetector/data/checkpoints-[SELECTED_CONFIG]-train" --eval_dir="[GIT_ROOT]/MusicObjectDetector/data/checkpoints-[SELECTED_CONFIG]-validate"
A few remarks: The two scripts can and should be run at the same time to get a live evaluation during the training. The values can be visualized by calling tensorboard --logdir=[GIT_ROOT]/MusicObjectDetector/data.
Notice that Tensorflow usually allocates the entire memory of your graphics card for the training. In order to run both training and validation at the same time, you might have to restrict Tensorflow from doing so by opening train.py and eval.py and uncommenting the respective (prepared) lines in the main function, e.g.:
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.3)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
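Starting training and validation side by side can also be scripted. The following is a sketch that only assembles the two command lines from this section and launches them as separate processes; the function names are hypothetical, and it assumes that the evaluation keeps polling for new checkpoints until it is stopped.

```python
import subprocess


def build_commands(git_root, selected_config):
    """Return the argument lists for train.py and eval.py as used in this README."""
    detector = git_root + "/MusicObjectDetector"
    scripts = git_root + "/research/object_detection"
    config = "{0}/configurations/{1}.config".format(detector, selected_config)
    train_dir = "{0}/data/checkpoints-{1}-train".format(detector, selected_config)
    eval_dir = "{0}/data/checkpoints-{1}-validate".format(detector, selected_config)
    train = ["python", scripts + "/train.py", "--logtostderr",
             "--pipeline_config_path=" + config,
             "--train_dir=" + train_dir]
    evaluate = ["python", scripts + "/eval.py", "--logtostderr",
                "--pipeline_config_path=" + config,
                "--checkpoint_dir=" + train_dir,
                "--eval_dir=" + eval_dir]
    return train, evaluate


def run_side_by_side(git_root, selected_config):
    """Run both scripts in parallel and stop the evaluation once training ends."""
    train, evaluate = build_commands(git_root, selected_config)
    evaluator = subprocess.Popen(evaluate)
    try:
        return subprocess.call(train)  # blocks until the training has finished
    finally:
        evaluator.terminate()
```

Both processes compete for the same graphics card, so the memory restriction described above still applies.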
It is recommended that you use pre-trained weights for known networks to speed up training and improve overall results. To do so, head over to the Tensorflow detection model zoo, then download and unzip the respective trained model, e.g. faster_rcnn_inception_resnet_v2_atrous_coco for reproducing the best results we obtained. The path to the unzipped files must be specified inside the configuration in the train_config section, e.g.
train_config: {
fine_tune_checkpoint: "C:/Users/Alex/Repositories/MusicObjectDetector-TF/MusicObjectDetector/data/faster_rcnn_inception_resnet_v2_atrous_coco_2017_11_08/model.ckpt"
from_detection_checkpoint: true
}
Note that inside that folder, there is no actual file called model.ckpt, but multiple files called model.ckpt.[something].
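Because the configured value is a prefix rather than a file, a plain existence check on it fails even when the checkpoint is present. A minimal sketch of a check that looks for the files sharing the prefix instead (the function names are hypothetical):

```python
import glob


def checkpoint_files(checkpoint_prefix):
    """List the files that belong to a prefix such as '.../model.ckpt'."""
    # glob.escape keeps special characters in the folder name from acting as wildcards
    return sorted(glob.glob(glob.escape(checkpoint_prefix) + ".*"))


def is_valid_checkpoint_prefix(checkpoint_prefix):
    """True if at least one file named '<prefix>.<something>' exists."""
    return len(checkpoint_files(checkpoint_prefix)) > 0
```

Calling `is_valid_checkpoint_prefix` on the value of fine_tune_checkpoint before starting a long training run catches a mistyped path early.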
For optimizing the performance of the detector, we adopted the dimensions clustering algorithm, proposed in the YOLO 9000 paper.
While preparing the dataset, the muscima_image_cutter.py script created a file called Annotations.csv and a folder called Annotations.
Both will contain the same annotations, but in different formats. While the csv-file contains all annotations in a plain list, the Annotations folder contains one xml-file per image, complying with the format used for the Pascal VOC project.
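To illustrate the Pascal VOC layout, the following sketch reads the bounding boxes from one such xml document using only the standard library. The sample content, including the class name and the coordinates, is made up for illustration; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in Pascal VOC style; values are illustrative only
EXAMPLE_XML = """
<annotation>
  <filename>example.png</filename>
  <size><width>600</width><height>400</height><depth>3</depth></size>
  <object>
    <name>notehead-full</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>30</xmax><ymax>45</ymax></bndbox>
  </object>
</annotation>
"""


def read_voc_annotations(xml_text):
    """Return a list of (class_name, xmin, ymin, xmax, ymax) tuples."""
    root = ET.fromstring(xml_text)
    annotations = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        annotations.append((
            obj.findtext("name"),
            int(float(box.findtext("xmin"))),
            int(float(box.findtext("ymin"))),
            int(float(box.findtext("xmax"))),
            int(float(box.findtext("ymax"))),
        ))
    return annotations
```

For a file on disk, read its content first and pass the text to `read_voc_annotations`.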
To perform dimension clustering on the cropped images, run the following scripts:
python generate_muscima_statistics.py
python muscima_dimension_clustering.py
The first script will load all annotations and create four csv-files containing the dimensions for each annotation from all images, including their relative sizes compared to the entire image. The second script loads those statistics and performs dimension clustering, using a k-means algorithm on the relative dimensions of the annotations.
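The idea of dimension clustering can be sketched in a few lines. The version below follows the YOLO 9000 paper in using the distance d = 1 - IoU between (width, height) pairs; whether muscima_dimension_clustering.py uses this distance or a plain Euclidean k-means is an assumption not confirmed here, and all names are hypothetical.

```python
import random


def iou(box, centroid):
    """IoU of two (width, height) pairs that share the same top-left corner."""
    intersection = min(box[0], centroid[0]) * min(box[1], centroid[1])
    union = box[0] * box[1] + centroid[0] * centroid[1] - intersection
    return intersection / union


def cluster_dimensions(boxes, k, iterations=100, seed=0):
    """k-means on (width, height) pairs; the closest centroid has the highest IoU."""
    centroids = random.Random(seed).sample(boxes, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            best = max(range(k), key=lambda i: iou(box, centroids[i]))
            clusters[best].append(box)
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                new_centroids.append((
                    sum(w for w, _ in members) / len(members),
                    sum(h for _, h in members) / len(members),
                ))
            else:
                new_centroids.append(old)  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break  # assignments are stable, so the centroids no longer move
        centroids = new_centroids
    return sorted(centroids)
```

Given relative dimensions such as `[(0.01, 0.02), (0.2, 0.05), ...]`, the returned centroids can serve as starting points for anchor box scales and aspect ratios.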
We recommend checking out the demo folder first, which provides a self-contained script for performing object detection and does not depend on this library. It comes with a pre-trained model for convenience and a simple text output for interoperability with other applications.
If you have trained a model yourself, this document describes how to prepare it. Basically, you just run export_inference_graph.py with appropriate arguments, or freeze_model.ps1 after setting the paths accordingly. Alternatively, a pre-trained model can be downloaded from here: 2018-05-15_faster-rcnn_inception-resnet-v2_2000-proposals_full-page-detection_muscima-pp.pb.
Once you have the frozen model, you can perform inference on a single image by running
# From [GIT_ROOT]/MusicObjectDetector
python inference_over_image.py \
--inference_graph ${FROZEN_INFERENCE_GRAPH} \
--label_map mapping.txt \
--input_image ${IMAGE_TO_BE_CLASSIFIED} \
--output_image image_with_detection.jpg
or for an entire directory of images by running
# From [GIT_ROOT]/MusicObjectDetector
python inference_over_directory.py \
--inference_graph ${FROZEN_INFERENCE_GRAPH} \
--label_map mapping.txt \
--input_directory ${DIRECTORY_TO_IMAGES} \
--output_directory ${OUTPUT_DIRECTORY}
Published under MIT License.
Copyright (c) 2018 Alexander Pacha, TU Wien and Kwon-Young Choi
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.