!wget --no-check-certificate -O LAKH-MuseNet-MIDI-Dataset.zip "https://onedrive.live.com/download?cid=8A0D502FC99C608F&resid=8A0D502FC99C608F%2118520&authkey=AN-gn1ZxEnO4khE"
The Lakh MIDI Dataset is distributed with a CC-BY 4.0 license; if you use this data in any capacity, please reference this page and my thesis:
Colin Raffel. "Learning-Based Methods for Comparing Sequences, with Applications to Audio-to-MIDI Alignment and Matching". PhD Thesis, 2016.
Of course, I did not transcribe any of the MIDI files in the Lakh MIDI Dataset. While MIDI files have a built-in mechanism for attribution (the Copyright meta-event), it is not used consistently, so attributing each of the MIDI files in the dataset to a particular author is not feasible. If you'd like to try, here is a list of the text of all of the Copyright meta-events in the Lakh MIDI Dataset.
If you use the Million Song Dataset, please reference this paper:
Thierry Bertin-Mahieux, Daniel P. W. Ellis, Brian Whitman, and Paul Lamere. "The Million Song Dataset". In Proceedings of the 12th International Society for Music Information Retrieval Conference, pages 591–596, 2011.