Language-guided music recommendation for video via prompt analogies

D McKee, J Salamon, J Sivic… - Proceedings of the …, 2023 - openaccess.thecvf.com
… In Figure 5, we provide qualitative retrieval results for examples in YouTube8M-MusicTextClips.
In the first example, both models retrieve tracks that match the style and beat of the input …

A Multimodal Large Language Model for

A Lauridsen, J Mørk, J Olsen - 2024 - vbn.aau.dk
… We combined and shuffled the data from MusicCaps and YouTube8M-MusicTextClips
and used 80% for training, 10% for validation, and 10% for testing. The size of the different …
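The snippet above describes an 80/10/10 train/validation/test split over the combined MusicCaps and YouTube8M-MusicTextClips caption data. A minimal Python sketch of that kind of split is shown below; the in-memory placeholder datasets and the split_80_10_10 helper are illustrative assumptions, not the authors' actual pipeline.

```python
# Illustrative sketch of the 80/10/10 split described in the snippet above.
# The dataset variables below are synthetic placeholders; the original work
# combines MusicCaps and YouTube8M-MusicTextClips caption data.
import random

def split_80_10_10(pairs, seed=0):
    """Shuffle (audio_id, caption) pairs and slice into train/val/test."""
    rng = random.Random(seed)
    pairs = list(pairs)
    rng.shuffle(pairs)
    n = len(pairs)
    n_train = int(0.8 * n)
    n_val = int(0.1 * n)
    return (pairs[:n_train],
            pairs[n_train:n_train + n_val],
            pairs[n_train + n_val:])

# Hypothetical stand-ins for the two caption datasets.
musiccaps = [(f"mc_{i}", f"caption {i}") for i in range(5000)]
yt8m_mtc = [(f"yt_{i}", f"caption {i}") for i in range(4000)]

train, val, test = split_80_10_10(musiccaps + yt8m_mtc, seed=42)
print(len(train), len(val), len(test))  # 7200 / 900 / 900
```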

BLAP: Bootstrapping Language-Audio Pre-training for Music Captioning

LA Lanzendörfer, C Pinkl, N Perraudin… - … NeurIPS 2024 Workshop … - openreview.net
… We provide additional examples on our sample page, including from Song Describer and
YouTube8M-MusicTextClips, together with their audio. We find that BLAP tends to generate …

LLark: A Multimodal Instruction-Following Language Model for Music

JP Gardner, S Durand, D Stoller… - Forty-first International …, 2023 - openreview.net
… In addition to the existing captioning datasets (MusicCaps, YouTube8M-MusicTextClips),
we generate captions for MusicNet, the only dataset in our study where note-level metadata is …
