Language-guided music recommendation for video via prompt analogies

D McKee, J Salamon, J Sivic… - Proceedings of the …, 2023 - openaccess.thecvf.com
… In Figure 5, we provide qualitative retrieval results for examples in YouTube8M-MusicTextClips.
In the first example, both models retrieve tracks that match the style and beat of the input …

A Multimodal Large Language Model for

A Lauridsen, J Mørk, J Olsen - 2024 - vbn.aau.dk
… We combined and shuffled the data from MusicCaps and YouTube8M-MusicTextClips
and used 80% for training, 10% for validation, and 10% for testing. The size of the different …
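The snippet above describes an 80/10/10 train/validation/test split over the combined MusicCaps and YouTube8M-MusicTextClips caption data. A minimal Python sketch of that kind of split is shown below; the in-memory placeholder datasets and the split_80_10_10 helper are illustrative assumptions, not the authors' actual pipeline.

```python
# Illustrative sketch of the 80/10/10 split described in the snippet above.
# The dataset variables below are synthetic placeholders; the original work
# combines MusicCaps and YouTube8M-MusicTextClips caption data.
import random

def split_80_10_10(pairs, seed=0):
    """Shuffle (audio_id, caption) pairs and slice into train/val/test."""
    rng = random.Random(seed)
    pairs = list(pairs)
    rng.shuffle(pairs)
    n = len(pairs)
    n_train = int(0.8 * n)
    n_val = int(0.1 * n)
    return (pairs[:n_train],
            pairs[n_train:n_train + n_val],
            pairs[n_train + n_val:])

# Hypothetical stand-ins for the two caption datasets.
musiccaps = [(f"mc_{i}", f"caption {i}") for i in range(5000)]
yt8m_mtc = [(f"yt_{i}", f"caption {i}") for i in range(4000)]

train, val, test = split_80_10_10(musiccaps + yt8m_mtc, seed=42)
print(len(train), len(val), len(test))  # 7200 / 900 / 900
```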

BLAP: Bootstrapping Language-Audio Pre-training for Music Captioning

LA Lanzendörfer, C Pinkl, N Perraudin… - … NeurIPS 2024 Workshop … - openreview.net
… We provide additional examples on our sample page, including from Song Describer and
YouTube8M-MusicTextClips, together with their audio. We find that BLAP tends to generate …

LLark: A Multimodal Instruction-Following Language Model for Music

JP Gardner, S Durand, D Stoller… - Forty-first International …, 2023 - openreview.net
… In addition to the existing captioning datasets (MusicCaps, YouTube8M-MusicTextClips),
we generate captions for MusicNet, the only dataset in our study where note-level metadata is …
