archinetai/audio-ai-timeline

A timeline of the latest AI models for audio generation, starting in 2023!
# Audio AI Timeline

Here we keep track of the latest AI models for waveform-based audio generation, starting in 2023!

## 2023

| Date  | Release [Samples] | Paper | Code | Trained Model |
|-------|-------------------|-------|------|---------------|
| 14.11 | Mustango: Toward Controllable Text-to-Music Generation | arXiv | GitHub | Hugging Face |
| 13.11 | Music ControlNet: Multiple Time-varying Controls for Music Generation | arXiv | - | - |
| 02.11 | E3 TTS: Easy End-to-End Diffusion-based Text to Speech | arXiv | - | - |
| 01.10 | UniAudio: An Audio Foundation Model Toward Universal Audio Generation | arXiv | GitHub | - |
| 24.09 | VoiceLDM: Text-to-Speech with Environmental Context | arXiv | GitHub | - |
| 05.09 | PromptTTS 2: Describing and Generating Voices with Text Prompt | arXiv | - | - |
| 14.08 | SpeechX: Neural Codec Language Model as a Versatile Speech Transformer | arXiv | - | - |
| 10.08 | AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining | arXiv | GitHub | Hugging Face |
| 09.08 | JEN-1: Text-Guided Universal Music Generation with Omnidirectional Diffusion Models | arXiv | - | - |
| 03.08 | MusicLDM: Enhancing Novelty in Text-to-Music Generation Using Beat-Synchronous Mixup Strategies | arXiv | GitHub | - |
| 14.07 | Mega-TTS 2: Zero-Shot Text-to-Speech with Arbitrary Length Speech Prompts | arXiv | - | - |
| 10.07 | VampNet: Music Generation via Masked Acoustic Token Modeling | arXiv | GitHub | - |
| 22.06 | AudioPaLM: A Large Language Model That Can Speak and Listen | arXiv | - | - |
| 19.06 | Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale | PDF | GitHub | - |
| 08.06 | MusicGen: Simple and Controllable Music Generation | arXiv | GitHub | Hugging Face, Colab |
| 06.06 | Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive Bias | arXiv | - | - |
| 01.06 | Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis | arXiv | GitHub | - |
| 29.05 | Make-An-Audio 2: Temporal-Enhanced Text-to-Audio Generation | arXiv | - | - |
| 25.05 | MeLoDy: Efficient Neural Music Generation | arXiv | - | - |
| 18.05 | CLAPSpeech: Learning Prosody from Text Context with Contrastive Language-Audio Pre-training | arXiv | - | - |
| 18.05 | SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities | arXiv | GitHub | - |
| 16.05 | SoundStorm: Efficient Parallel Audio Generation | arXiv | GitHub (unofficial) | - |
| 03.05 | Diverse and Vivid Sound Generation from Text Descriptions | arXiv | - | - |
| 02.05 | Long-Term Rhythmic Video Soundtracker | arXiv | GitHub | - |
| 24.04 | TANGO: Text-to-Audio generation using instruction tuned LLM and Latent Diffusion Model | PDF | GitHub | Hugging Face |
| 18.04 | NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers | arXiv | GitHub (unofficial) | - |
| 10.04 | Bark: Text-Prompted Generative Audio Model | - | GitHub | Hugging Face, Colab |
| 03.04 | AUDIT: Audio Editing by Following Instructions with Latent Diffusion Models | arXiv | - | - |
| 08.03 | VALL-E X: Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling | arXiv | - | - |
| 27.02 | I Hear Your True Colors: Image Guided Audio Generation | arXiv | GitHub | - |
| 08.02 | Noise2Music: Text-conditioned Music Generation with Diffusion Models | arXiv | - | - |
| 04.02 | Multi-Source Diffusion Models for Simultaneous Music Generation and Separation | arXiv | GitHub | - |
| 30.01 | SingSong: Generating musical accompaniments from singing | arXiv | - | - |
| 30.01 | AudioLDM: Text-to-Audio Generation with Latent Diffusion Models | arXiv | GitHub | Hugging Face |
| 30.01 | Moûsai: Text-to-Music Generation with Long-Context Latent Diffusion | arXiv | GitHub | - |
| 29.01 | Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models | PDF | - | - |
| 28.01 | Noise2Music | - | - | - |
| 27.01 | RAVE2 [Samples RAVE1] | arXiv | GitHub | - |
| 26.01 | MusicLM: Generating Music From Text | arXiv | GitHub (unofficial) | - |
| 18.01 | Msanii: High Fidelity Music Synthesis on a Shoestring Budget | arXiv | GitHub | Hugging Face, Colab |
| 16.01 | ArchiSound: Audio Generation with Diffusion | arXiv | GitHub | - |
| 05.01 | VALL-E: Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers | arXiv | GitHub (unofficial) (demo) | - |
