Stars
Official implementation of "Learn-to-Steer: Data-Driven Loss Functions for Inference-Time Optimization in Text-to-Image" (WACV 2026)
Wan: Open and Advanced Large-Scale Video Generative Models
Benchmarking for Audio-Text and Audio-Visual Generation; Supports FAD, FD_VGG, FD_PANNs, FD_PaSST, IS_PaSST, IS_PANNs, KL_PaSST, KL_PANNs, LAION-CLAP, MS-CLAP, DeSync
AblationBench is an evaluation framework for language models on ablation planning in empirical AI research
[ISMIR 2025] A curated list of vision-to-music generation: methods, datasets, evaluation and challenges.
Source for https://fullstackdeeplearning.com
Official implementation of "Single Image Iterative Subject-driven Generation and Editing".
ImageBind: One Embedding Space to Bind Them All
[InterSpeech 2023] The official PyTorch implementation of: "AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image Generation"