- Palo Alto
- ificl.github.io
Stars
Public release of the Sound Effect Foundation model by Sony AI.
The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trained model checkpoints, and example notebooks that show how t…
gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI
Wan: Open and Advanced Large-Scale Video Generative Models
Text-audio foundation model from Boson AI
[NeurIPS 2025] Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation
Official codebase for "Self Forcing: Bridging Training and Inference in Autoregressive Video Diffusion" (NeurIPS 2025 Spotlight)
Generative Omnimatte (CVPR 2025)
[ICLR 2025] Autoregressive Video Generation without Vector Quantization
Pusa: Thousands Timesteps Video Diffusion Model
[ICCV 2025] ACTalker: an end-to-end video diffusion framework for talking head synthesis that supports both single and multi-signal control (e.g., audio, expression).
[ICML 2025] Gaussian Mixture Flow Matching Models (GMFlow)
[ICLR 2026] Official implementation of JavisDiT and JavisDiT++ series.
[CVPR 2025 GMCV] Test-Time Frequency Scaling: Instant Frequency Control for Any Diffusion Model
[arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation
[NeurIPS 2024] AV-Cloud: Spatial Audio Rendering Through Audio-Visual Cloud Splatting
Qwen2.5-Omni is an end-to-end multimodal model by the Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, and video, and of performing real-time speech generation.
[ICLR 2025 Oral] Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
Code for "CAFe: Unifying Representation and Generation with Contrastive-Autoregressive Finetuning"
[ICML 2025] I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models
tulip-berkeley / open_clip
Forked from mlfoundations/open_clip
An open source implementation of CLIP (With TULIP Support)
(CVPR 2025) From Slow Bidirectional to Fast Autoregressive Video Diffusion Models
Official implementation of Inductive Moment Matching
Implementation of VinTAGe: Joint Video and Text Conditioning for Holistic Audio Generation (CVPR'25)