IFICL

Public release of the Sound Effect Foundation model by Sony AI.

Python 319 22 Updated May 21, 2026

DACVAE

Python 224 17 Updated Dec 22, 2025

The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trained model checkpoints, and example notebooks that show how t…

Python 3,528 319 Updated May 26, 2026

HTML 175 9 Updated Oct 27, 2025

gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI

Python 20,162 2,095 Updated Jun 9, 2026

Wan: Open and Advanced Large-Scale Video Generative Models

Python 16,213 2,010 Updated Mar 17, 2026

Text-audio foundation model from Boson AI

Python 8,194 629 Updated Jun 5, 2026

[NeurIPS 2025] Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation

Python 2,945 491 Updated May 22, 2026

Official codebase for "Self Forcing: Bridging Training and Inference in Autoregressive Video Diffusion" (NeurIPS 2025 Spotlight)

Python 3,397 274 Updated Sep 12, 2025

Generative Omnimatte (CVPR 2025)

Python 182 17 Updated Jun 3, 2025

SOTA Open Source TTS

Python 30,811 2,630 Updated Jun 9, 2026

[ICLR 2025] Autoregressive Video Generation without Vector Quantization

Python 651 22 Updated Oct 29, 2025

Pusa: Thousands Timesteps Video Diffusion Model

Python 683 46 Updated Feb 13, 2026

[ICCV 2025] ACTalker: an end-to-end video diffusion framework for talking head synthesis that supports both single and multi-signal control (e.g., audio, expression).

Python 458 54 Updated Aug 20, 2025

[ICML 2025] Gaussian Mixture Flow Matching Models (GMFlow)

Python 191 7 Updated Nov 7, 2025

[ICLR 2026] Official implementation of JavisDiT and JavisDiT++ series.

Python 370 31 Updated Mar 29, 2026

Python 11 1 Updated Sep 22, 2025

[CVPR 2025 GMCV] Test-Time Frequency Scaling: Instant Frequency Control for Any Diffusion Model

Python 55 2 Updated May 31, 2025

[arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation

Jupyter Notebook 97 3 Updated Mar 1, 2025

[NeurIPS 2024] AV-Cloud: Spatial Audio Rendering Through Audio-Visual Cloud Splatting

Python 14 3 Updated Nov 22, 2025

Qwen2.5-Omni is an end-to-end multimodal model by the Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, and video, and of performing real-time speech generation.

Jupyter Notebook 4,022 324 Updated Jun 12, 2025

[ICLR 2025 Oral] Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models

Python 1,012 77 Updated Jul 10, 2025

Code for "CAFe: Unifying Representation and Generation with Contrastive-Autoregressive Finetuning"

Python 33 1 Updated Mar 26, 2025

Official repo for CFG-Zero*

Python 704 26 Updated May 2, 2025

[ICML 2025] I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models

Python 192 8 Updated Sep 7, 2025

An open-source implementation of CLIP (with TULIP support)

Python 165 3 Updated May 14, 2025

(CVPR 2025) From Slow Bidirectional to Fast Autoregressive Video Diffusion Models

Python 1,365 82 Updated Aug 7, 2025

[ICLR 2026] Repository of AudioX

Python 1,524 140 Updated Mar 10, 2026

Official implementation of Inductive Moment Matching

Python 585 17 Updated Jul 11, 2025

Implementation of VinTAGe: Joint Video and Text Conditioning for Holistic Audio Generation (CVPR'25)

Python 14 2 Updated Jun 13, 2026