View Rongjiehuang's full-sized avatar
🎯
Focusing. I may be slow to reply.
Organizations

@AIGC-Audio

Speech To Speech: an effort for an open-sourced and modular GPT4-o

Python 4,201 478 Updated Apr 15, 2025

Official implementation of "HumanAesExpert: Advancing a Multi-Modality Foundation Model for Human Image Aesthetic Assessment"

Python 75 2 Updated Apr 15, 2025

An official implementation of DanceGRPO: Unleashing GRPO on Visual Generation

Python 952 47 Updated Oct 4, 2025

Research code artifacts for Code World Model (CWM) including inference tools, reproducibility, and documentation.

Python 648 52 Updated Sep 24, 2025

State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!

Jupyter Notebook 1,657 106 Updated Sep 16, 2025

Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.

Python 4,724 274 Updated Jul 18, 2025

An open-source implementation for fine-tuning the Qwen2-VL and Qwen2.5-VL series by Alibaba Cloud.

Python 1,235 156 Updated Oct 2, 2025

[ACM MM 2025] FantasyTalking: Realistic Talking Portrait Generation via Coherent Motion Synthesis

Python 1,567 123 Updated Aug 20, 2025

Scalable and memory-optimized training of diffusion models

Python 1,283 136 Updated Jun 4, 2025

text and image to video generation: CogVideoX (2024) and CogVideo (ICLR 2023)

Python 11,989 1,194 Updated Sep 7, 2025

the dataset and code for "Flow-guided One-shot Talking Face Generation with a High-resolution Audio-visual Dataset"

Python 403 77 Updated May 12, 2024

Reference PyTorch implementation and models for DINOv3

Jupyter Notebook 7,585 482 Updated Oct 3, 2025

Official implementation of BLIP3o-Series

Python 1,498 65 Updated Oct 3, 2025

gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI

Python 18,750 1,834 Updated Oct 6, 2025

[CVPR 2025] Hallo3: Highly Dynamic and Realistic Portrait Image Animation with Video Diffusion Transformer

Python 1,314 175 Updated Mar 13, 2025
Python 275 13 Updated Jul 29, 2025

Enjoy the magic of Diffusion models!

Python 10,272 962 Updated Sep 30, 2025

Official repository for LTX-Video

Python 8,256 741 Updated Jul 21, 2025

Video Chain of Thought, Codes for ICML 2024 paper: "Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition"

Python 166 7 Updated Feb 25, 2025

[ICCV 2025] Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning

Python 194 12 Updated Jun 26, 2025

Wan: Open and Advanced Large-Scale Video Generative Models

Python 9,852 986 Updated Sep 19, 2025

[NeurIPS 2025] An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL

Python 1,397 67 Updated Sep 18, 2025

🔥 Motion Anything: Any to Motion Generation

223 2 Updated Apr 11, 2025

HumanML3D: A large and diverse 3d human motion-language dataset.

Python 1,202 115 Updated Aug 18, 2024

MotionGPT3: Human Motion as a Second Modality, a MoT-based framework for unified motion understanding and generation

Python 106 7 Updated Sep 26, 2025

The open source code for SimpleSpeech series

Python 141 11 Updated Oct 8, 2024

[NeurIPS 2025] PyTorch implementation of ThinkSound, a unified framework for generating audio from any modality, guided by Chain-of-Thought (CoT) reasoning.

Python 1,047 61 Updated Sep 19, 2025

[ICML 2025] PyTorch Implementation of "OmniAudio: Generating Spatial Audio from 360-Degree Video"

Python 330 9 Updated Jun 27, 2025
Python 1,336 125 Updated Jan 8, 2025