Facebook AI Research (FAIR)
- Menlo Park
- rongjiehuang.github.io
Stars
Speech To Speech: an effort for an open-sourced and modular GPT4-o
Official implementation of "HumanAesExpert: Advancing a Multi-Modality Foundation Model for Human Image Aesthetic Assessment"
An official implementation of DanceGRPO: Unleashing GRPO on Visual Generation
Research code artifacts for Code World Model (CWM) including inference tools, reproducibility, and documentation.
State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!
Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.
An open-source implementation for fine-tuning the Qwen2-VL and Qwen2.5-VL series by Alibaba Cloud.
[ACM MM 2025] FantasyTalking: Realistic Talking Portrait Generation via Coherent Motion Synthesis
Scalable and memory-optimized training of diffusion models
Text and image to video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
The dataset and code for "Flow-guided One-shot Talking Face Generation with a High-resolution Audio-visual Dataset"
Reference PyTorch implementation and models for DINOv3
gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI
[CVPR 2025] Hallo3: Highly Dynamic and Realistic Portrait Image Animation with Video Diffusion Transformer
Enjoy the magic of Diffusion models!
Video Chain of Thought, Codes for ICML 2024 paper: "Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition"
[ICCV 2025] Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning
Wan: Open and Advanced Large-Scale Video Generative Models
[NeurIPS 2025] An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL
🔥 Motion Anything: Any to Motion Generation
HumanML3D: A large and diverse 3d human motion-language dataset.
MotionGPT3: Human Motion as a Second Modality, a MoT-based framework for unified motion understanding and generation
The open source code for SimpleSpeech series
[NeurIPS 2025] PyTorch implementation of [ThinkSound], a unified framework for generating audio from any modality, guided by Chain-of-Thought (CoT) reasoning.
[ICML 2025] PyTorch Implementation of "OmniAudio: Generating Spatial Audio from 360-Degree Video"