- Zhejiang University
- Hangzhou
- xljh0520.github.io
Stars
HunyuanImage-3.0: A Powerful Native Multimodal Model for Image Generation
A unified inference and post-training framework for accelerated video generation.
Qwen3-VL is the multimodal large language model series developed by the Qwen team at Alibaba Cloud.
Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.
(CVPR 2025) From Slow Bidirectional to Fast Autoregressive Video Diffusion Models
Code for "Diffusion Model Alignment Using Direct Preference Optimization"
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
A generative world for general-purpose robotics & embodied AI learning.
A PyTorch library for implementing flow matching algorithms, featuring continuous and discrete flow matching implementations. It includes practical examples for both text and image modalities. (A minimal flow matching sketch follows this list.)
[NeurIPS 2024 Best Paper Award][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". A…
HunyuanVideo: A Systematic Framework For Large Video Generation Model
Janus-Series: Unified Multimodal Understanding and Generation Models
NeurIPS 2024 Paper: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing
Tencent Hunyuan3D-1.0: A Unified Framework for Text-to-3D and Image-to-3D Generation
The best OSS video generation models, created by Genmo
OmniGen: Unified Image Generation. https://arxiv.org/pdf/2409.11340
Official Implementation of Rectified Flow (ICLR2023 Spotlight)
TorchCFM: a Conditional Flow Matching library
[ICLR 2025] Pyramidal Flow Matching for Efficient Video Generative Modeling
[CVPR2024] SeeSR: Towards Semantics-Aware Real-World Image Super-Resolution
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
PyTorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from MetaAI
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
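Several of the starred repositories above (flow_matching, TorchCFM, Rectified Flow, Pyramidal Flow) build on the same flow matching objective. As a rough orientation, here is a minimal sketch of that objective in plain PyTorch. It assumes a toy MLP velocity field and a straight-line interpolation path, and it does not use any of those libraries' actual APIs; treat it as an illustration, not a reference implementation.

```python
# Minimal flow matching sketch (illustrative only, not the API of any repo above):
# regress a velocity field v_theta(x_t, t) onto the straight-line velocity (x1 - x0)
# along the interpolant x_t = (1 - t) * x0 + t * x1, with x0 drawn from a Gaussian.
import torch
import torch.nn as nn


class VelocityNet(nn.Module):
    """Toy MLP velocity field v_theta(x, t) for low-dimensional data (hypothetical architecture)."""

    def __init__(self, dim: int = 2, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, x: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # Concatenate the time scalar onto each sample before the MLP.
        return self.net(torch.cat([x, t], dim=-1))


def flow_matching_loss(model: VelocityNet, x1: torch.Tensor) -> torch.Tensor:
    """Flow matching loss with a Gaussian source x0 and a linear interpolation path."""
    x0 = torch.randn_like(x1)                        # source samples (noise)
    t = torch.rand(x1.size(0), 1, device=x1.device)  # t ~ Uniform[0, 1]
    xt = (1.0 - t) * x0 + t * x1                     # point on the straight path
    target = x1 - x0                                 # constant velocity of that path
    return ((model(xt, t) - target) ** 2).mean()


if __name__ == "__main__":
    model = VelocityNet()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for step in range(1000):
        x1 = torch.randn(256, 2) * 0.5 + 2.0         # stand-in "data" batch
        loss = flow_matching_loss(model, x1)
        opt.zero_grad()
        loss.backward()
        opt.step()
```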