Python 7,253 420 Updated Dec 14, 2025

HunyuanImage-3.0: A Powerful Native Multimodal Model for Image Generation

Python 2,586 123 Updated Oct 31, 2025

A unified inference and post-training framework for accelerated video generation.

Python 2,824 226 Updated Dec 17, 2025

Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Jupyter Notebook 17,236 1,443 Updated Nov 28, 2025

Inference code for DWCode

Python 33 4 Updated Oct 24, 2023

Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.

Python 6,413 358 Updated Nov 11, 2025

(CVPR 2025) From Slow Bidirectional to Fast Autoregressive Video Diffusion Models

Python 1,100 63 Updated Aug 7, 2025

Code for "Diffusion Model Alignment Using Direct Preference Optimization"

Python 628 44 Updated Nov 10, 2025

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Python 24,171 2,681 Updated Aug 12, 2024

A generative world for general-purpose robotics & embodied AI learning.

Python 27,807 2,567 Updated Dec 17, 2025

A PyTorch library for implementing flow matching algorithms, featuring continuous and discrete flow matching implementations. It includes practical examples for both text and image modalities.

Python 3,863 276 Updated Sep 25, 2025
Python 571 32 Updated Nov 26, 2024

[NeurIPS 2024 Best Paper Award][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". A…

Jupyter Notebook 8,559 551 Updated Nov 10, 2025

HunyuanVideo: A Systematic Framework For Large Video Generation Model

Python 11,466 1,151 Updated Nov 21, 2025

Janus-Series: Unified Multimodal Understanding and Generation Models

Python 17,642 2,232 Updated Feb 1, 2025

NeurIPS 2024 Paper: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing

Python 578 36 Updated Oct 20, 2024

VisionLLM Series

Python 1,131 59 Updated Feb 27, 2025

Tencent Hunyuan3D-1.0: A Unified Framework for Text-to-3D and Image-to-3D Generation

Python 3,460 274 Updated Nov 19, 2025

GLM-4-Voice | End-to-end Chinese-English spoken dialogue model

Python 3,100 268 Updated Dec 5, 2024

The best OSS video generation models, created by Genmo

Python 3,537 468 Updated Nov 14, 2025

OmniGen: Unified Image Generation. https://arxiv.org/pdf/2409.11340

Jupyter Notebook 4,287 366 Updated Dec 4, 2025

Official Implementation of Rectified Flow (ICLR2023 Spotlight)

Python 1,496 87 Updated Jul 20, 2024

TorchCFM: a Conditional Flow Matching library

Python 2,181 176 Updated Nov 11, 2025

[ICLR 2025] Pyramidal Flow Matching for Efficient Video Generative Modeling

Python 3,142 306 Updated Dec 21, 2024

[CVPR2024] SeeSR: Towards Semantics-Aware Real-World Image Super-Resolution

Python 605 49 Updated Dec 16, 2025

Multimodal Models in Real World

Jupyter Notebook 551 23 Updated Feb 24, 2025

Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.

Python 2,068 116 Updated Jul 29, 2024

PyTorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from MetaAI

Python 1,283 65 Updated Dec 4, 2025

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…

Jupyter Notebook 18,035 2,284 Updated Dec 25, 2024