lzyhha
🍋
studying


Official Code for See What I Mean: Aligning Vision and Language Representations for Video Fine-grained Object Understanding (CVPR 2026)

Python 95 Updated May 20, 2026

Official Repo of "D-OPSD: On-Policy Self-Distillation for Continuously Tuning Step-Distilled Diffusion Models"

Python 251 7 Updated May 22, 2026
Python 526 13 Updated May 1, 2026

ASID-Caption: Attribute-Structured and Quality-Verified Audiovisual Instruction Dataset and Training Pipeline for Fine-Grained Video Understanding.

Python 67 2 Updated Mar 3, 2026

A comprehensive list of papers about Large-Language-Diffusion-Models.

82 13 Updated Jun 4, 2026

paper list, tutorial, and nano code snippet for Diffusion Large Language Models.

Jupyter Notebook 168 9 Updated Jan 19, 2026

dLLM: Simple Diffusion Language Modeling

Python 2,581 271 Updated Jun 12, 2026
Python 11,558 789 Updated Feb 9, 2026

🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.

Python 33,874 7,063 Updated Jun 16, 2026

Lumina-DiMOO - An Open-Sourced Multi-Modal Large Diffusion Language Model

Python 999 61 Updated May 19, 2026
Python 55 Updated Sep 21, 2025

gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI

Python 20,172 2,095 Updated Jun 9, 2026

Official Release of ICCV 2025 paper -- DiscretizedSDF

Python 109 12 Updated Aug 25, 2025
Python 793 24 Updated Jun 10, 2026

Official code for the paper: Depth Anything At Any Condition

Python 340 22 Updated Aug 21, 2025

The official code for the paper: LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs

Python 122 3 Updated Jul 1, 2025

Open-source unified multimodal model

Python 6,017 533 Updated May 4, 2026

Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.

Jupyter Notebook 1,582 65 Updated Jun 14, 2025

[ICLR & NeurIPS 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation.

Python 1,953 91 Updated Jan 8, 2026

Enhancing Representations through Heterogeneous Self-Supervised Learning (TPAMI 2025)

Python 15 Updated May 2, 2025

A SOTA open-source image editing model, which aims to provide performance comparable to closed-source models such as GPT-4o and Gemini 2 Flash.

Python 2,225 101 Updated Apr 29, 2026
Python 119 3 Updated Apr 25, 2025

🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch and FLAX.

Python 1 Updated May 12, 2025

Official implementation of "Re-Aligning Language to Visual Objects with an Agentic Workflow"

Python 33 Updated Apr 20, 2025

Resurrect Mask AutoRegressive Modeling for Efficient and Scalable Image Generation.

Python 15 Updated Jul 21, 2025

[ICCV 2025] VisualCloze: A universal image generation framework that can support a wide range of in-domain tasks and generalize to unseen ones. (🔥 🔥 🔥 Merged into official pipelines of diffusers.)

Python 282 14 Updated Jan 7, 2026

Code for: "Long-Context Autoregressive Video Modeling with Next-Frame Prediction"

Python 308 15 Updated Apr 23, 2025

Lumina-mGPT 2.0: Stand-Alone AutoRegressive Image Modeling

Python 1,084 51 Updated Nov 3, 2025

[CVPR 2025] Mr. DETR: Instructive Multi-Route Training for Detection Transformers

Python 176 12 Updated Sep 6, 2025