YHWH666
Stars

Image<->Text

19 repositories

[ICLR2024] Official repo for paper "PnP Inversion: Boosting Diffusion-based Editing with 3 Lines of Code"

Jupyter Notebook 373 19 Updated Mar 12, 2024

Implementation of Parti, Google's pure attention-based text-to-image neural network, in Pytorch

Python 537 25 Updated Dec 8, 2023

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Python 24,191 2,684 Updated Aug 12, 2024
Jupyter Notebook 38 3 Updated Jul 11, 2022

PyTorch implementation of RCG https://arxiv.org/abs/2312.03701

Python 935 43 Updated Sep 27, 2024

[CVPR 2024] Official implementation of "Inversion-Free Image Editing with Natural Language"

Python 353 9 Updated May 28, 2024

LAVIS - A One-stop Library for Language-Vision Intelligence

Jupyter Notebook 11,073 1,088 Updated Nov 18, 2024

LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills

Python 763 58 Updated Feb 1, 2024

VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation

86 2 Updated Sep 12, 2024

[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"

Python 837 43 Updated Aug 19, 2025

Gradio demo used in our Osprey: Pixel Understanding with Visual Instruction Tuning.

Python 16 3 Updated Dec 19, 2023

Official implementation code of the paper <AnyText: Multilingual Visual Text Generation And Editing>

Python 4,823 307 Updated Mar 7, 2025

[CVPR 2024 Highlight] Official repo: SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing

53 1 Updated Apr 5, 2024

The official Pytorch Implementation for ElasticDiffusion: Training-free Arbitrary Size Image Generation through Global-Local Content Separation (CVPR 2024)

Python 159 8 Updated Dec 24, 2024
Python 35 6 Updated Dec 16, 2025
Python 88 6 Updated Sep 17, 2023
Python 48 4 Updated Jul 17, 2025

Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.

Jupyter Notebook 3,854 303 Updated Jun 12, 2025