Image<->Text
[ICLR2024] Official repo for paper "PnP Inversion: Boosting Diffusion-based Editing with 3 Lines of Code"
Implementation of Parti, Google's pure attention-based text-to-image neural network, in PyTorch
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
PyTorch implementation of RCG https://arxiv.org/abs/2312.03701
[CVPR 2024] Official implementation of "Inversion-Free Image Editing with Natural Language"
LAVIS - A One-stop Library for Language-Vision Intelligence
LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills
VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation
[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"
Gradio demo used in our Osprey: Pixel Understanding with Visual Instruction Tuning.
Official implementation code of the paper "AnyText: Multilingual Visual Text Generation And Editing"
[CVPR 2024 Highlight] Official repo: SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing
The official PyTorch implementation of ElasticDiffusion: Training-free Arbitrary Size Image Generation through Global-Local Content Separation (CVPR 2024)
Qwen2.5-Omni is an end-to-end multimodal model by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, and of generating speech in real time.