LiYinqi

A Curated List of Awesome Works in World Modeling, Aiming to Serve as a One-stop Resource for Researchers, Practitioners, and Enthusiasts Interested in World Modeling.

2,503 110 Updated Apr 8, 2026

The Paper List of Large Multi-Modality Model (Perception, Generation, Unification), Parameter-Efficient Finetuning, Vision-Language Pretraining, Conventional Image-Text Matching for Preliminary Ins…

443 49 Updated Sep 25, 2025

Awesome Unified Multimodal Models

1,181 38 Updated Mar 24, 2026

Automatically crawl arXiv papers daily and summarize them using AI. Illustrating them using GitHub Pages.

JavaScript 2,546 913 Updated Apr 9, 2026

Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual information for complex reasoning, planning, and generation.

1,414 42 Updated Mar 9, 2026

[CVPR 2026] Beyond Generation: Advancing Image Editing Priors for Depth and Normal Estimation

Python 222 8 Updated Mar 31, 2026

[ICCV 2025] Code & Data for: SuperEdit - Rectifying and Facilitating Supervision for Instruction-Based Image Editing

Python 164 9 Updated Jun 26, 2025

[NeurIPS 2025] Image editing is worth a single LoRA! 0.1% training data for fantastic image editing! Surpasses GPT-4o in ID persistence~ MoE ckpt released! Only 4GB VRAM is enough to run!

Python 2,090 115 Updated Dec 19, 2025

VSCode extension that grammar-checks texts through a local LLM

TypeScript 26 6 Updated Oct 30, 2025

Official implementation of ICML 2024 Cascade-CLIP: Cascaded Vision-Language Embeddings Alignment for Zero-Shot Semantic Segmentation

Python 57 3 Updated Aug 15, 2024

[ICLR 2025] Diffusion Feedback Helps CLIP See Better

Python 301 14 Updated Jan 23, 2025

[NeurIPS'25] A work to improve CLIP's visual detail capturing ability by inverting the unCLIP generative model.

Python 23 Updated Mar 19, 2026

The official implementation of the CVPR Workshop 2025 paper: Window Token Concatenation for Efficient Visual Large Language Models.

Python 10 Updated Apr 10, 2025

(ICCV 2025) Enhance CLIP and MLLM's fine-grained visual representations with generative models.

Python 77 4 Updated Jun 25, 2025

This repository collects papers on VLLM applications. We will update new papers irregularly.

212 16 Updated Feb 23, 2026

Easily turn large sets of image URLs into an image dataset. Can download, resize and package 100M URLs in 20h on one machine.

Python 4,403 375 Updated Oct 19, 2025

Diffusion Classifier leverages pretrained diffusion models to perform zero-shot classification without additional training

Python 486 45 Updated Feb 28, 2024

[ICLR'25 Oral] No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images

Python 945 50 Updated Feb 25, 2026

LaTeX Thesis Template for the University of Chinese Academy of Sciences

TeX 3,812 943 Updated Feb 29, 2024
Python 1,482 146 Updated Jan 8, 2025

Official implementation of the NeurIPS'24 paper Not All Diffusion Model Activations Have Been Evaluated as Discriminative Features

Python 38 5 Updated May 28, 2025

Utilities intended for use with Llama models.

Python 7,549 1,355 Updated Feb 11, 2026

Next-Token Prediction is All You Need

Python 2,393 95 Updated Jan 12, 2026

Collection of common code that's shared among different research projects in FAIR computer vision team.

Python 2,233 237 Updated Mar 15, 2026

High-resolution models for human tasks.

Python 5,319 316 Updated Nov 18, 2024

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…

Jupyter Notebook 18,896 2,421 Updated Apr 7, 2026

PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838

Python 1,891 120 Updated Feb 20, 2026

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

Python 1,992 137 Updated Nov 7, 2025

This repo contains the code for 1D tokenizer and generator

Jupyter Notebook 1,140 67 Updated Mar 20, 2025

SEED-Voken: A Series of Powerful Visual Tokenizers

Python 1,002 44 Updated Nov 25, 2025