
QeRL enables RL for 32B LLMs on a single H100 GPU.

Python 470 46 Updated Nov 27, 2025

LongLive: Real-time Interactive Long Video Generation

Python 925 63 Updated Dec 4, 2025

🚀 Efficient implementations of state-of-the-art linear attention models

Python 4,112 338 Updated Dec 23, 2025

Awesome curated collection of images and prompts generated by gemini-2.5-flash-image (aka Nano Banana) state-of-the-art image generation and editing model. Explore AI generated visuals created with…

JavaScript 8,189 836 Updated Sep 8, 2025

A simple, unified multimodal models training engine. Lean, flexible, and built for hacking at scale.

Python 680 27 Updated Dec 23, 2025

Long-RL: Scaling RL to Long Sequences (NeurIPS 2025)

Python 678 25 Updated Sep 24, 2025

A unified model that seamlessly integrates multimodal understanding, text-to-image generation, and image editing within a single powerful framework.

Python 445 14 Updated Dec 2, 2025

PyTorch implementation of the paper "SimpleAR: Pushing the Frontier of Autoregressive Visual Generation"

Python 422 22 Updated Jun 20, 2025

Latest open-source "Thinking with images" (O3/O4-mini) papers, covering training-free, SFT-based, and RL-enhanced methods for "fine-grained visual understanding".

107 2 Updated Aug 21, 2025

Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual information for complex reasoning, planning, and generation.

1,223 40 Updated Dec 23, 2025

WeThink: Toward General-purpose Vision-Language Reasoning via Reinforcement Learning

Python 36 1 Updated Jun 10, 2025

Open-source unified multimodal model

Python 5,500 481 Updated Oct 27, 2025

A simple screen-parsing tool toward a pure vision-based GUI agent

Jupyter Notebook 24,082 2,063 Updated Sep 12, 2025

Witness the aha moment of VLM with less than $3.

Python 4,012 289 Updated May 19, 2025

Solve Visual Understanding with Reinforced VLMs

Python 5,773 376 Updated Oct 21, 2025

Official Repo for Open-Reasoner-Zero

Python 2,083 119 Updated Jun 2, 2025

[CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents

Python 1,880 152 Updated Oct 4, 2025

A collection of papers from the continuing line of work that started with World Models.

191 6 Updated Jul 6, 2024
Python 187 4 Updated Dec 17, 2024

Code for ChatRex: Taming Multimodal LLM for Joint Perception and Understanding

Python 209 8 Updated Oct 15, 2025

Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2

Jupyter Notebook 3,143 361 Updated Nov 11, 2025

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…

Jupyter Notebook 18,100 2,292 Updated Dec 25, 2024

Grounding DINO 1.5: IDEA Research's Most Capable Open-World Object Detection Model Series

Python 1,071 43 Updated Jan 21, 2025
Python 4,463 434 Updated Sep 14, 2025

Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"

Python 3,330 279 Updated May 4, 2024

[ECCV 2024 & NeurIPS 2024] Official implementation of the paper TAPTR & TAPTRv2 & TAPTRv3

269 13 Updated Dec 13, 2024

One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks

Python 3,437 464 Updated Dec 18, 2025

[ECCV 2024] Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation

Python 298 17 Updated Jul 17, 2024

[ICLR 2025] LLaVA-HR: High-Resolution Large Language-Vision Assistant

Python 246 12 Updated Aug 14, 2024

Emote Portrait Alive: Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions

7,652 934 Updated Aug 21, 2024