Stars
LongLive: Real-time Interactive Long Video Generation
🚀 Efficient implementations of state-of-the-art linear attention models
A curated collection of images and prompts generated by gemini-2.5-flash-image (aka Nano Banana), a state-of-the-art image generation and editing model. Explore AI-generated visuals created with…
A simple, unified multimodal models training engine. Lean, flexible, and built for hacking at scale.
Long-RL: Scaling RL to Long Sequences (NeurIPS 2025)
A unified model that seamlessly integrates multimodal understanding, text-to-image generation, and image editing within a single powerful framework.
PyTorch implementation for the paper "SimpleAR: Pushing the Frontier of Autoregressive Visual Generation"
Latest open-source "Thinking with images" (O3/O4-mini) papers, covering training-free, SFT-based, and RL-enhanced methods for "fine-grained visual understanding".
Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual information for complex reasoning, planning, and generation.
WeThink: Toward General-purpose Vision-Language Reasoning via Reinforcement Learning
A simple screen parsing tool towards pure vision based GUI agent
Witness the aha moment of VLM with less than $3.
Solve Visual Understanding with Reinforced VLMs
Official Repo for Open-Reasoner-Zero
[CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents
Paper collection for the continuing line of work that started from World Models.
Code for ChatRex: Taming Multimodal LLM for Joint Perception and Understanding
Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
Grounding DINO 1.5: IDEA Research's Most Capable Open-World Object Detection Model Series
Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
[ECCV 2024 & NeurIPS 2024] Official implementation of the paper TAPTR & TAPTRv2 & TAPTRv3
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
[ECCV 2024] Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation
[ICLR 2025] LLaVA-HR: High-Resolution Large Language-Vision Assistant
Emote Portrait Alive: Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions