- University of Chinese Academy of Sciences
- Beijing, China
- https://nctimtang.github.io/tangxi.github.io/
Stars
This repository is a curated collection of the most exciting and influential CVPR 2026 papers. 🔥 [Paper + Code + Demo]
🔥🔥🔥 [Awesome] Latest Papers, Codes & Datasets on Streaming / Online Video Understanding — Building Always-on, Real-time Video AI 🤖
A simple video streaming baseline that outperforms SOTAs.
StreamingVLM: Real-Time Understanding for Infinite Video Streams
Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding
[CVPR 2026] LongVideo-R1: Smart Navigation for Low-cost Long Video Understanding
vHeat: Building Vision Models upon Heat Conduction
[ICML 2026 Spotlight] Code for miXed Discrete Diffusion Language Model
[ICLR 2026] VideoAnchor: Reinforcing Subspace-Structured Visual Cues for Coherent Visual-Spatial Reasoning
Official repo of "From Indoor to Open World: Revealing the Spatial Reasoning Gap in MLLMs"
[ICLR 2026] An official implementation of "CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning"
verl-agent is an extension of veRL, designed for training LLM/VLM agents via RL. It is also the official code for the paper "Group-in-Group Policy Optimization for LLM Agent Training".
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
A comprehensive and up-to-date compilation of datasets, tools, methods, review papers, and competitions for remote sensing change detection.
Qwen3-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.
Reinforcement Learning of Vision Language Models with Self Visual Perception Reward
REverse-Engineered Reasoning for Open-Ended Generation
Fast and memory-efficient exact attention
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
Official Repo for Open-Reasoner-Zero
Solve Visual Understanding with Reinforced VLMs
This repository provides a valuable reference for researchers in the field of multimodality. Start exploring RL-based reasoning MLLMs here!
verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework
The official implementation of the **ReDDiT: Rehashing Noise for Discrete Visual Generation** paper.
Implementation of the paper "CC-Diff: Enhancing Contextual Coherence in Remote Sensing Image Synthesis"