C++ 210 6 Updated Nov 19, 2025
Python 1 Updated Nov 11, 2025

IELTS Materials

49 9 Updated Jul 28, 2019

Paper list of streaming video understanding

2 Updated Dec 22, 2025

🔥 OneThinker: All-in-one Reasoning Model for Image and Video

Python 327 25 Updated Dec 9, 2025

OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.

Python 607 51 Updated Oct 29, 2025

Code and data for VTCBench, a vision-text compression benchmark for Vision Language Models.

Python 11 1 Updated Dec 22, 2025

Fast, memory-efficient attention column reduction (e.g., sum, mean, max)

Python 29 Updated Dec 15, 2025

Use PEFT or full-parameter fine-tuning for CPT/SFT/DPO/GRPO on 600+ LLMs (Qwen3, Qwen3-MoE, DeepSeek-R1, GLM4.5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, …

Python 11,786 1,076 Updated Dec 23, 2025

A collection of commonly used badges and charts for GitHub READMEs

256 32 Updated Apr 19, 2025

A collection of various interesting badges

JavaScript 103 9 Updated Oct 22, 2020

Qwen3-Omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.

Jupyter Notebook 3,154 193 Updated Oct 9, 2025
Python 12 2 Updated Nov 11, 2025
Python 11 Updated Aug 28, 2025

Official Implementation for the paper: "ToG-Bench: Task-Oriented Spatio-Temporal Grounding in Egocentric Videos"

1 Updated Dec 4, 2025

[CVPR 2025 🔥]A Large Multimodal Model for Pixel-Level Visual Grounding in Videos

Python 93 5 Updated Apr 14, 2025

1+1 > 2: Detector-Empowered Video Large Language Model for Spatio-Temporal Grounding and Reasoning

9 Updated Dec 13, 2025

The official repo for "Vidi: Large Multimodal Models for Video Understanding and Editing"

Python 541 34 Updated Dec 11, 2025

WorldMM: Dynamic Multimodal Memory Agent for Long Video Reasoning

Python 27 1 Updated Dec 3, 2025

Official Code for "ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning"

Python 72 1 Updated Dec 5, 2025
Python 6 Updated Nov 28, 2025

[ICML 2025 Oral] An official implementation of VideoRoPE & VideoRoPE++

Python 209 4 Updated Jul 28, 2025

Accelerating Streaming Video Large Language Models via Hierarchical Token Compression

Python 21 1 Updated Dec 7, 2025

Agent framework and applications built upon Qwen>=3.0, featuring Function Calling, MCP, Code Interpreter, RAG, Chrome extension, etc.

Python 12,751 1,177 Updated Sep 26, 2025
53 Updated Nov 14, 2025

[arxiv'25] TimeViper: A Hybrid Mamba-Transformer Vision-Language Model for Efficient Long Video Understanding

Python 19 Updated Dec 11, 2025

Downscaling Intelligence: Exploring Perception and Reasoning Bottlenecks in Small Multimodal Models

Python 22 Updated Nov 24, 2025

MR. Video: MapReduce is the Principle for Long Video Understanding

28 Updated Apr 23, 2025

The repository provides code for running inference and finetuning with the Meta Segment Anything Model 3 (SAM 3), links for downloading the trained model checkpoints, and example notebooks that sho…

Python 6,356 741 Updated Dec 21, 2025

[AAAI2025] Video Repurposing from User Generated Content: A Large-scale Dataset and Benchmark

Python 21 3 Updated Apr 17, 2025