- The Chinese University of Hong Kong
- Hong Kong (UTC +08:00)
- https://harryhsing.github.io/
- in/xingzhenghao
- @onehsing
Stars
- The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trained model checkpoints, and example notebooks that show how t…
- [EMNLP'25 Oral] GLIMPSE: Do Large Vision-Language Models Truly Think With Videos or Just Glimpse at Them?
- slime is an LLM post-training framework for RL Scaling.
- A collection of awesome think-with-videos papers.
- Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give it a star 🌟 if you find it useful.
- Data Pipeline, Models, and Benchmark for Omni-Captioner.
- Agent framework and applications built upon Qwen>=3.0, featuring Function Calling, MCP, Code Interpreter, RAG, Chrome extension, etc.
- Official code for the WACV 2024 paper "Annotation-free Audio-Visual Segmentation"
- Code for "AudioMarathon: A Comprehensive Benchmark for Long-Context Audio Understanding and Efficiency in Audio LLMs"
- Official repo for the paper "EditVerse: Unifying Image and Video Editing and Generation with In-Context Learning"
- [NeurIPS 2025] PyTorch implementation of ThinkSound, a unified framework for generating audio from any modality, guided by Chain-of-Thought (CoT) reasoning.
- Qwen3-Omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.
- Tongyi Deep Research, the leading open-source deep research agent
- A Survey of Reinforcement Learning for Large Reasoning Models
- A community-driven registry service for Model Context Protocol (MCP) servers.
- Official code for "Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search"
- TraceRL & TraDo-8B: Revolutionizing Reinforcement Learning Framework for Diffusion Large Language Models
- Code that accompanies the public release of the paper Lost in Conversation (https://arxiv.org/abs/2505.06120)
- The most open diffusion language model for code generation — releasing pretraining, evaluation, inference, and checkpoints.
- Official repo for Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning
- A version of verl to support diverse tool use
- EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
- [Survey] A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems
- 📖 A repository for organizing papers, code, and other resources related to Visual Reinforcement Learning.
- The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol.
- verl-agent is an extension of veRL, designed for training LLM/VLM agents via RL. verl-agent is also the official code for the paper "Group-in-Group Policy Optimization for LLM Agent Training"