HarryHsing
🎾
TTWSYF

Highlights

  • Pro


The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trained model checkpoints, and example notebooks that show how t…

Python · 2,264 stars · 162 forks · Updated Dec 19, 2025

[EMNLP'25 Oral] GLIMPSE: Do Large Vision-Language Models Truly Think With Videos or Just Glimpse at Them?

Python · 8 stars · Updated Aug 22, 2025

slime is an LLM post-training framework for RL Scaling.

Python · 2,926 stars · 353 forks · Updated Dec 22, 2025

A collection of awesome "think with videos" papers.

74 stars · 2 forks · Updated Dec 1, 2025

Python · 442 stars · 46 forks · Updated Nov 25, 2025

Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give it a star 🌟 if you find it useful.

Python · 193 stars · 7 forks · Updated Oct 12, 2025

Python · 79 stars · 6 forks · Updated Oct 28, 2025

Data Pipeline, Models, and Benchmark for Omni-Captioner.

Python · 105 stars · Updated Oct 17, 2025

Agent framework and applications built upon Qwen>=3.0, featuring Function Calling, MCP, Code Interpreter, RAG, Chrome extension, etc.

Python · 12,733 stars · 1,175 forks · Updated Sep 26, 2025

Official code for WACV 2024 paper, "Annotation-free Audio-Visual Segmentation"

Python · 35 stars · 3 forks · Updated Oct 11, 2024

Code for "AudioMarathon: A Comprehensive Benchmark for Long-Context Audio Understanding and Efficiency in Audio LLMs"

Python · 20 stars · Updated Oct 9, 2025

Official repo for paper "EditVerse: Unifying Image and Video Editing and Generation with In-Context Learning"

Python · 117 stars · 3 forks · Updated Oct 9, 2025

[NeurIPS 2025] PyTorch implementation of ThinkSound, a unified framework for generating audio from any modality, guided by Chain-of-Thought (CoT) reasoning.

Python · 1,107 stars · 65 forks · Updated Nov 25, 2025

Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.

Jupyter Notebook · 3,144 stars · 193 forks · Updated Oct 9, 2025

Democratizing Reinforcement Learning for LLMs

Python · 4,883 stars · 467 forks · Updated Dec 21, 2025

Tongyi Deep Research, the Leading Open-source Deep Research Agent

Python · 17,676 stars · 1,354 forks · Updated Dec 17, 2025

A Survey of Reinforcement Learning for Large Reasoning Models

TeX · 2,183 stars · 120 forks · Updated Nov 9, 2025

A community driven registry service for Model Context Protocol (MCP) servers.

Go · 6,158 stars · 537 forks · Updated Dec 18, 2025

Official Code for "Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search"

Python · 377 stars · 15 forks · Updated Sep 15, 2025

TraceRL & TraDo-8B: Revolutionizing Reinforcement Learning Framework for Diffusion Large Language Models

Python · 367 stars · 30 forks · Updated Dec 16, 2025

Code that accompanies the public release of the paper Lost in Conversation (https://arxiv.org/abs/2505.06120)

Python · 195 stars · 15 forks · Updated Jun 23, 2025

The most open diffusion language model for code generation — releasing pretraining, evaluation, inference, and checkpoints.

Python · 488 stars · 34 forks · Updated Nov 11, 2025

Official Repo for Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning

Jupyter Notebook · 403 stars · 35 forks · Updated Dec 15, 2024

A version of verl to support diverse tool use

Python · 769 stars · 63 forks · Updated Dec 10, 2025

EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL

Python · 4,290 stars · 327 forks · Updated Dec 15, 2025

[Survey] A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems

1,433 stars · 92 forks · Updated Oct 11, 2025

📖 This is a repository for organizing papers, code, and other resources related to Visual Reinforcement Learning.

369 stars · 19 forks · Updated Nov 29, 2025

The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol.

Python · 2,396 stars · 224 forks · Updated Dec 19, 2025

verl-agent is an extension of veRL, designed for training LLM/VLM agents via RL. verl-agent is also the official code for the paper "Group-in-Group Policy Optimization for LLM Agent Training".

Python · 1,297 stars · 116 forks · Updated Dec 11, 2025