The Chinese University of Hong Kong
Hong Kong
https://harryhsing.github.io/
in/xingzhenghao
@onehsing
Stars
An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models
SGLang is a high-performance serving framework for large language models and multimodal models.
[CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents
Qwen3-TTS is an open-source series of TTS models developed by the Qwen team at Alibaba Cloud, supporting stable, expressive, and streaming speech generation, free-form voice design, and vivid voice…
[ICLR 2026] UniVideo: Unified Understanding, Generation, and Editing for Videos
Use PEFT or full-parameter training for CPT/SFT/DPO/GRPO across 600+ LLMs (Qwen3, Qwen3-MoE, DeepSeek-R1, GLM4.5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, …
The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trained model checkpoints, and example notebooks that show how t…
slime is an LLM post-training framework for RL Scaling.
A curated collection of papers on thinking with videos.
Thinking with Videos from Open-Source Priors: reproduces chain-of-frames visual reasoning by fine-tuning open-source video models.
Scaling Long-Horizon LLM Agent via Context-Folding
[ICLR 2026] Data Pipeline, Models, and Benchmark for Omni-Captioner.
Agent framework and applications built upon Qwen>=3.0, featuring Function Calling, MCP, Code Interpreter, RAG, Chrome extension, etc.
Official code for WACV 2024 paper, "Annotation-free Audio-Visual Segmentation"
Code for "AudioMarathon: A Comprehensive Benchmark for Long-Context Audio Understanding and Efficiency in Audio LLMs"
Official repo for paper "EditVerse: Unifying Image and Video Editing and Generation with In-Context Learning"
[NeurIPS 2025] PyTorch implementation of ThinkSound, a unified framework for generating audio from any modality, guided by Chain-of-Thought (CoT) reasoning.
Qwen3-Omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.
Tongyi Deep Research, the leading open-source deep research agent.
A Survey of Reinforcement Learning for Large Reasoning Models
A community-driven registry service for Model Context Protocol (MCP) servers.
Official Code for "Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search"
[ICLR 2026] TraceRL & TraDo-8B: A Reinforcement Learning Framework for Diffusion Large Language Models
Code that accompanies the public release of the paper Lost in Conversation (https://arxiv.org/abs/2505.06120)
The most open diffusion language model for code generation — releasing pretraining, evaluation, inference, and checkpoints.
A version of verl to support diverse tool use