University of Western Australia (UWA)
- https://zinuoli.github.io/
Stars
[NeurIPS 2025] Official repository of the paper "Unlocking Aha Moments via Reinforcement Learning: Advancing Collaborative Visual Comprehension and Generation"
Fast and Universal 3D reconstruction model for versatile tasks
Reinforcement Learning via Self-Distillation (SDPO)
A tool for generating synthetic function call datasets for Large Language Models (LLMs).
A simple yet powerful agent framework that delivers with open-source models
[CVPR 2026] TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs
[NeurIPS 2025] Watch and Listen: Understanding Audio-Visual-Speech Moments with Multimodal LLM
🔥🔥🔥 Latest Papers, Codes and Datasets on Video-LMM Post-Training
PyTorch implementation of "SKEL-CF: Coarse-to-Fine Biomechanical Skeleton and Surface Mesh Recovery"
📖 This is a repository for organizing papers, codes and other resources related to Visual Reinforcement Learning.
Trinity-RFT is a general-purpose, flexible and scalable framework designed for reinforcement fine-tuning (RFT) of large language models (LLM).
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
Official repository of paper "LOVE-R1: Advancing Long Video Understanding with Adaptive Zoom-in Mechanism via Multi-Step Reasoning"
Repo for paper "MUSEG: Reinforcing Video Temporal Understanding via Timestamp-Aware Multi-Segment Grounding".
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3.5, DeepSeek-R1, GLM-5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, Phi4, ...)…
Qwen3-Omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.
The official code of "Thinking With Videos: Multimodal Tool-Augmented Reinforcement Learning for Long Video Reasoning"
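[NeurIPS 2025] Pixel-Level Reasoning Model trained with RL
CS336 Assignment 5 implementation. The DPO/RLHF parts of the supplementary assignment are also completed, and the ablation analysis is available in a Feishu document. For reference only.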
Search-R1: An Efficient, Scalable RL Training Framework for Reasoning & Search Engine Calling interleaved LLM based on veRL
[ICLR 2026] End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning
Structured Video Comprehension of Real-World Shorts
A version of verl to support diverse tool use
Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.