Stars
MMaDA - Open-Sourced Multimodal Large Diffusion Language Models
Fully Open Framework for Democratized Multimodal Reinforcement Learning.
A minimal, educational HEVC (H.265) encoder written in Python.
Lightning-Fast RL for LLM Reasoning and Agents. Made Simple & Flexible.
Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe
Official PyTorch implementation of ProCLIP: Progressive Vision-Language Alignment via LLM-based Embedder
Official PyTorch implementation for "Large Language Diffusion Models"
[AAAI 2026 Oral] The official code of "UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning"
Qwen3-Omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.
AIMedia is an integrated tool that automatically scrapes trending topics, writes articles with AI, and publishes them automatically. Supports Toutiao, Xiaohongshu, WeChat Official Accounts, and more.
State-of-the-art 2D and 3D Face Analysis Project
Fully Open Framework for Democratized Multimodal Training
Multilingual Document Layout Parsing in a Single Vision-Language Model
Use PEFT or full-parameter training for CPT/SFT/DPO/GRPO on 600+ LLMs (Qwen3, Qwen3-MoE, DeepSeek-R1, GLM4.5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, LLaVA, …)
PyTorch-native distributed training library for LLMs/VLMs with out-of-the-box Hugging Face support
A simple, unified multimodal models training engine. Lean, flexible, and built for hacking at scale.
verl: Volcano Engine Reinforcement Learning for LLMs
Official repository for HOComp: Interaction-Aware Human-Object Composition
An open-source implementation for fine-tuning the Qwen-VL series by Alibaba Cloud.
Ongoing research training transformer models at scale
A curated list of balanced multimodal learning methods.
Official code repo for our work "Native Visual Understanding: Resolving Resolution Dilemmas in Vision-Language Models"
[NeurIPS 2025] Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations
The simplest, fastest repository for training/finetuning small-sized VLMs.
A paper list of recent works on token compression for ViT and VLM