Stars
Accelerating MoE with IO and Tile-aware Optimizations
slime is an LLM post-training framework for RL Scaling.
Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe
Official implementation of the paper "Revisiting Multimodal Positional Encoding in Vision–Language Models"
Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning' & 'Visual-ARFT: Visual Agentic Reinforcement Fine-Tuning'
PyTorch Distributed native training library for LLMs/VLMs with out-of-the-box Hugging Face support
HuggingFace conversion and training library for Megatron-based models
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
[NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs".
verl: Volcano Engine Reinforcement Learning for LLMs
Best practices for training DeepSeek, Mixtral, Qwen and other MoE models using Megatron Core.
A library for accelerating Transformer models on NVIDIA GPUs, including support for 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance…
"Syntriever: How to Train Your Retriever with Synthetic Data from LLMs" the Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL), Findings, Accepted
Muon is an optimizer for hidden layers in neural networks
Code for "MetaMorph: Multimodal Understanding and Generation via Instruction Tuning"
Strong, open-source foundation models for image recognition.
Scalable data preprocessing and curation toolkit for LLMs
Implementation of "Diffusion-Based Conditional Image Editing through Optimized Inference with Guidance" (WACV 2025).
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
A faster LayoutReader model based on LayoutLMv3 that sorts OCR bounding boxes into reading order.
A lightweight LMM-based Document Parsing Model