Stars
A PyTorch native platform for training generative AI models
This repository contains the training code from the paper "SpidR: Learning Fast and Stable Linguistic Units for Spoken Language Models Without Supervision". SpidR is a self-supervised speech representat…
A simple, unified multimodal models training engine. Lean, flexible, and built for hacking at scale.
MiroThinker is a deep research agent optimized for complex research and prediction tasks. Our latest models, MiroThinker-1.7 and MiroThinker-H1, achieve 74.0 and 88.2 on BrowseComp, respectively.
Official JAX implementation of End-to-End Test-Time Training for Long Context
Qwen-Image-Layered: Layered Decomposition for Inherent Editability
🚀 A curated list of awesome resources focusing on Context Compression techniques for Large Language Models (LLMs).
PyTorch implementation of JiT https://arxiv.org/abs/2511.13720
[NeurIPS 2025 Oral] Infinity⭐️: Unified Spacetime AutoRegressive Modeling for Visual Generation
Native Multimodal Models are World Learners
A language-model–powered compressor for natural language text
Training Large Language Models to Reason in a Continuous Latent Space
A MemAgent framework that can be extrapolated to 3.5M tokens, along with a training framework for RL training of any agent workflow.
Code for MetaMorph: Multimodal Understanding and Generation via Instruction Tuning
verl: Volcano Engine Reinforcement Learning for LLMs
PyTorch implementation for the paper "SimpleAR: Pushing the Frontier of Autoregressive Visual Generation"
State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!
Fully open reproduction of DeepSeek-R1
Train transformer language models with reinforcement learning.
Solve Visual Understanding with Reinforced VLMs
[arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation
COYO-700M: Large-scale Image-Text Pair Dataset
[CVPR 2025 Oral] Infinity ∞: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis
Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.
[ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.