Stars
Fully open reproduction of DeepSeek-R1
verl: Volcano Engine Reinforcement Learning for LLMs
slime is an LLM post-training framework for RL scaling.
BABILong is a benchmark for LLM evaluation using the needle-in-a-haystack approach.
MoBA: Mixture of Block Attention for Long-Context LLMs
Large World Model -- Modeling Text and Video with Millions Context
Fully Open Framework for Democratized Multimodal Training
VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo
Triton-based implementation of Sparse Mixture of Experts.
Code release for paper "Test-Time Training Done Right"
Ring attention implementation with flash attention
The simplest, fastest repository for training/finetuning medium-sized GPTs.
Ongoing research training transformer models at scale
Official PyTorch implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States
Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMA2, Qwen, GLM, Claude, etc.) over 100+ datasets.
Hackable and optimized Transformers building blocks, supporting a composable construction.
Inference Speed Benchmark for Learning to (Learn at Test Time): RNNs with Expressive Hidden States
Automatically split your PyTorch models on multiple GPUs for training & inference
A PyTorch native platform for training generative AI models
This repo contains the source code for RULER: What’s the Real Context Size of Your Long-Context Language Models?
Fast and memory-efficient exact attention
OmniGen: Unified Image Generation. https://arxiv.org/pdf/2409.11340
Code and documents of LongLoRA and LongAlpaca (ICLR 2024 Oral)