Stars
Freeing data processing from scripting madness by providing a set of platform-agnostic, customizable pipeline processing blocks.
SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
ResiDual: Transformer with Dual Residual Connections, https://arxiv.org/abs/2304.14802
EleutherAI / nanoGPT-mup
Forked from karpathy/nanoGPT. The simplest, fastest repository for training/finetuning medium-sized GPTs.
🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention"
Efficient Mixture of Experts for LLM Paper List
Awesome LLM Books: Curated list of books on Large Language Models
High-speed Large Language Model Serving for Local Deployment
GPU operators for sparse tensor operations
Benchmarking Benchmark Leakage in Large Language Models
A resource repository for machine unlearning in large language models
Updated November 2025: a roundup of Docker registry mirrors currently usable in mainland China, a list of DockerHub mirror accelerators for China. 🚀 DockerHub mirror accelerators.
FlashMLA: Efficient Multi-head Latent Attention Kernels
Minimal reproduction of DeepSeek R1-Zero
Fully open data curation for reasoning models
MiniCPM-V 4.5: A GPT-4o Level MLLM for Single Image, Multi Image and High-FPS Video Understanding on Your Phone
🚀 Efficient implementations of state-of-the-art linear attention models
A Telegram bot to recommend arXiv papers
🚀🚀 Train a 26M-parameter GPT completely from scratch in just 2 hours! 🌏
Allegro is a powerful text-to-video model that generates high-quality videos of up to 6 seconds at 15 FPS and 720p resolution from a simple text prompt.
RLHF implementation details of OpenAI's 2019 codebase
A flexible and efficient training framework for large-scale alignment tasks