Starred repositories
⛅️ A curated list of Cloudflare tools, open source projects, guides, blogs and other resources.
Notes on LLMs, covering model inference, transformer model structure, and LLM framework code analysis.
Extend existing LLMs well beyond their original training length with constant memory usage, without retraining
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI
Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation (NeurIPS 2025)
Kimi K2 is the large language model series developed by the Moonshot AI team
Minimal yet performant LLM examples in pure JAX
OmniGen2: Exploration of Advanced Multimodal Generation.
[ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule
Reproduction code for the paper "Reducing Transformer Key-Value Cache Size with Cross-Layer Attention" (MIT CSAIL)
UdioWrapper is a Python package that enables the generation of music tracks using Udio's API through textual prompts. This package is based on the reverse engineering of the Udio API (https://www.…
SGLang is a fast serving framework for large language models and vision language models.
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
Get up and running with OpenAI gpt-oss, DeepSeek-R1, Gemma 3 and other models.
This is the homepage of a new book entitled "Mathematical Foundations of Reinforcement Learning."
MiniMax-M1, the world's first open-weight, large-scale hybrid-attention reasoning model.
The official repo of MiniMax-Text-01 and MiniMax-VL-01, a large language model and a vision-language model based on linear attention
Open-source coding LLM for software engineering tasks
Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation
MiniCPM4 & MiniCPM4.1: Ultra-Efficient LLMs on End Devices, achieving 3+ generation speedup on reasoning tasks
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
Open standard for machine learning interoperability
An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models