Stars
Ongoing research on training transformer models at scale
Official Repository for "Glyph: Scaling Context Windows via Visual-Text Compression"
🌿 DeepPrune: Parallel Scaling without Inter-trace Redundancy
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
[SIGGRAPH Asia 2025] CHARM: Control-point-based 3D Anime Hairstyle Auto-Regressive Modeling
🚀🚀 Efficient implementations of Native Sparse Attention
Renderer for the harmony response format to be used with gpt-oss
GLM-SIMPLE-EVALS: The evaluation repository for the GLM-4.5 series of models by Z.ai.
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
[COLM 2024] A Survey on Deep Learning for Theorem Proving
An efficient implementation of the NSA (Native Sparse Attention) kernel
TradingAgents: Multi-Agents LLM Financial Trading Framework
slime is an LLM post-training framework for RL Scaling.
[SIGGRAPH 2025] PrimitiveAnything: Human-Crafted 3D Primitive Assembly Generation with Auto-Regressive Transformer
ReLE Benchmark: capability evaluation of Chinese large AI models (continuously updated). Currently covers 335 models, spanning commercial models such as chatgpt, gpt-5.2, o4-mini, Google gemini-3-pro, Claude-4.5, Wenxin ERNIE-X1.1, ERNIE-5.0-Thinking, qwen3-max, Baichuan, iFlytek Spark, and SenseTime SenseChat, as well as kimi-k2, ernie4.5, minimax-M2, deepseek-…
[ICML 2025] XAttention: Block Sparse Attention with Antidiagonal Scoring
[ACL 2025] Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems
Official codebase for "STAIR: Improving Safety Alignment with Introspective Reasoning"
[AAAI 2026] VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation
[ICML'24] Data and code for our paper "Training-Free Long-Context Scaling of Large Language Models"
[NeurIPS 2024] AlphaTablets: A Generic Plane Representation for 3D Planar Reconstruction from Monocular Videos