Stars
2026 TAAC (Tencent Advertising Algorithm Competition) KDD Cup solution, best score: 0.832321, rank: 51
UniRank: A Ranking Model Benchmark for Unified Sequential Modeling and Feature Interaction
Accelerating MoE with IO and Tile-aware Optimizations
CODA: Rewriting Transformer Blocks as GEMM-Epilogue Programs
TokenSpeed is a speed-of-light LLM inference engine.
A self-hosted ML coding practice platform. 68 problems from ReLU to flow matching — attention, training, RLHF, diffusion, and more. Instant feedback in the browser.
CUDA kernels for linear attention variants, written in CuTe DSL and CUTLASS C++.
AI agents running research on single-GPU nanochat training automatically
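PyTorch domain library for recommendation systems
🎓 An extremely detailed walkthrough of the Minimind project for training a large model from scratch, covering (but not limited to) the architectures and algorithms used, plus LLM interview experience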
M2C-Tech / QGA
Forked from Tongyun1/QGA. The code repository for the KDD 2026 paper "Q-Regularized Generative Auto-Bidding: From Suboptimal Trajectories to Optimal Policies"
Implementation of "FlashPrefill: Instantaneous Pattern Discovery and Thresholding for Ultra-Fast Long-Context Prefilling"
Flash attention tutorial written in Python, Triton, CUDA, and CUTLASS
Offers a toolset for comprehensive, multi-faceted large-scale data analysis and optimizations
Inference Speed Benchmark for Learning to (Learn at Test Time): RNNs with Expressive Hidden States
IntelliFold: A Controllable Foundation Model for General and Specialized Biomolecular Structure Prediction.
Wan: Open and Advanced Large-Scale Video Generative Models
Research code accompanying AlphaGenome
[Accepted by WWW 2026 🎉🎉] Generative Regression Based Watch Time Prediction for Short-Video Recommendation