Stars
A parallel programming training mini app simulating weather-like flows
This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, s…
[TMLR 2024] Efficient Large Language Models: A Survey
A high-throughput and memory-efficient inference and serving engine for LLMs
List of papers related to neural network quantization in recent AI conferences and journals.
[ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
[NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
[ICML 2024] BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
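AISystem mainly refers to AI systems, covering full-stack low-level AI technologies such as AI chips, AI compilers, and AI inference and training frameworks.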
zTT: Learning-based DVFS with Zero Thermal Throttling for Mobile Devices [MobiSys'21] - Artifact Evaluation
A Model-Free GPU Online Energy Optimization (MF-GPOEO) framework
XiTAO is a lightweight layer built on top of modern C++ features with the goals of being low-overhead and serving as a development platform for testing scheduling and resource management algorithms.
A quick survival guide for international students who come to Chalmers.
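😏 Excellent computer science courses from China and abroad, including those from world-leading CS schools such as MIT and CMU. 🔥🔥 Covers core CS subjects (operating systems, computer networks, compilers, databases, data structures and algorithms, etc.) as well as advanced subjects such as artificial intelligence. Contributions via PR are welcome!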
My curriculum vitae (CV) written using LaTeX.
Project level config for insanely fast feedback loops
A programmer's guide to living longer
A guide to CS study-abroad programs in Europe, Hong Kong, and Singapore