- Shanghai, China
- airhaohan.github.io
Starred repositories
A high-throughput and memory-efficient inference and serving engine for LLMs
A curated list for Efficient Large Language Models
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
A bidirectional pipeline parallelism algorithm for computation-communication overlap in DeepSeek V3/R1 training.
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
DeepEP: an efficient expert-parallel communication library
FlashMLA: Efficient Multi-head Latent Attention Kernels
Awesome LLMs on Device: A Comprehensive Survey
MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba. Full multimodal LLM Android App:[MNN-LLM-Android](./apps/Android/MnnLlmChat/READ…
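Study notes on high-performance computing, including notes and code demos for related topics; continuously being improved. If you find it helpful, please give it a Star; it helps the author a lot. Thank you!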
Pretrained Language Models for Source code
4 labs + 2 challenges + 4 docs
Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"
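This project aims to share the technical principles behind large models and hands-on experience (large-model engineering and deploying large-model applications)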
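"A Guide to Using Open-Source Large Models": a tutorial tailored for Chinese beginners on quickly fine-tuning (full-parameter/LoRA) and deploying domestic and international open-source large language models (LLMs) and multimodal large models (MLLMs) in a Linux environment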
A list of awesome research on log analysis, anomaly detection, fault localization, and AIOps
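A Chinese-language tutorial on C++ templates. Unlike the well-known book C++ Templates, this series teaches C++ templates as a Turing-complete language, aiming to help readers gain a thorough command of metaprogramming. (Work in progress.)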
Paper Lists for Graph Neural Networks