Ph.D. Candidate @ CUHK-MMLab, B.E. @ UCAS
- Hong Kong
- https://jf-d.github.io/
Stars
An Open Source Machine Learning Framework for Everyone
Productive, portable, and performant GPU programming in Python.
Seamless operability between C++11 and Python
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…
FlashMLA: Efficient Multi-head Latent Attention Kernels
A high-performance distributed file system designed to address the challenges of AI training and inference workloads.
OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.
High-speed Large Language Model Serving for Local Deployment
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
Optimized primitives for collective multi-GPU communication
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
Tutorial code on how to build your own Deep Learning System in 2k Lines
Mirage Persistent Kernel: Compiling LLMs into a MegaKernel
Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
TinyChatEngine: On-Device LLM Inference Library
A "large" language model running on a microcontroller
A high-performance inference system for large language models, designed for production environments.
MSCCL++: A GPU-driven communication stack for scalable AI applications
[MLSys 2021] IOS: Inter-Operator Scheduler for CNN Acceleration