Mingyang mingyangHao

0 followers · 3 following

Shanghai
20:14 (UTC -12:00)

Popular repositories

flash-attention Public

Forked from Dao-AILab/flash-attention

Fast and memory-efficient exact attention

Python

FlashMLA Public

Forked from deepseek-ai/FlashMLA

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++

TensorRT-LLM Public

Forked from NVIDIA/TensorRT-LLM

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…

C++