yukavio

KavioYu yukavio

Working at Tencent-WXG. Focusing on model inference optimization, including inference engines and model compression.

20 followers · 2 following

Shanghai

sglang Public
Forked from sgl-project/sglang

SGLang is yet another fast serving framework for large language models and vision language models.

Python 2 1 Apache License 2.0 Updated Nov 3, 2025
nsa Public

native sparse attention kernel

Python 7 2 MIT License Updated Sep 8, 2025
ThunderKittens Public
Forked from HazyResearch/ThunderKittens

Tile primitives for speedy kernels

Cuda MIT License Updated Jul 23, 2025
flash-linear-attention Public
Forked from fla-org/flash-linear-attention

🚀 Efficient implementations of state-of-the-art linear attention models in Torch and Triton

Python MIT License Updated Jun 5, 2025
verl Public
Forked from volcengine/verl

verl: Volcano Engine Reinforcement Learning for LLMs

Python Apache License 2.0 Updated Apr 28, 2025
flash-attention Public
Forked from Dao-AILab/flash-attention

Fast and memory-efficient exact attention

Python BSD 3-Clause "New" or "Revised" License Updated Apr 17, 2025
DeepGEMM Public
Forked from deepseek-ai/DeepGEMM

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 1 MIT License Updated Mar 21, 2025
dynamo Public
Forked from ai-dynamo/dynamo

A Datacenter Scale Distributed Inference Serving Framework

Rust Apache License 2.0 Updated Mar 19, 2025
applied-ai Public
Forked from meta-pytorch/applied-ai

Applied AI experiments and examples for PyTorch

Python BSD 3-Clause "New" or "Revised" License Updated Dec 31, 2024
triton Public
Forked from triton-lang/triton

Development repository for the Triton language and compiler

C++ MIT License Updated Dec 30, 2024
flashinfer Public
Forked from flashinfer-ai/flashinfer

FlashInfer: Kernel Library for LLM Serving

Cuda Apache License 2.0 Updated Dec 11, 2024
EAGLE Public
Forked from SafeAILab/EAGLE

Official Implementation of EAGLE-1 (ICML'24) and EAGLE-2 (EMNLP'24)

Python 1 Apache License 2.0 Updated Nov 16, 2024
vllm Public
Forked from vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Python Apache License 2.0 Updated Aug 27, 2024
cuda_hgemm Public
Forked from Bruce-Lee-LY/cuda_hgemm

Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores with the WMMA API and MMA PTX instructions.

Cuda 1 MIT License Updated May 21, 2024
Paddle Public
Forked from PaddlePaddle/Paddle

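PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (PaddlePaddle core framework: high-performance single-machine and distributed training and cross-platform deployment for deep learning and machine learning)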

Python Apache License 2.0 Updated Mar 9, 2021
PaddleSlim Public
Forked from PaddlePaddle/PaddleSlim

PaddleSlim is an open-source library for deep model compression and architecture search.

Python Apache License 2.0 Updated Feb 26, 2021
Paddle-Lite Public
Forked from PaddlePaddle/Paddle-Lite

Multi-platform high-performance deep learning inference engine (PaddlePaddle's multi-platform high-performance deep learning inference engine)

C++ 2 Apache License 2.0 Updated Feb 25, 2021
PaddleClas Public
Forked from PaddlePaddle/PaddleClas

A treasure chest for image classification powered by PaddlePaddle

Python Apache License 2.0 Updated Feb 23, 2021
PaddleSeg Public
Forked from PaddlePaddle/PaddleSeg

End-to-end image segmentation kit based on PaddlePaddle.

Python Apache License 2.0 Updated Jan 22, 2021
FluidDoc Public
Forked from PaddlePaddle/docs

Documentation for PaddlePaddle

Shell Updated Dec 3, 2020
nni Public
Forked from microsoft/nni

An open source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.

Python MIT License Updated Nov 11, 2020
CINN Public
Forked from PaddlePaddle/CINN

a Compiler Infrastructure for Neural Networks

C++ Apache License 2.0 Updated Nov 9, 2020
PaddleOCR Public
Forked from PaddlePaddle/PaddleOCR

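OCR toolkit based on PaddlePaddle (a PaddlePaddle-based OCR toolkit that includes an ultra-lightweight Chinese OCR model only 8.6M in total size, and supports training algorithms for multiple text detection and text recognition methods, as well as server-side and on-device deployment)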

C++ Apache License 2.0 Updated Nov 4, 2020
incubator-tvm Public
Forked from apache/tvm

Open deep learning compiler stack for CPUs, GPUs and specialized accelerators

Python Apache License 2.0 Updated Nov 1, 2020
test Public

test for git

Updated Jul 2, 2020