Highlights
- hpc-ops (Public) · forked from Tencent/hpc-ops · High Performance LLM Inference Operator Library · C++ · Other license · Updated Jan 27, 2026
- minions (Public) · forked from HazyResearch/minions · Big & Small LLMs working together · Python · MIT License · Updated Jan 27, 2026
- FastVideo (Public) · forked from hao-ai-lab/FastVideo · A unified inference and post-training framework for accelerated video generation · Python · Apache License 2.0 · Updated Jan 16, 2026
- Engram (Public) · forked from deepseek-ai/Engram · Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models · Python · Apache License 2.0 · Updated Jan 12, 2026
- ThunderKittens (Public) · forked from HazyResearch/ThunderKittens · Tile primitives for speedy kernels · CUDA · MIT License · Updated Jan 12, 2026
- cutlass (Public) · forked from NVIDIA/cutlass · CUDA Templates and Python DSLs for High-Performance Linear Algebra · C++ · Other license · Updated Jan 9, 2026
- CUDA-L2 (Public) · forked from deepreinforce-ai/CUDA-L2 · Surpassing cuBLAS Performance for Matrix Multiplication through Reinforcement Learning · CUDA · MIT License · Updated Jan 8, 2026
- verl (Public) · forked from verl-project/verl · Volcano Engine Reinforcement Learning for LLMs · Python · Apache License 2.0 · Updated Jan 6, 2026
- DeepGEMM (Public) · forked from deepseek-ai/DeepGEMM · Clean and efficient FP8 GEMM kernels with fine-grained scaling · CUDA · MIT License · Updated Jan 6, 2026
- dflash (Public) · forked from z-lab/dflash · Block Diffusion for Ultra-Fast Speculative Decoding · Python · MIT License · Updated Jan 5, 2026
- ArcticInference (Public) · forked from snowflakedb/ArcticInference · vLLM plugin for high-throughput, low-latency inference · Python · Apache License 2.0 · Updated Dec 30, 2025
- DeepEP (Public) · forked from deepseek-ai/DeepEP · An efficient expert-parallel communication library · CUDA · MIT License · Updated Dec 29, 2025
- Megatron-LM (Public) · forked from NVIDIA/Megatron-LM · Ongoing research training transformer models at scale · Python · Other license · Updated Dec 28, 2025
- mini-sglang (Public) · forked from sgl-project/mini-sglang · A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems · Python · Updated Dec 26, 2025
- DeepSpeed (Public) · forked from deepspeedai/DeepSpeed · A deep learning optimization library that makes distributed training and inference easy, efficient, and effective · Python · Apache License 2.0 · Updated Dec 24, 2025 (usage sketch after this list)
- sionna-rk (Public) · forked from NVlabs/sionna-rk · Sionna Research Kit: A GPU-Accelerated Research Platform for AI-RAN · Jupyter Notebook · Other license · Updated Dec 19, 2025
- FlashMLA (Public) · forked from deepseek-ai/FlashMLA · Efficient Multi-head Latent Attention kernels · C++ · MIT License · Updated Dec 15, 2025
- nanoGPT (Public) · forked from karpathy/nanoGPT · The simplest, fastest repository for training/finetuning medium-sized GPTs · Python · MIT License · Updated Nov 12, 2025 (usage sketch after this list)
- nano-vllm (Public) · forked from GeeeekExplorer/nano-vllm · Nano vLLM · Python · MIT License · Updated Nov 3, 2025 (usage sketch after this list)
- Advanced-Progress-Bars (Public) · forked from cactuzhead/Advanced-Progress-Bars · Obsidian plugin to create custom progress bars · TypeScript · MIT License · Updated Oct 3, 2025
- DistServe (Public) · forked from LLMServe/DistServe · Disaggregated serving system for Large Language Models (LLMs) · Jupyter Notebook · Apache License 2.0 · Updated Apr 6, 2025
- marlin (Public) · forked from IST-DASLab/marlin · FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens · Python · Apache License 2.0 · Updated Sep 4, 2024
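
For orientation on the DeepSpeed entry above: a minimal sketch of wrapping a toy PyTorch module with the library's standard deepspeed.initialize entry point and a ZeRO stage-2 config. The toy model, batch size, and config values are illustrative assumptions, not taken from this fork.

```python
# Minimal sketch (illustrative, not from this fork): training a toy module under
# DeepSpeed with ZeRO stage-2 optimizer/gradient sharding and bf16.
# Run with the DeepSpeed launcher, e.g. `deepspeed toy.py`.
import torch
import deepspeed

model = torch.nn.Linear(1024, 1024)

ds_config = {
    "train_micro_batch_size_per_gpu": 8,              # illustrative values
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-4}},
    "zero_optimization": {"stage": 2},                # shard optimizer states + gradients
    "bf16": {"enabled": True},
}

# deepspeed.initialize wraps the model and returns (engine, optimizer, dataloader, scheduler).
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

x = torch.randn(8, 1024, device=engine.device, dtype=torch.bfloat16)
loss = engine(x).float().pow(2).mean()                # toy loss
engine.backward(loss)                                 # DeepSpeed-managed backward
engine.step()                                         # optimizer step + ZeRO bookkeeping
```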
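
For the nanoGPT entry: a minimal sketch against the repo's single-file model.py (GPTConfig, GPT, and the built-in generate helper). The config values and random token ids below are illustrative stand-ins; real training and sampling live in the repo's train.py and sample.py.

```python
# Minimal sketch (illustrative config, random data): one forward/backward pass and
# sampling with nanoGPT's model.py. Assumes the nanoGPT repo root is on PYTHONPATH.
import torch
from model import GPT, GPTConfig   # nanoGPT's single-file model definition

config = GPTConfig(
    block_size=256,   # context length
    vocab_size=65,    # e.g. a character-level vocabulary
    n_layer=6,
    n_head=6,
    n_embd=384,
    dropout=0.0,
)
model = GPT(config)

# Forward pass with targets returns (logits, loss); the real training loop is in train.py.
idx = torch.randint(0, config.vocab_size, (4, config.block_size))
logits, loss = model(idx, targets=idx)
loss.backward()

# Autoregressive sampling via the built-in helper (see sample.py for the full script).
out = model.generate(idx[:, :1], max_new_tokens=20, temperature=0.8, top_k=50)
```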
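
For the nano-vllm entry: a sketch written under the assumption that nano-vllm mirrors vLLM's offline LLM / SamplingParams interface. The model path and the layout of the returned outputs are assumptions here; check the repo's README for the exact constructor arguments and output fields.

```python
# Sketch assuming nano-vllm mirrors vLLM's LLM / SamplingParams interface
# (the model path and output field names are assumptions; see the repo's README).
from nanovllm import LLM, SamplingParams

llm = LLM("/path/to/a/hf/model", enforce_eager=True, tensor_parallel_size=1)

sampling_params = SamplingParams(temperature=0.6, max_tokens=256)
prompts = ["Explain speculative decoding in one sentence."]

outputs = llm.generate(prompts, sampling_params)
print(outputs[0]["text"])   # assumed output layout
```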