Tao Peng (shanshanpt)

🗣️ Focusing

fighting

71 followers · 40 following

@AlibabaPAI @DeepRec-AI
Beijing, China.
21:58 (UTC +08:00)
https://orcid.org/0009-0008-4450-4768

Lists (4)

🔮 Future ideas

✨ Inspiration

🚀 My stack

🐧MyRepo

Starred repositories

stepfun-ai / StepMesh

C++ · 330 stars · 31 forks · Updated Nov 13, 2025

deepseek-ai / EPLB

Expert Parallelism Load Balancer

Python · 1,318 stars · 195 forks · Updated Mar 24, 2025

facebook / CacheLib

Pluggable in-process caching engine to build and scale high performance services

C++ · 1,471 stars · 308 forks · Updated Dec 17, 2025

vllm-project / production-stack

vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization

Python · 2,038 stars · 342 forks · Updated Dec 17, 2025

vllm-project / aibrix

Cost-efficient and pluggable infrastructure components for GenAI inference

Go · 4,474 stars · 500 forks · Updated Dec 13, 2025

ai-dynamo / dynamo

A Datacenter Scale Distributed Inference Serving Framework

Rust · 5,648 stars · 745 forks · Updated Dec 18, 2025

kvcache-ai / ktransformers

A Flexible Framework for Experiencing Heterogeneous LLM Inference/Fine-tune Optimizations

Python · 16,221 stars · 1,184 forks · Updated Dec 18, 2025

deepseek-ai / DualPipe

A bidirectional pipeline parallelism algorithm for computation-communication overlap in DeepSeek V3/R1 training.

Python · 2,887 stars · 310 forks · Updated Mar 10, 2025

deepseek-ai / 3FS

A high-performance distributed file system designed to address the challenges of AI training and inference workloads.

C++ · 9,528 stars · 977 forks · Updated Dec 13, 2025

deepseek-ai / awesome-deepseek-integration

Integrate the DeepSeek API into popular software

34,775 stars · 3,899 forks · Updated Sep 25, 2025

deepseek-ai / DeepGEMM

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

CUDA · 5,971 stars · 777 forks · Updated Dec 8, 2025

deepseek-ai / DeepEP

DeepEP: an efficient expert-parallel communication library

CUDA · 8,813 stars · 1,032 forks · Updated Dec 5, 2025

deepseek-ai / FlashMLA

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ · 11,920 stars · 918 forks · Updated Dec 15, 2025

vllm-project / vllm-ascend

Community maintained hardware plugin for vLLM on Ascend

Python · 1,480 stars · 667 forks · Updated Dec 18, 2025

mit-han-lab / Block-Sparse-Attention

A sparse attention kernel supporting mixed sparse patterns

C++ · 407 stars · 38 forks · Updated Dec 16, 2025

simplescaling / s1

s1: Simple test-time scaling

Python · 6,615 stars · 764 forks · Updated Jun 25, 2025

volcengine / verl

verl: Volcano Engine Reinforcement Learning for LLMs

Python · 17,585 stars · 2,847 forks · Updated Dec 18, 2025

pytorch / ao

PyTorch native quantization and sparsity for training and inference

Python · 2,579 stars · 386 forks · Updated Dec 18, 2025

fla-org / flash-linear-attention

🚀 Efficient implementations of state-of-the-art linear attention models

Python · 4,076 stars · 331 forks · Updated Dec 18, 2025

deepseek-ai / Janus

Janus-Series: Unified Multimodal Understanding and Generation Models

Python · 17,642 stars · 2,233 forks · Updated Feb 1, 2025

SafeAILab / EAGLE

Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).

Python · 2,065 stars · 232 forks · Updated Dec 18, 2025

andy-yang-1 / DoubleSparse

16-fold reduction in memory access with almost no loss

Python · 109 stars · 9 forks · Updated Mar 26, 2025

MoonshotAI / Kimi-k1.5

3,466 stars · 233 forks · Updated Mar 7, 2025

deepseek-ai / DeepSeek-V3

Python · 100,783 stars · 16,423 forks · Updated Aug 28, 2025

deepseek-ai / DeepSeek-R1

91,596 stars · 11,778 forks · Updated Jun 27, 2025

Lightning-AI / LitServe

Build custom inference engines for models, agents, multi-modal systems, RAG, pipelines and more.

Python · 3,737 stars · 261 forks · Updated Dec 15, 2025

gpt-omni / mini-omni

Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.

Python · 3,498 stars · 302 forks · Updated Nov 5, 2024

ictnlp / LLaMA-Omni

LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

Python · 3,103 stars · 217 forks · Updated May 19, 2025

Lightning-AI / litgpt

20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.

Python · 13,034 stars · 1,383 forks · Updated Dec 17, 2025

gpt-omni / mini-omni2

Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities.

Python · 1,847 stars · 205 forks · Updated Jan 16, 2025

Starred topics

Tensorflow

Deep learning

Compiler