shanshanpt

Organizations

@AlibabaPAI @DeepRec-AI

Starred repositories

C++ 330 31 Updated Nov 13, 2025

Expert Parallelism Load Balancer

Python 1,318 195 Updated Mar 24, 2025

Pluggable in-process caching engine to build and scale high performance services

C++ 1,471 308 Updated Dec 17, 2025

vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization

Python 2,037 340 Updated Dec 17, 2025

Cost-efficient and pluggable Infrastructure components for GenAI inference

Go 4,472 498 Updated Dec 13, 2025

A Datacenter Scale Distributed Inference Serving Framework

Rust 5,644 744 Updated Dec 17, 2025

A Flexible Framework for Experiencing Heterogeneous LLM Inference/Fine-tune Optimizations

Python 16,215 1,184 Updated Dec 17, 2025

A bidirectional pipeline parallelism algorithm for computation-communication overlap in DeepSeek V3/R1 training.

Python 2,887 310 Updated Mar 10, 2025

A high-performance distributed file system designed to address the challenges of AI training and inference workloads.

C++ 9,525 976 Updated Dec 13, 2025

Integrate the DeepSeek API into popular software

34,761 3,898 Updated Sep 25, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 5,964 776 Updated Dec 8, 2025

DeepEP: an efficient expert-parallel communication library

Cuda 8,810 1,029 Updated Dec 5, 2025

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 11,916 918 Updated Dec 15, 2025

Community maintained hardware plugin for vLLM on Ascend

Python 1,471 664 Updated Dec 17, 2025

A sparse attention kernel supporting mixed sparse patterns

C++ 407 38 Updated Dec 16, 2025

s1: Simple test-time scaling

Python 6,611 764 Updated Jun 25, 2025

verl: Volcano Engine Reinforcement Learning for LLMs

Python 17,543 2,835 Updated Dec 17, 2025

PyTorch native quantization and sparsity for training and inference

Python 2,576 386 Updated Dec 17, 2025

🚀 Efficient implementations of state-of-the-art linear attention models

Python 4,070 328 Updated Dec 17, 2025

Janus-Series: Unified Multimodal Understanding and Generation Models

Python 17,641 2,232 Updated Feb 1, 2025

Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).

Python 2,065 231 Updated Nov 23, 2025

16-fold memory access reduction with nearly no loss

Python 109 9 Updated Mar 26, 2025

Build custom inference engines for models, agents, multi-modal systems, RAG, pipelines and more.

Python 3,736 261 Updated Dec 15, 2025

An open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output conversational capabilities.

Python 3,497 302 Updated Nov 5, 2024

LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

Python 3,102 217 Updated May 19, 2025

20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.

Python 13,030 1,381 Updated Dec 16, 2025

Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities.

Python 1,846 205 Updated Jan 16, 2025