shanshanpt
🗣️ Focusing

Organizations

@AlibabaPAI @DeepRec-AI

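Starred repositories

Each stats line gives the repository's primary language (where listed), star count, fork count, and last-update date. Each description precedes its stats line.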

C++ 330 31 Updated Nov 13, 2025

Expert Parallelism Load Balancer

Python 1,318 195 Updated Mar 24, 2025

Pluggable in-process caching engine to build and scale high performance services

C++ 1,471 308 Updated Dec 17, 2025

vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization

Python 2,038 342 Updated Dec 17, 2025

Cost-efficient and pluggable infrastructure components for GenAI inference

Go 4,474 500 Updated Dec 13, 2025

A Datacenter Scale Distributed Inference Serving Framework

Rust 5,648 745 Updated Dec 18, 2025

A Flexible Framework for Experiencing Heterogeneous LLM Inference/Fine-tune Optimizations

Python 16,221 1,184 Updated Dec 18, 2025

A bidirectional pipeline parallelism algorithm for computation-communication overlap in DeepSeek V3/R1 training.

Python 2,887 310 Updated Mar 10, 2025

A high-performance distributed file system designed to address the challenges of AI training and inference workloads.

C++ 9,528 977 Updated Dec 13, 2025

Integrate the DeepSeek API into popular software

34,775 3,899 Updated Sep 25, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 5,971 777 Updated Dec 8, 2025

DeepEP: an efficient expert-parallel communication library

Cuda 8,813 1,032 Updated Dec 5, 2025

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 11,920 918 Updated Dec 15, 2025

Community maintained hardware plugin for vLLM on Ascend

Python 1,480 667 Updated Dec 18, 2025

A sparse attention kernel supporting mixed sparse patterns

C++ 407 38 Updated Dec 16, 2025

s1: Simple test-time scaling

Python 6,615 764 Updated Jun 25, 2025

verl: Volcano Engine Reinforcement Learning for LLMs

Python 17,585 2,847 Updated Dec 18, 2025

PyTorch native quantization and sparsity for training and inference

Python 2,579 386 Updated Dec 18, 2025

🚀 Efficient implementations of state-of-the-art linear attention models

Python 4,076 331 Updated Dec 18, 2025

Janus-Series: Unified Multimodal Understanding and Generation Models

Python 17,642 2,233 Updated Feb 1, 2025

Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).

Python 2,065 232 Updated Dec 18, 2025

16-fold memory access reduction with nearly no loss

Python 109 9 Updated Mar 26, 2025

Build custom inference engines for models, agents, multi-modal systems, RAG, pipelines and more.

Python 3,737 261 Updated Dec 15, 2025

Open-source multimodal large language model that can hear and talk while thinking. Features real-time end-to-end speech input and streaming audio output for conversation.

Python 3,498 302 Updated Nov 5, 2024

LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

Python 3,103 217 Updated May 19, 2025

20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.

Python 13,034 1,383 Updated Dec 17, 2025

Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities.

Python 1,847 205 Updated Jan 16, 2025