whybeyoung
Python 1,649 125 Updated Dec 18, 2025

rdma_demo

Python 1 Updated Apr 9, 2025

SGLang is a fast serving framework for large language models and vision language models.

Python 1 Updated Dec 20, 2025

Astron-xmod-shim — Lightweight, declarative middleware for reliably converging AI service workloads.

Go 97 15 Updated Nov 3, 2025

SGLang is a fast serving framework for large language models and vision language models.

Python 26 5 Updated Dec 20, 2025

Cross-platform AI workflow DSL converter supporting iFlytek Spark, Dify, and Coze platforms with unified intermediate representation and bidirectional transformation capabilities.

Go 18 3 Updated Nov 21, 2025
Go 70 2 Updated Sep 15, 2025

A workload for deploying LLM inference services on Kubernetes

Go 140 36 Updated Dec 20, 2025

OpenAI ChatGPT, GPT-3, GPT-4, DALL·E, Whisper API wrapper for Go

Go 2 Updated Sep 22, 2025

OME is a Kubernetes operator for enterprise-grade management and serving of Large Language Models (LLMs)

Go 341 51 Updated Dec 20, 2025

SGLang is a fast serving framework for large language models and vision language models.

Python 7 1 Updated Dec 19, 2025

This is a simple implementation of an MCP server using iFlytek. It enables calling iFlytek workflows through MCP tools.

Python 26 6 Updated Mar 28, 2025

easy version of pyverbs

Python 6 2 Updated Apr 16, 2023

A lightweight data processing framework built on DuckDB and 3FS.

Python 4,872 430 Updated Mar 5, 2025

Expert Parallelism Load Balancer

Python 1,320 195 Updated Mar 24, 2025

Analyze computation-communication overlap in V3/R1.

1,128 144 Updated Mar 21, 2025

A bidirectional pipeline parallelism algorithm for computation-communication overlap in DeepSeek V3/R1 training.

Python 2,890 310 Updated Mar 10, 2025

My learning notes for ML SYS.

Python 4,710 298 Updated Dec 19, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 5,985 777 Updated Dec 8, 2025

DeepEP: an efficient expert-parallel communication library

Cuda 8,818 1,033 Updated Dec 5, 2025

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 11,926 918 Updated Dec 15, 2025

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

7,948 288 Updated May 15, 2025

SGLang is a fast serving framework for large language models and vision language models.

Python 21,821 3,813 Updated Dec 20, 2025

🪄 Turns your machine learning code into microservices with web API, interactive GUI, and more.

Python 3,136 164 Updated Dec 5, 2025

A collection of community maintained NRI plugins

Go 100 31 Updated Dec 19, 2025

SciLifeLab Serve is a platform offering machine learning model serving, data science app hosting (Shiny, Gradio, Streamlit, Dash, etc.), and other tools to life science researchers affiliated with …

Python 11 1 Updated Dec 19, 2025

Examples of models deployable with Truss

Python 211 53 Updated Dec 17, 2025

BERT classification model for processing texts longer than 512 tokens. Text is first divided into smaller chunks and after feeding them to BERT, intermediate results are pooled. The implementation …

Python 147 32 Updated Jun 19, 2024

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 65,823 12,087 Updated Dec 20, 2025

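Llama Chinese community: a real-time roundup of the latest Llama learning materials, building the best open-source ecosystem for Chinese Llama large models; fully open source and available for commercial use.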

Python 14,753 1,304 Updated Apr 6, 2025