whybeyoung

💭

I may be slow to respond.

ybyang whybeyoung

66 followers · 76 following

NIVIC
HeFei

Achievements

x3 x2 x2

Lists (3)

Ds

🔮 Future ideas

1 repository

讯飞 (iFlytek)

1 repository

Stars

sgl-project / mini-sglang

Python 1,649 125 Updated Dec 18, 2025

iflytek / rdma_demo

rdma_demo

Python 1 Updated Apr 9, 2025

whybeyoung / sglang

Forked from sgl-project/sglang

SGLang is a fast serving framework for large language models and vision language models.

Python 1 Updated Dec 20, 2025

iflytek / astron-xmod-shim

Astron-xmod-shim — Lightweight, declarative middleware for reliably converging AI service workloads.

Go 97 15 Updated Nov 3, 2025

antgroup / sglang

Forked from sgl-project/sglang

SGLang is a fast serving framework for large language models and vision language models.

Python 26 5 Updated Dec 20, 2025

iflytek / agentbridge

Cross-platform AI workflow DSL converter supporting iFlytek Spark, Dify, and Coze platforms with unified intermediate representation and bidirectional transformation capabilities.

Go 18 3 Updated Nov 21, 2025

kvcache-ai / TrEnv-X

Go 70 2 Updated Sep 15, 2025

sgl-project / rbg

A workload for deploying LLM inference services on Kubernetes

Go 140 36 Updated Dec 20, 2025

whybeyoung / go-openai

Forked from sashabaranov/go-openai

OpenAI ChatGPT, GPT-3, GPT-4, DALL·E, Whisper API wrapper for Go

Go 2 Updated Sep 22, 2025

sgl-project / ome

OME is a Kubernetes operator for enterprise-grade management and serving of Large Language Models (LLMs)

Go 341 51 Updated Dec 20, 2025

fzyzcjy / sglang

Forked from sgl-project/sglang

SGLang is a fast serving framework for large language models and vision language models.

Python 7 1 Updated Dec 19, 2025

iflytek / ifly-workflow-mcp-server

This is a simple implementation of an MCP server using iFlytek. It enables calling iFlytek workflows through MCP tools.

Python 26 6 Updated Mar 28, 2025

Marovlo / easyPyverbs

easy version of pyverbs

Python 6 2 Updated Apr 16, 2023

deepseek-ai / smallpond

A lightweight data processing framework built on DuckDB and 3FS.

Python 4,872 430 Updated Mar 5, 2025

deepseek-ai / EPLB

Expert Parallelism Load Balancer

Python 1,320 195 Updated Mar 24, 2025

deepseek-ai / profile-data

Analyze computation-communication overlap in V3/R1.

1,128 144 Updated Mar 21, 2025

deepseek-ai / DualPipe

A bidirectional pipeline parallelism algorithm for computation-communication overlap in DeepSeek V3/R1 training.

Python 2,890 310 Updated Mar 10, 2025

zhaochenyang20 / Awesome-ML-SYS-Tutorial

My learning notes for ML SYS.

Python 4,710 298 Updated Dec 19, 2025

deepseek-ai / DeepGEMM

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 5,985 777 Updated Dec 8, 2025

deepseek-ai / DeepEP

DeepEP: an efficient expert-parallel communication library

Cuda 8,818 1,033 Updated Dec 5, 2025

deepseek-ai / FlashMLA

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 11,926 918 Updated Dec 15, 2025

deepseek-ai / open-infra-index

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

7,948 288 Updated May 15, 2025

sgl-project / sglang

SGLang is a fast serving framework for large language models and vision language models.

Python 21,821 3,813 Updated Dec 20, 2025

ml-tooling / opyrator

🪄 Turns your machine learning code into microservices with web API, interactive GUI, and more.

Python 3,136 164 Updated Dec 5, 2025

containers / nri-plugins

A collection of community maintained NRI plugins

Go 100 31 Updated Dec 19, 2025

ScilifelabDataCentre / serve

SciLifeLab Serve is a platform offering machine learning model serving, data science app hosting (Shiny, Gradio, Streamlit, Dash, etc.), and other tools to life science researchers affiliated with …

Python 11 1 Updated Dec 19, 2025

basetenlabs / truss-examples

Examples of models deployable with Truss

Python 211 53 Updated Dec 17, 2025

mim-solutions / bert_for_longer_texts

BERT classification model for processing texts longer than 512 tokens. Text is first divided into smaller chunks and after feeding them to BERT, intermediate results are pooled. The implementation …

Python 147 32 Updated Jun 19, 2024

vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 65,823 12,087 Updated Dec 20, 2025

LlamaFamily / Llama-Chinese

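Llama Chinese community: aggregates the latest Llama learning resources in real time and builds the best open-source ecosystem for Chinese Llama large models; fully open source and available for commercial use.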

Python 14,753 1,304 Updated Apr 6, 2025