Democratizing Reinforcement Learning for LLMs

Jupyter Notebook 4,667 440 Updated Nov 4, 2025

A blazingly fast JSON serializing & deserializing library

Go 8,805 418 Updated Oct 28, 2025

A Lite Version of Kubernetes in Rust

Rust 168 36 Updated Nov 5, 2025

The best ChatGPT that $100 can buy.

Python 35,698 4,096 Updated Nov 5, 2025

bpftop provides a dynamic real-time view of running eBPF programs. It displays the average runtime, events per second, and estimated total CPU % for each program.

C 2,564 123 Updated Oct 28, 2025

OME is a Kubernetes operator for enterprise-grade management and serving of Large Language Models (LLMs)

Go 304 43 Updated Nov 3, 2025

Large Language Model (LLM) Systems Paper List

1,580 86 Updated Nov 4, 2025

Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond

Python 606 51 Updated Nov 4, 2025

The official implementation of OSDI'25 paper BlitzScale

Rust 37 2 Updated Sep 20, 2025

My learning notes/codes for ML SYS.

Python 4,062 247 Updated Oct 6, 2025

Examples demonstrating available options to program multiple GPUs in a single node or a cluster

Cuda 823 143 Updated Sep 26, 2025

FlexAttention based, minimal vllm-style inference engine for fast Gemma 2 inference.

Python 302 19 Updated Nov 2, 2025

An extension of the nanoGPT repository for training small MOE models.

Python 207 26 Updated Mar 9, 2025

Convert PDF to markdown + JSON quickly with high accuracy

Python 29,626 1,991 Updated Nov 3, 2025

Flash Attention tutorial written in Python, Triton, CUDA, and CUTLASS

Cuda 442 47 Updated May 14, 2025

5ire is a cross-platform desktop AI assistant and MCP client. It is compatible with major service providers and supports local knowledge bases and tools via Model Context Protocol servers.

TypeScript 4,743 358 Updated Nov 4, 2025

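12306 MCP Server is a high-performance train ticket query backend built on the Model Context Protocol (MCP). It provides real-time data from the official 12306 service through a standardized interface, including core features such as remaining-ticket queries, station information, train stopovers, and transfer itineraries.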

Python 253 30 Updated Sep 26, 2025

Kimi K2 is the large language model series developed by Moonshot AI team

8,454 557 Updated Oct 31, 2025

Summary of the Specs of Commonly Used GPUs for Training and Inference of LLM

63 12 Updated Aug 12, 2025

Everything about the SmolLM and SmolVLM family of models

Python 3,369 231 Updated Sep 16, 2025

Calculate token/s & GPU memory requirement for any LLM. Supports llama.cpp/ggml/bnb/QLoRA quantization

JavaScript 1,380 85 Updated Dec 3, 2024

Disaggregated serving system for Large Language Models (LLMs).

Jupyter Notebook 718 80 Updated Apr 6, 2025

DeerFlow is a community-driven Deep Research framework, combining language models with tools like web search, crawling, and Python execution, while contributing back to the open-source community.

Python 17,885 2,226 Updated Oct 31, 2025

Production-grade client-side tracing, profiling, and analysis for complex software systems.

C++ 4,894 610 Updated Nov 5, 2025

Nano vLLM

Python 8,294 1,015 Updated Nov 3, 2025

Analyze the inference of Large Language Models (LLMs). Analyze aspects like computation, storage, transmission, and hardware roofline model in a user-friendly interface.

Python 571 70 Updated Sep 11, 2024

Windows Precision Touchpad Driver Implementation for Apple MacBook / Magic Trackpad

C 9,934 619 Updated Jan 7, 2024

😼 Use a clash/mihomo-based proxy environment elegantly

Shell 5,510 700 Updated Nov 5, 2025

All PDF textbooks for primary school, middle school, high school, and university.

Roff 54,425 12,247 Updated Oct 18, 2025

SCUDA is a GPU over IP bridge allowing GPUs on remote machines to be attached to CPU-only machines.

C++ 1,773 73 Updated Jun 16, 2025