inkcherry


MoonPalace (月宫) is an API debugging tool provided by Moonshot AI (月之暗面).

Go 229 7 Updated Dec 30, 2024

NVIDIA Inference Xfer Library (NIXL)

C++ 887 244 Updated Feb 18, 2026

Unified Collective Communication Library

C 293 128 Updated Feb 12, 2026

[DEPRECATED] Moved to ROCm/rocm-systems repo

C++ 144 43 Updated Feb 16, 2026
C++ 2 Updated Oct 30, 2024

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 4,780 561 Updated Feb 16, 2026

Lightning-Fast RL for LLM Reasoning and Agents. Made Simple & Flexible.

Python 3,550 295 Updated Feb 15, 2026

Ongoing research training transformer models at scale

Python 36 34 Updated Feb 17, 2026

Modular RDMA Interface

C++ 78 21 Updated Feb 18, 2026

verl: Volcano Engine Reinforcement Learning for LLMs

Python 19,254 3,257 Updated Feb 18, 2026

Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek, Qwen, Llama, Gemma, TTS 2x faster with 70% less VRAM.

Python 52,322 4,327 Updated Feb 17, 2026

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda 9,679 961 Updated Feb 13, 2026

Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).

Python 2,184 254 Updated Jan 27, 2026

Nano vLLM

Python 11,726 1,585 Updated Nov 3, 2025

An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models

Python 2,836 219 Updated Feb 18, 2026

Python pdb for multiple processes

Python 80 9 Updated May 24, 2025

A family of lightweight multimodal models.

Python 1,051 77 Updated Nov 18, 2024

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 85 133 Updated Feb 15, 2026

Run compilers interactively from your web browser and interact with the assembly

TypeScript 18,558 1,990 Updated Feb 17, 2026

Unified KV Cache Compression Methods for Auto-Regressive Models

Python 1,307 161 Updated Jan 4, 2025

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

Python 3,897 300 Updated Feb 17, 2026

A fast communication-overlapping library for tensor/expert parallelism on GPUs.

C++ 1,250 93 Updated Aug 28, 2025

Perplexity GPU Kernels

C++ 562 75 Updated Nov 7, 2025

A Flexible Framework for Experiencing Heterogeneous LLM Inference/Fine-tune Optimizations

Python 16,521 1,213 Updated Feb 17, 2026

High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.

Python 3,184 485 Updated Feb 16, 2026

Open-source Linux performance suite for engineers—profiling and tuning workloads and system configurations.

Go 432 53 Updated Feb 18, 2026

A high-performance distributed file system designed to address the challenges of AI training and inference workloads.

C++ 9,709 1,002 Updated Feb 4, 2026

calflops is designed to calculate FLOPs, MACs, and parameters in various neural networks, such as Linear, CNN, RNN, GCN, and Transformer (BERT, LLaMA, and other large language models).

Python 925 40 Updated Jun 27, 2024

A bidirectional pipeline parallelism algorithm for computation-communication overlap in DeepSeek V3/R1 training.

Python 2,919 312 Updated Jan 14, 2026