🦀
rusting
  • Zhejiang University

Highlights

  • Pro

50 stars written in C++

LLM inference in C/C++

C++ 94,476 14,779 Updated Feb 6, 2026

mold: A Modern Linker 🦠

C++ 16,140 528 Updated Dec 12, 2025

Nix, the purely functional package manager

C++ 16,050 1,834 Updated Feb 5, 2026

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 12,449 981 Updated Jan 20, 2026

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ 9,231 1,661 Updated Feb 4, 2026

High-speed Large Language Model Serving for Local Deployment

C++ 8,635 482 Updated Jan 24, 2026

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.

C++ 5,617 658 Updated Feb 4, 2026

Adds AMD FSR 3 Frame Generation to games by replacing Nvidia DLSS Frame Generation (nvngx_dlssg).

C++ 4,921 191 Updated Mar 16, 2025

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 4,691 548 Updated Feb 6, 2026

A fast single-producer, single-consumer lock-free queue for C++

C++ 4,477 728 Updated Jun 25, 2025

Lightning fast C++/CUDA neural network framework

C++ 4,408 541 Updated Dec 14, 2025

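fastllm is a high-performance large-model inference library with no backend dependencies. It supports both tensor-parallel inference for dense models and mixed-mode inference for MoE models; any GPU with 10 GB or more of memory can run the full DeepSeek model. A dual-socket 9004/9005 server with a single GPU can serve the original full-size, full-precision DeepSeek model at 20 tps for a single request; the INT4-quantized model reaches 30 tps for a single request and 60+ tps with multiple concurrent requests.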

C++ 4,144 418 Updated Jan 29, 2026

C/C++/ObjC language server supporting cross references, hierarchies, completion and semantic highlighting

C++ 4,029 274 Updated Nov 30, 2025

A retargetable MLIR-based machine learning compiler and runtime toolkit.

C++ 3,590 833 Updated Feb 6, 2026

a distributed deep learning platform

C++ 3,585 1,268 Updated Jan 14, 2026

LightSeq: A High Performance Library for Sequence Processing and Generation

C++ 3,304 335 Updated May 16, 2023

Postmodern immutable and persistent data structures for C++ — value semantics at scale

C++ 2,801 199 Updated Jan 29, 2026

Mirage Persistent Kernel: Compiling LLMs into a MegaKernel

C++ 2,117 172 Updated Jan 29, 2026

Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training

C++ 1,859 248 Updated Feb 6, 2026

C++14 lock-free queue.

C++ 1,810 207 Updated Jan 31, 2026

GPGPU-Sim provides a detailed simulation model of contemporary NVIDIA GPUs running CUDA and/or OpenCL workloads. It includes support for features such as TensorCores and CUDA Dynamic Parallelism as…

C++ 1,571 618 Updated Feb 15, 2025

An efficient C++20 GPU numerical computing library with Python-like syntax

C++ 1,403 111 Updated Feb 2, 2026

Collective communications library with various primitives for multi-machine training.

C++ 1,396 346 Updated Feb 6, 2026

Userspace eBPF runtime for Observability, Network, GPU & General Extensions Framework

C++ 1,376 157 Updated Jan 26, 2026

A fast communication-overlapping library for tensor/expert parallelism on GPUs.

C++ 1,242 89 Updated Aug 28, 2025

High-Performance Rendering Framework on Stream Architectures

C++ 985 96 Updated Feb 6, 2026

VUDA is a header-only library based on Vulkan that provides a CUDA Runtime API interface for writing GPU-accelerated applications.

C++ 902 40 Updated Jan 21, 2024

collection of benchmarks to measure basic GPU capabilities

C++ 492 78 Updated Oct 24, 2025

MSCCL++: A GPU-driven communication stack for scalable AI applications

C++ 461 82 Updated Feb 6, 2026

C++ library for reading and writing of numpy's .npy files

C++ 423 73 Updated Oct 3, 2024