🦀
rusting
  • Zhejiang University

Highlights

  • Pro


dLLM: Simple Diffusion Language Modeling

Python · 1,504 stars · 154 forks · Updated Dec 22, 2025

Userspace eBPF runtime for Observability, Network, GPU & General Extensions Framework

C++ · 1,313 stars · 145 forks · Updated Dec 23, 2025

Perplexity open source garden for inference technology

Rust · 310 stars · 26 forks · Updated Dec 9, 2025

DELTA-pytorch: DELTA: Dynamically Optimizing GPU Memory beyond Tensor Recomputation

C++ · 12 stars · 3 forks · Updated Apr 16, 2024

C++ · 5 stars · 1 fork · Updated Sep 22, 2025

We invite you to visit and follow our new repository at https://github.com/microsoft/TileFusion. TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstra…

C++ · 191 stars · 11 forks · Updated Jan 28, 2025

TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.

Cuda · 104 stars · 6 forks · Updated Jun 28, 2025

Tile-Based Runtime for Ultra-Low-Latency LLM Inference

Python · 462 stars · 20 forks · Updated Dec 23, 2025

gLLM: Global Balanced Pipeline Parallelism System for Distributed LLM Serving with Token Throttling

Python · 51 stars · 4 forks · Updated Dec 18, 2025

Curated collection of papers in machine learning systems

483 stars · 34 forks · Updated Dec 13, 2025

Rust version of THU uCore OS. Linux compatible.

Rust · 3,645 stars · 378 forks · Updated Aug 24, 2023

[NeurIPS 2025] ClusterFusion: Expanding Operator Fusion Scope for LLM Inference via Cluster-Level Collective Primitive

Cuda · 55 stars · 2 forks · Updated Dec 11, 2025

Python · 248 stars · 24 forks · Updated Jul 27, 2025

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ · 9,006 stars · 1,593 forks · Updated Dec 23, 2025

The Higher-Order Intermediate Representation

C++ · 160 stars · 18 forks · Updated Dec 22, 2025

Random stuff.

Cuda · 5 stars · Updated Dec 2, 2025

Training neural networks in TensorFlow 2.0 with 5x less memory

Python · 137 stars · 17 forks · Updated Feb 21, 2022

A lightweight design for computation-communication overlap.

Cuda · 200 stars · 9 forks · Updated Oct 10, 2025

A high performance and generic framework for distributed DNN training

Python · 3,715 stars · 494 forks · Updated Oct 3, 2023

C++ · 335 stars · 33 forks · Updated Dec 20, 2025

A tool for automatically adding a gitmoji to your commit messages.

Rust · 2 stars · Updated Jul 10, 2025

kernels, of the mega variety

Python · 633 stars · 34 forks · Updated Sep 28, 2025

Mirage Persistent Kernel: Compiling LLMs into a MegaKernel

C++ · 2,004 stars · 161 forks · Updated Dec 20, 2025

Distributed MoE in a Single Kernel [NeurIPS '25]

Cuda · 157 stars · 18 forks · Updated Dec 23, 2025

CUDA Driver API Calls Interception

C++ · 9 stars · Updated Apr 15, 2024

Shared library for intercepting CUDA Runtime API calls. This was part of my Bachelor thesis: A Study on the Computational Exploitation of Remote Virtualized Graphics Cards (https://bit.ly/37tIG0D)

C++ · 14 stars · Updated Jun 6, 2024

[EuroSys'25] Mist: Efficient Distributed Training of Large Language Models via Memory-Parallelism Co-Optimization

Python · 21 stars · 5 forks · Updated Aug 6, 2025

Lightning-Fast RL for LLM Reasoning and Agents. Made Simple & Flexible.

Python · 3,261 stars · 256 forks · Updated Dec 23, 2025

Python · 7 stars · 1 fork · Updated Mar 27, 2025