Zhejiang University
Stars
The repo is finally unlocked. Enjoy the party! The fastest repo in history to surpass 100K stars ⭐. Join Discord: https://discord.gg/5TUQKqFWd Built in Rust using oh-my-codex.
A Distributed Attention Towards Linear Scalability for Ultra-Long Context, Heterogeneous Data Training
A benchmark of real-world DL kernel problems
Universal LLM Deployment Engine with ML Compilation
Collection of memory microbenchmarks to investigate the Network-on-Chip architectures of NVIDIA GPUs
A garden of small programming language implementations 🪴
Modified implementations for Pierce's Types and Programming Languages that add a REPL, convert them into dune projects, and provide preconfigured development containers based on devfiles
Userspace eBPF runtime for Observability, Network, GPU & General Extensions Framework
Perplexity open source garden for inference technology
DELTA-pytorch: DELTA: Dynamically Optimizing GPU Memory beyond Tensor Recomputation
We invite you to visit and follow our new repository at https://github.com/microsoft/TileFusion. TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstra…
TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.
Tile-Based Runtime for Ultra-Low-Latency LLM Inference
gLLM: Global Balanced Pipeline Parallelism System for Distributed LLM Serving with Token Throttling
Curated collection of papers in machine learning systems
Rust version of THU uCore OS. Linux compatible.
[NeurIPS 2025] ClusterFusion: Expanding Operator Fusion Scope for LLM Inference via Cluster-Level Collective Primitive
CUDA Templates and Python DSLs for High-Performance Linear Algebra
Training neural networks in TensorFlow 2.0 with 5x less memory