  • Nanyang Technological University
  • Singapore


Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs

Python 1,270 168 Updated Jun 12, 2026

Agent-friendly GPU profile-query CLI

Rust 82 2 Updated Jun 12, 2026
JavaScript 3 Updated Jun 5, 2026

Pie: Programmable LLM Serving

Rust 175 22 Updated Jun 16, 2026
Python 252 27 Updated Jun 9, 2026

Learn CUDA with PyTorch

Cuda 333 50 Updated Jun 1, 2026

A bidirectional pipeline parallelism algorithm for computation-communication overlap in DeepSeek V3/R1 training.

Python 2,967 326 Updated Jan 14, 2026

Accepted to MLSys 2026

Python 87 7 Updated Apr 19, 2026
Jupyter Notebook 134 15 Updated Nov 11, 2024

⭐️ A cross-platform CLI All-in-One assistant tool for Claude Code, Codex & Gemini CLI.

Rust 3,570 205 Updated Jun 15, 2026

mKernel: fast multi-node, multi-GPU fused kernels

Cuda 233 22 Updated Jun 8, 2026

Dynamic Memory Management for Serving LLMs without PagedAttention

C 493 42 Updated Jun 10, 2026

An Efficient and Versatile Inference Engine for Distributed LLM Serving

Python 60 4 Updated Jun 16, 2026

Train the smallest LM you can that fits in 16MB. Best model wins!

Python 5,131 3,338 Updated May 4, 2026

LP_Bench

Python 14 4 Updated Feb 27, 2026

A ChatGPT(GPT-3.5) & GPT-4 Workload Trace to Optimize LLM Serving Systems

Python 270 15 Updated Mar 19, 2026

MetaAttention: A Unified and Performant Attention Framework Across Hardware Backends (PPoPP'26)

C++ 15 3 Updated Dec 31, 2025

Using a swizzled hierarchical layout for GEMM

Python 4 Updated Jun 9, 2026

Academic Research Skills for Claude Code: research → write → review → revise → finalize

Python 31,899 2,627 Updated Jun 15, 2026

Efficient Long-context Language Model Training by Core Attention Disaggregation

Python 105 7 Updated Apr 7, 2026

A Jekyll theme for academia

HTML 232 228 Updated Jul 8, 2024

Open-source framework for the research and development of foundation models.

Python 1,115 132 Updated Jun 16, 2026
Python 183 29 Updated Jun 15, 2026

Microsoft Azure Traces

Jupyter Notebook 1,147 182 Updated Jun 3, 2026

Nex Venus Communication Library

C++ 76 7 Updated Nov 17, 2025
Jupyter Notebook 32 8 Updated May 28, 2024

S-LoRA: Serving Thousands of Concurrent LoRA Adapters

Python 1,913 124 Updated Jan 21, 2024

Multi-GPU CUDA stress test

C++ 2,229 407 Updated May 31, 2026

Pure Rust + CUDA LLM inference engine

Rust 412 53 Updated Jun 16, 2026

Foundry materializes CUDA graphs along with their execution context to disk to support fast cold start of serving engines.

C++ 36 4 Updated Jun 15, 2026