- 聖蹟桜ヶ丘 (UTC +09:00) - ocss.lin@gmail.com - https://junronglin.com
Stars
Samples for CUDA developers demonstrating features in the CUDA Toolkit
Entropy Based Sampling and Parallel CoT Decoding
Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
Helpful kernel tutorials and examples for tile-based GPU programming
Machine Learning Engineering Open Book
FlashInfer Bench @ MLSys 2026: Building AI agents to write high performance GPU kernels
cuTile is a programming model for writing parallel kernels for NVIDIA GPUs
Accelerating MoE with IO and Tile-aware Optimizations
Tile-Based Runtime for Ultra-Low-Latency LLM Inference
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
A library for accelerating Transformer models on NVIDIA GPUs, including support for 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance…
FlashInfer: Kernel Library for LLM Serving
🔥 LLM-powered GPU kernel synthesis: Train models to convert PyTorch ops into optimized Triton kernels via SFT+RL. Multi-turn compilation feedback, cross-platform NVIDIA/AMD, Kernelbook + KernelBench
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
Fast, Flexible and Portable Structured Generation
Tilus is a tile-level kernel programming language with explicit control over shared memory and registers.
PyTorch emulation library for Microscaling (MX)-compatible data formats
gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI
Financial data platform for analysts, quants and AI agents.
An open-source AI agent that brings the power of Gemini directly into your terminal.
Train speculative decoding models effortlessly and port them smoothly to SGLang serving.
Fast and memory-efficient exact attention
Visualize and analyze RL training post hoc for debugging and understanding
An intuitive and low-overhead instrumentation tool for Python
Genai-bench is a benchmark tool for comprehensive token-level performance evaluation of large language model (LLM) serving systems.