Junrong Lin ocss884

🎯 Focusing

@QwenLM by day, @sgl-project by night

105 followers · 51 following

283 production
聖蹟桜ヶ丘
06:00 (UTC +09:00)
ocss.lin@gmail.com
https://junronglin.com

Stars

ademeure / QuickRunCUDA

C++ 17 4 Updated Apr 9, 2026

NVIDIA / cuda-samples

Samples for CUDA Developers which demonstrates features in CUDA Toolkit

C 9,066 2,311 Updated Mar 30, 2026

xjdr-alt / entropix

Entropy Based Sampling and Parallel CoT Decoding

Python 3,431 321 Updated Nov 13, 2024

jax-ml / jax

Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more

Python 35,345 3,512 Updated Apr 9, 2026

NVIDIA / TileGym

Helpful kernel tutorials and examples for tile-based GPU programming

Python 695 59 Updated Apr 9, 2026

stas00 / ml-engineering

Machine Learning Engineering Open Book

Python 17,651 1,119 Updated Mar 16, 2026

flashinfer-ai / flashinfer-bench-starter-kit

FlashInfer Bench @ MLSys 2026: Building AI agents to write high performance GPU kernels

Python 157 122 Updated Apr 3, 2026

NVIDIA / cutile-python

cuTile is a programming model for writing parallel kernels for NVIDIA GPUs

Python 2,013 130 Updated Apr 4, 2026

tile-ai / tilelang-puzzles

Learning TileLang with 10 puzzles!

Python 168 20 Updated Mar 31, 2026

Dao-AILab / quack

A Quirky Assortment of CuTe Kernels

Python 918 107 Updated Apr 8, 2026

Dao-AILab / sonic-moe

Accelerating MoE with IO and Tile-aware Optimizations

Python 625 68 Updated Apr 1, 2026

tile-ai / TileRT

Tile-Based Runtime for Ultra-Low-Latency LLM Inference

Python 696 41 Updated Mar 8, 2026

deepseek-ai / DeepGEMM

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 6,316 856 Updated Mar 22, 2026

NVIDIA / TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance…

Python 3,264 688 Updated Apr 9, 2026

flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving

Python 5,361 879 Updated Apr 9, 2026

RLsys-Foundation / TritonForge

🔥 LLM-powered GPU kernel synthesis: Train models to convert PyTorch ops into optimized Triton kernels via SFT+RL. Multi-turn compilation feedback, cross-platform NVIDIA/AMD, Kernelbook + KernelBench

Python 134 6 Updated Nov 10, 2025

tile-ai / tilelang

Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels

Python 5,475 499 Updated Apr 8, 2026

thinking-machines-lab / batch_invariant_ops

Python 986 74 Updated Nov 4, 2025

mlc-ai / xgrammar

Fast, Flexible and Portable Structured Generation

C++ 1,620 136 Updated Apr 9, 2026

NVIDIA / tilus

Tilus is a tile-level kernel programming language with explicit control over shared memory and registers.

Python 470 21 Updated Apr 9, 2026

microsoft / microxcaling

PyTorch emulation library for Microscaling (MX)-compatible data formats

Python 351 48 Updated Jun 18, 2025

wilicc / gpu-burn

Multi-GPU CUDA stress test

C++ 2,146 401 Updated Nov 4, 2025

openai / gpt-oss

gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI

Python 20,001 2,061 Updated Mar 27, 2026

OpenBB-finance / OpenBB

Financial data platform for analysts, quants and AI agents.

Python 65,628 6,515 Updated Apr 9, 2026

google-gemini / gemini-cli

An open-source AI agent that brings the power of Gemini directly into your terminal.

TypeScript 100,741 12,998 Updated Apr 9, 2026

sgl-project / SpecForge

Train speculative decoding models effortlessly and port them smoothly to SGLang serving.

Python 768 200 Updated Apr 2, 2026

Dao-AILab / flash-attention

Fast and memory-efficient exact attention

Python 23,247 2,602 Updated Apr 8, 2026

fzyzcjy / rl_visualizer

Visualize and post-hoc analyze RL training for debugging and understanding

TypeScript 6 Updated Jul 23, 2025

gaogaotiantian / dowhen

An intuitive and low-overhead instrumentation tool for Python

Python 1,203 41 Updated Jul 8, 2025

sgl-project / genai-bench

Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serving systems.

Python 287 51 Updated Apr 2, 2026