Junrong Lin ocss884

🎯

Focusing

@QwenLM by day, @sgl-project by night

106 followers · 51 following

283 production
聖蹟桜ヶ丘
04:41 (UTC +09:00)
ocss.lin@gmail.com
https://junronglin.com

Stars

PolyArch / humanize

From Automated Idea Factory to Realization

Shell · 578 stars · 41 forks · Updated Apr 30, 2026

SandAI-org / MagiAttention

A Distributed Attention Towards Linear Scalability for Ultra-Long Context, Heterogeneous Data Training

Python · 796 stars · 52 forks · Updated Apr 21, 2026

ademeure / QuickRunCUDA

C++ · 20 stars · 4 forks · Updated Apr 24, 2026

NVIDIA / cuda-samples

Samples for CUDA developers that demonstrate features in the CUDA Toolkit

C · 9,128 stars · 2,321 forks · Updated Mar 30, 2026

xjdr-alt / entropix

Entropy Based Sampling and Parallel CoT Decoding

Python · 3,431 stars · 321 forks · Updated Nov 13, 2024

jax-ml / jax

Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more

Python · 35,514 stars · 3,549 forks · Updated Apr 30, 2026

NVIDIA / TileGym

Helpful kernel tutorials, examples and SKILLs for tile-based GPU programming

Python · 710 stars · 68 forks · Updated Apr 30, 2026

stas00 / ml-engineering

Machine Learning Engineering Open Book

Python · 17,835 stars · 1,133 forks · Updated Mar 16, 2026

flashinfer-ai / flashinfer-bench-starter-kit

FlashInfer Bench @ MLSys 2026: Building AI agents to write high performance GPU kernels

Python · 161 stars · 130 forks · Updated Apr 26, 2026

NVIDIA / cutile-python

cuTile is a programming model for writing parallel kernels for NVIDIA GPUs

Python · 2,034 stars · 134 forks · Updated Apr 28, 2026

tile-ai / tilelang-puzzles

Learning TileLang with 10 puzzles!

Python · 240 stars · 29 forks · Updated Apr 28, 2026

Dao-AILab / quack

A Quirky Assortment of CuTe Kernels

Python · 952 stars · 123 forks · Updated Apr 30, 2026

Dao-AILab / sonic-moe

Accelerating MoE with IO and Tile-aware Optimizations

Python · 664 stars · 80 forks · Updated Apr 30, 2026

tile-ai / TileRT

Tile-Based Runtime for Ultra-Low-Latency LLM Inference

Python · 715 stars · 43 forks · Updated Mar 8, 2026

deepseek-ai / DeepGEMM

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda · 7,149 stars · 955 forks · Updated Apr 24, 2026

NVIDIA / TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance…

Python · 3,312 stars · 716 forks · Updated Apr 29, 2026

flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving

Python · 5,537 stars · 946 forks · Updated Apr 30, 2026

RLsys-Foundation / TritonForge

🔥 LLM-powered GPU kernel synthesis: Train models to convert PyTorch ops into optimized Triton kernels via SFT+RL. Multi-turn compilation feedback, cross-platform NVIDIA/AMD, Kernelbook + KernelBench

Python · 136 stars · 5 forks · Updated Nov 10, 2025

tile-ai / tilelang

Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels

Python · 5,954 stars · 538 forks · Updated Apr 30, 2026

thinking-machines-lab / batch_invariant_ops

Python · 1,000 stars · 77 forks · Updated Nov 4, 2025

mlc-ai / xgrammar

Fast, Flexible and Portable Structured Generation

C++ · 1,652 stars · 143 forks · Updated Apr 29, 2026

NVIDIA / tilus

Tilus is a tile-level kernel programming language with explicit control over shared memory and registers.

Python · 482 stars · 25 forks · Updated Apr 27, 2026

microsoft / microxcaling

PyTorch emulation library for Microscaling (MX)-compatible data formats

Python · 353 stars · 49 forks · Updated Jun 18, 2025

wilicc / gpu-burn

Multi-GPU CUDA stress test

C++ · 2,166 stars · 402 forks · Updated Nov 4, 2025

openai / gpt-oss

gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI

Python · 20,059 stars · 2,072 forks · Updated Mar 27, 2026

OpenBB-finance / OpenBB

Financial data platform for analysts, quants and AI agents.

Python · 66,793 stars · 6,677 forks · Updated Apr 30, 2026

google-gemini / gemini-cli

An open-source AI agent that brings the power of Gemini directly into your terminal.

TypeScript · 102,812 stars · 13,409 forks · Updated Apr 30, 2026

sgl-project / SpecForge

Train speculative decoding models effortlessly and port them smoothly to SGLang serving.

Python · 814 stars · 218 forks · Updated Apr 2, 2026

Dao-AILab / flash-attention

Fast and memory-efficient exact attention

Python · 23,602 stars · 2,667 forks · Updated Apr 30, 2026

fzyzcjy / rl_visualizer

Visualize and post-hoc analyze RL training for debugging and understanding

TypeScript · 6 stars · Updated Jul 23, 2025