- Bellevue, WA, U.S.
- https://duoan.github.io/
- in/duoan
Stars
Our library for RL environments + evals
An asynchronous streaming data management module for efficient post-training.
RLinf: Reinforcement Learning Infrastructure for Embodied and Agentic AI
Model interpretability and understanding for PyTorch
The WeightWatcher tool for predicting the accuracy of Deep Neural Networks
A flexible and high-performance training framework designed for large-scale foundation model training on AMD GPUs
High-performance GPU kernels for LLM inference in OpenAI Triton. Fused RMSNorm, SwiGLU, INT8 GEMM with benchmarks and roofline analysis.
Code release for book "Efficient Training in PyTorch"
AI fundamentals: GPU architecture, CUDA programming, large-model basics, and AI Agent topics.
TradingAgents: Multi-Agents LLM Financial Trading Framework
AI Infrastructure Engineer Learning Track - Production ML infrastructure curriculum (2-4 years experience)
Academic Research Skills for Claude Code: research → write → review → revise → finalize
Now, Stronger AI Pushes Frontiers, Stronger Our Shared Future.
ARIS ⚔️ (Auto-Research-In-Sleep) — Lightweight Markdown-only skills for autonomous ML research: cross-model review loops, idea discovery, and experiment automation. No framework, no lock-in — works…
A curriculum for learning GPU performance engineering, from scratch to what the frontier AI labs do
AI Infrastructure Performance Engineer Learning Track - GPU optimization, inference optimization, and cost reduction
Universal LLM Deployment Engine with ML Compilation
Helpful kernel tutorials, examples and SKILLs for tile-based GPU programming
Tile primitives for speedy kernels
LMCache: Supercharge Your LLM with the Fastest KV Cache Layer
PyTorch compiler that accelerates training and inference. Get built-in optimizations for performance, memory, parallelism, and easily write your own.
A systematic and pedagogical way to derive the correctness structure of 2D Register Allocated GEMM before coding.
DeepSeek 4 Flash and PRO local inference engine for Metal, CUDA and ROCm
Reverse engineering NVIDIA SASS instruction dictionary, kernel audits and pattern recognition across GPU architectures.