- Shanghai, China
Starred repositories
Intelligent Router for Mixture-of-Models
NexAU (AU for Agent Universe), a general-purpose agent framework for building intelligent agents with tool capabilities.
How to optimize some algorithms in CUDA.
HuggingFace conversion and training library for Megatron-based models
CUDA Tile IR is an MLIR-based intermediate representation and compiler infrastructure for CUDA kernel optimization, focusing on tile-based computation patterns and optimizations targeting NVIDIA te…
Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
A framework for efficient model inference with omni-modality models
Accelerating MoE with IO and Tile-aware Optimizations
An early research stage expert-parallel load balancer for MoE models based on linear programming.
A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.
A high-performance and light-weight router for vLLM large scale deployment
A TUI-based utility for real-time monitoring of InfiniBand traffic and performance metrics on the local node
USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long Context Transformers Model Training and Inference
SWE-bench: Can Language Models Resolve Real-world GitHub Issues?
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
Tile-Based Runtime for Ultra-Low-Latency LLM Inference
cuTile is a programming model for writing parallel kernels for NVIDIA GPUs
These are personal utilities that are useful for personal use
Nsight Python is a Python kernel profiling interface based on NVIDIA Nsight Tools
Autonomous GPU Kernel Generation via Deep Agents
A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM
NexDR (Nex Deep Research), a leading deep research agent that autonomously investigates complex topics and generates rich, structured reports.
NexRL is an ultra-loosely-coupled LLM post-training framework.