Stars
A collection of Docker files for the RTEMS RTOS tools and BSP builds
High-performance, light-weight C++ LLM and VLM Inference Software for Physical AI
📚 A curated list of Awesome LLM/VLM Inference Papers with Code: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc. 🎉
Lumina Robotics Talent Call | Lumina Community Embodied AI Talent Board | A list of Embodied AI / Robotics jobs (PhD, RA, intern, etc.)
Large Language Model (LLM) Systems Paper List
Open-source Windows and Office activator featuring HWID, Ohook, TSforge, and Online KMS activation methods, along with advanced troubleshooting.
Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond
A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of vLLM).
[ArXiv 2025] A curated list of papers on on-device large language models, focusing on model compression and system optimization techniques from the survey "On-Device Large Language Models: A Survey…
Tile-Based Runtime for Ultra-Low-Latency LLM Inference
A Flexible Framework for Experiencing Heterogeneous LLM Inference/Fine-tuning Optimizations
How to optimize various algorithms in CUDA.
NVIDIA Linux open GPU kernel module source
A Datacenter Scale Distributed Inference Serving Framework
This is a Chinese translation of the CUDA programming guide
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
Artifact from "Hardware Compute Partitioning on NVIDIA GPUs". This is a fork of Bakita's repo; I am not one of the paper's authors.
A deep learning inference acceleration framework targeting the NVIDIA Jetson embedded platform, built on TensorRT
A tool for examining GPU scheduling behavior.
[CVPR 2023 Best Paper Award] Planning-oriented Autonomous Driving
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
Ongoing research training transformer models at scale