Stars
A collection of Docker files for the RTEMS RTOS tools and BSP builds
High-performance, light-weight C++ LLM and VLM Inference Software for Physical AI
📚 A curated list of Awesome LLM/VLM Inference Papers with Code: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc. 🎉
Lumina Robotics Talent Call | Lumina Community Embodied AI Talent Board | A list of Embodied AI / Robotics jobs (PhD, RA, intern, etc.)
Large Language Model (LLM) Systems Paper List
Open-source Windows and Office activator featuring HWID, Ohook, TSforge, and Online KMS activation methods, along with advanced troubleshooting.
Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond
A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of vLLM).
[ArXiv 2025] A curated list of papers on on-device large language models, focusing on model compression and system optimization techniques from the survey "On-Device Large Language Models: A Survey…
Tile-Based Runtime for Ultra-Low-Latency LLM Inference
A Flexible Framework for Experiencing Heterogeneous LLM Inference/Fine-tuning Optimizations
How to optimize various algorithms in CUDA.
NVIDIA Linux open GPU kernel module source
A Datacenter Scale Distributed Inference Serving Framework
This is a Chinese translation of the CUDA programming guide
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
Artifact from "Hardware Compute Partitioning on NVIDIA GPUs". This is a fork of Bakita's repo; I am not one of the paper's authors.
A deep learning inference acceleration framework targeting the NVIDIA Jetson embedded platform, built on TensorRT
A tool for examining GPU scheduling behavior.
[CVPR 2023 Best Paper Award] Planning-oriented Autonomous Driving
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
Ongoing research training transformer models at scale