
A collection of Docker files for the RTEMS RTOS tools and BSP builds

Dockerfile · 13 stars · 7 forks · Updated Dec 14, 2021

High-performance, light-weight C++ LLM and VLM Inference Software for Physical AI

C++ · 321 stars · 45 forks · Updated Mar 19, 2026

📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉

Python · 5,120 stars · 359 forks · Updated Mar 26, 2026

Lumina Robotics Talent Call | Lumina Community Embodied AI Talent Board | A list of Embodied AI / Robotics jobs (PhD, RA, intern, etc.)

1,373 stars · 25 forks · Updated Feb 25, 2026

Tensor library for machine learning

C++ · 14,346 stars · 1,534 forks · Updated Apr 2, 2026

LLM inference in C/C++

C++ · 101,099 stars · 16,284 forks · Updated Apr 3, 2026

Large Language Model (LLM) Systems Paper List

1,898 stars · 97 forks · Updated Mar 24, 2026

Open-source Windows and Office activator featuring HWID, Ohook, TSforge, and Online KMS activation methods, along with advanced troubleshooting.

Batchfile · 170,728 stars · 16,425 forks · Updated Mar 9, 2026

Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond

Python · 840 stars · 95 forks · Updated Apr 2, 2026

A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance in only 2k lines of code (2% of vLLM).

Python · 319 stars · 36 forks · Updated Jun 10, 2025

[ArXiv 2025] A curated list of papers on on-device large language models, focusing on model compression and system optimization techniques from the survey "On-Device Large Language Models: A Survey…

30 stars · 3 forks · Updated Jan 27, 2026

Tile-Based Runtime for Ultra-Low-Latency LLM Inference

Python · 690 stars · 40 forks · Updated Mar 8, 2026

A Flexible Framework for Experiencing Heterogeneous LLM Inference/Fine-tune Optimizations

Python · 16,892 stars · 1,251 forks · Updated Apr 3, 2026

Python · 2 stars · 1 fork · Updated Mar 15, 2026

A collection of compiler learning resources.

Python · 2,702 stars · 368 forks · Updated Mar 19, 2025

How to optimize various algorithms in CUDA.

Cuda · 2,906 stars · 267 forks · Updated Apr 1, 2026

NVIDIA Linux open GPU kernel module source

C · 16,852 stars · 1,651 forks · Updated Apr 3, 2026

Nano vLLM

Python · 12,671 stars · 1,863 forks · Updated Nov 3, 2025

A Datacenter Scale Distributed Inference Serving Framework

Rust · 6,474 stars · 989 forks · Updated Apr 3, 2026

This is a Chinese translation of the CUDA programming guide

1,928 stars · 281 forks · Updated Nov 13, 2024

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

Python · 41,940 stars · 7,401 forks · Updated Apr 3, 2026

Materials for learning SGLang

791 stars · 60 forks · Updated Jan 5, 2026

Artifact from "Hardware Compute Partitioning on NVIDIA GPUs". This is a fork of Bakita's repo; I am not one of the authors of the paper.

C · 59 stars · 5 forks · Updated Nov 24, 2025

A deep learning inference acceleration framework targeting the NVIDIA Jetson embedded platform, built with TensorRT

C++ · 30 stars · 6 forks · Updated Oct 10, 2025

A tool for examining GPU scheduling behavior.

Cuda · 96 stars · 22 forks · Updated Aug 17, 2024

[CVPR 2023 Best Paper Award] Planning-oriented Autonomous Driving

Python · 4,555 stars · 527 forks · Updated Oct 29, 2025

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.

C++ · 12,860 stars · 2,338 forks · Updated Mar 25, 2026

Ongoing research training transformer models at scale

Python · 15,905 stars · 3,784 forks · Updated Apr 3, 2026