Stars
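Collects Android TV / TV-box apps from across the web, covering movies and shows, live TV, karaoke, tools, games, and more. Curates high-quality APK resources with convenient downloads and automatic updates, and provides security verification, category indexing, and compatibility labels to help users build a home entertainment center! ✅ Interface configuration sources for TVBox, 影视仓, and other media-shell apps.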
RLinf: Reinforcement Learning Infrastructure for Embodied and Agentic AI
A PyTorch quantization backend for Optimum
SGLang is a high-performance serving framework for large language models and multimodal models.
Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.
A Python framework for GPU-accelerated simulation, robotics, and machine learning.
A Datacenter Scale Distributed Inference Serving Framework
🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention"
Godot Engine – Multi-platform 2D and 3D game engine
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
DeepEP: an efficient expert-parallel communication library
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
Windows Calculator: A simple yet powerful calculator that ships with Windows
Learning Large Language Models (LLMs)
📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉
Bring background images to your VS Code: a background-image extension for VS Code.
A Chinese translation of the CUDA Programming Guide
Deformable DETR: Deformable Transformers for End-to-End Object Detection.
FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…
Fast and memory-efficient exact attention
A flexible and efficient deep neural network (DNN) compiler that generates high-performance executable from a DNN model description.