Stars
ncnn is a high-performance neural network inference framework optimized for mobile platforms
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…
FlashMLA: Efficient Multi-head Latent Attention Kernels
A high-performance distributed file system designed to address the challenges of AI training and inference workloads.
A fast, scalable, high-performance gradient boosting on decision trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports comp…
High-speed Large Language Model Serving for Local Deployment
MariaDB server is a community-developed fork of MySQL server. Started by core members of the original MySQL team, MariaDB actively works with outside developers to deliver the most featureful, stab…
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
HIP: C++ Heterogeneous-Compute Interface for Portability
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
General purpose GPU compute framework built on Vulkan to support 1000s of cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). Blazing fast, mobile-enabled, asynchronous and optimized for…
FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
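For context on the acronym in the line above: GEMM computes C ← αAB + βC. A minimal pure-Python sketch of that operation (illustrative only; not FBGEMM's API, which is a C++ library of quantized, cache-optimized kernels):

```python
def gemm(alpha, A, B, beta, C):
    """General Matrix-Matrix Multiplication: C = alpha * A @ B + beta * C.

    A is m x k, B is k x n, C is m x n (lists of lists); C is updated in place.
    """
    m, k, n = len(A), len(B), len(B[0])
    for i in range(m):
        for j in range(n):
            # Dot product of row i of A with column j of B
            acc = 0.0
            for p in range(k):
                acc += A[i][p] * B[p][j]
            C[i][j] = alpha * acc + beta * C[i][j]
    return C

# With alpha=1 and beta=0 this reduces to a plain matrix product:
# gemm(1.0, [[1, 2], [3, 4]], [[5, 6], [7, 8]], 0.0, [[0, 0], [0, 0]])
# -> [[19.0, 22.0], [43.0, 50.0]]
```

Libraries like FBGEMM implement this same contraction with blocking, vectorization, and reduced-precision arithmetic rather than the naive triple loop shown here.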
Universal model exchange and serialization format for decision tree forests
Trainable fast and memory-efficient sparse attention
Kolosal AI is an open-source, lightweight alternative to LM Studio for running LLMs 100% offline on your device.
Efficient implementation of DeepSeek Ops (Blockwise FP8 GEMM, MoE, and MLA) for AMD Instinct MI300X
An Arduino-based six legged robot with extensive documentation.