Stars
NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process communication and coordination overheads by allowing programmer…
A Datacenter Scale Distributed Inference Serving Framework
Official inference framework for 1-bit LLMs
SYCL implementation of Fused MLPs for Intel GPUs
An innovative library for efficient LLM inference via low-bit quantization
Real-time human detection and tracking camera using YOLOv5 and Arduino
Official inference library for Mistral models
SPEAR: A Simulator for Photorealistic Embodied AI Research
SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
⚡ Build your chatbot within minutes on your favorite device; apply SOTA compression techniques for LLMs; run LLMs efficiently on Intel platforms ⚡
Intel® Extension for TensorFlow*
An Open Framework for Federated Learning.