Stars
Official inference framework for 1-bit LLMs
Official inference library for Mistral models
A datacenter-scale distributed inference serving framework
SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime (a minimal quantization sketch follows this list)
⚡ Build your chatbot within minutes on your favorite device; apply SOTA compression techniques to LLMs; run LLMs efficiently on Intel platforms ⚡ (a chatbot sketch also follows this list)
An Open Framework for Federated Learning.
NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process communication and coordination overheads by allowing programmers to initiate communication directly from GPU kernels.
An innovative library for efficient LLM inference via low-bit quantization
Intel® Extension for TensorFlow*
SPEAR: A Simulator for Photorealistic Embodied AI Research
SYCL implementation of Fused MLPs for Intel GPUs
Real-time human detection and tracking camera using YOLOv5 and Arduino
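The quantization sketch promised above: a minimal post-training static INT8 pass with Intel Neural Compressor. It assumes the library's 2.x API (`PostTrainingQuantConfig` and `quantization.fit`); the `resnet18` stand-in model and the synthetic calibration data are illustrative choices, not part of the library.

```python
# Hedged sketch: post-training static INT8 quantization with Intel Neural
# Compressor, assuming its 2.x API. resnet18 and the random calibration
# tensors below are placeholders for a real model and dataset.
import torch
from torch.utils.data import DataLoader, TensorDataset
from torchvision.models import resnet18

from neural_compressor import PostTrainingQuantConfig
from neural_compressor.quantization import fit

# FP32 model to compress; any torch.nn.Module works here.
fp32_model = resnet18(weights=None).eval()

# Tiny synthetic calibration set; real calibration needs representative data.
calib_set = TensorDataset(
    torch.randn(32, 3, 224, 224),
    torch.zeros(32, dtype=torch.long),
)
calib_loader = DataLoader(calib_set, batch_size=8)

# Calibrate and quantize; the returned object wraps the INT8 model.
q_model = fit(
    model=fp32_model,
    conf=PostTrainingQuantConfig(approach="static"),
    calib_dataloader=calib_loader,
)
q_model.save("./int8_resnet18")
```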
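And the chatbot sketch: the NeuralChat entry point from intel-extension-for-transformers, following the repo's documented quick-start. Calling `build_chatbot()` with no arguments is assumed to select a default chat model and device, so the first run downloads weights.

```python
# Hedged sketch of NeuralChat from intel-extension-for-transformers, per the
# repo's documented quick-start; the default pipeline downloads a chat model.
from intel_extension_for_transformers.neural_chat import build_chatbot

chatbot = build_chatbot()  # default model and device selection
response = chatbot.predict("Tell me about Intel Xeon Scalable Processors.")
print(response)
```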