Stars
A lightweight, single-header C++11 Jinja2 template engine for LLM chat templates.
A collection of practical, end-to-end AI application examples accelerated by MemryX hardware and software solutions. This repository offers examples for real-time video inference, object detection…
A unified library of SOTA model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment frameworks …
The repository provides code for running inference with the Meta Segment Anything Model 3 (SAM 3).
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
Multi-stream video inference with Ultralytics YOLO - Display multiple video streams in a grid layout with real-time object detection.
🤗 Optimum ONNX: Export your model to ONNX and run inference with ONNX Runtime
A high-performance tool for video upscaling, interpolation, depth estimation, and more. Available as a CLI and Adobe Extension.
🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy to use hardware optimization tools
YOLOv10: Real-Time End-to-End Object Detection [NeurIPS 2024]
State-of-the-art Machine Learning for the web. Run 🤗 Transformers directly in your browser, with no need for a server!
FlagGems is an operator library for large language models implemented in the Triton Language.
MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba. Full multimodal LLM Android App: [MNN-LLM-Android](./apps/Android/MnnLlmChat/READ…