Stars
🤘 TT-NN operator library and TT-Metalium low-level kernel programming model.
Universal LLM deployment engine with ML compilation.
Open deep learning compiler stack for CPUs, GPUs, and specialized accelerators.
Efficient Triton kernels for LLM training.
Open-source search and retrieval database for AI applications.
TensorZero is an open-source stack for industrial-grade LLM applications. It unifies an LLM gateway, observability, optimization, evaluation, and experimentation.
Efficient platform for inference and serving of local LLMs, including an OpenAI-compatible API server (see the sketch after this list).
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
A minimal tensor processing unit (TPU), inspired by Google's TPU v2 and v1.
A datacenter-scale distributed inference serving framework.
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
The easiest way to serve AI apps and models: build model inference APIs, job queues, LLM apps, multi-model pipelines, and more!
A high-throughput and memory-efficient inference and serving engine for LLMs.
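Several of the serving tools above (the local-LLM server, LMDeploy, and vLLM) expose an OpenAI-compatible API, so the same client code works against any of them. Below is a minimal sketch using the official `openai` Python client; the base URL matches vLLM's default local server address, but the model name is a placeholder and both values should be adjusted for your own deployment:

```python
from openai import OpenAI

# Point the client at a locally running OpenAI-compatible server.
# http://localhost:8000/v1 is vLLM's default address; other tools may differ.
# Local servers typically ignore the API key, but the client requires one.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

response = client.chat.completions.create(
    model="my-local-model",  # placeholder; use the model id your server reports
    messages=[{"role": "user", "content": "Summarize what an LLM gateway does."}],
)
print(response.choices[0].message.content)
```

Because these servers share the OpenAI wire format, swapping between backends is usually just a matter of changing `base_url` and `model`, which also lets gateways like TensorZero route across them.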