Production
😎 A curated list of awesome MLOps tools
ZenML 🙏: One AI Platform from Pipelines to Agents. https://zenml.io.
🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy-to-use hardware optimization tools
PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (the PaddlePaddle core framework: high-performance single-machine and distributed training and cross-platform deployment for deep learning and machine learning)
Build and share delightful machine learning apps, all in Python.
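A minimal sketch of the Gradio API; the `greet` function and its signature are illustrative assumptions, not part of the project:

```python
import gradio as gr

# Hypothetical function to wrap; any Python callable works.
def greet(name: str) -> str:
    return f"Hello, {name}!"

# Interface builds a web UI around the function and serves it locally.
demo = gr.Interface(fn=greet, inputs="text", outputs="text")
demo.launch()
```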
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
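A minimal sketch of Ray's task API, assuming a local Ray runtime; the `square` function is illustrative:

```python
import ray

ray.init()  # start a local Ray runtime

# @ray.remote turns a plain function into a distributed task.
@ray.remote
def square(x: int) -> int:
    return x * x

# Tasks run in parallel; ray.get blocks and collects the results.
futures = [square.remote(i) for i in range(4)]
print(ray.get(futures))  # [0, 1, 4, 9]
```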
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
Rust wrapper for Microsoft's ONNX Runtime with CUDA support (version 1.7)
Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀
Transformer-related optimizations, including BERT and GPT
State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.
ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
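A minimal inference sketch with the ONNX Runtime Python API; the model path and input shape are assumptions:

```python
import numpy as np
import onnxruntime as ort

# "model.onnx" and the (1, 3, 224, 224) input shape are hypothetical.
session = ort.InferenceSession("model.onnx")
input_name = session.get_inputs()[0].name
dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)

# Run the graph; None means "return all model outputs".
outputs = session.run(None, {input_name: dummy})
print(outputs[0].shape)
```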
The fastest ⚡️ way to build data pipelines. Develop iteratively, deploy anywhere. ☁️
Edge inference in the browser with a Transformer NLP model
Machine Learning Pipelines for Kubeflow
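A minimal sketch using the KFP v2 SDK; the component, pipeline name, and output file are illustrative assumptions:

```python
from kfp import compiler, dsl

# A hypothetical single-step component.
@dsl.component
def say_hello(name: str) -> str:
    return f"Hello, {name}!"

# Pipelines compose components; this one has a single step.
@dsl.pipeline(name="hello-pipeline")
def hello_pipeline(name: str = "world"):
    say_hello(name=name)

# Compile to a YAML spec the Kubeflow Pipelines backend can run.
compiler.Compiler().compile(hello_pipeline, "hello_pipeline.yaml")
```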
Exercises and supplementary material for the machine learning operations course at DTU.
Standardized Distributed Generative and Predictive AI Inference Platform for Scalable, Multi-Framework Deployment on Kubernetes
asyncio (PEP 3156) Redis support
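A minimal sketch assuming aioredis 2.x and a Redis server on localhost:

```python
import asyncio

import aioredis  # assumes the aioredis 2.x API

async def main():
    # "redis://localhost" is an assumption; point this at your server.
    redis = aioredis.from_url("redis://localhost")
    await redis.set("my-key", "hello")
    value = await redis.get("my-key")
    print(value)  # b"hello"

asyncio.run(main())
```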
Serve, optimize and scale PyTorch models in production
Learn how to design, develop, deploy and iterate on production-grade ML applications.
A Redis module for serving tensors and executing deep learning graphs
Accelerated NLP pipelines for fast inference on CPU and GPU. Built with Transformers, Optimum and ONNX Runtime.
Deploy an ML inference service on a budget in less than 10 lines of code.
Kubernetes-friendly ML model management, deployment, and serving.
🪄 Turns your machine learning code into microservices with web API, interactive GUI, and more.