Stars: 7 repositories written in Python
Tensors and dynamic neural networks in Python with strong GPU acceleration
AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. It is specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
Optimizing inference proxy for LLMs
[ICLR 2024] Lemur: Open Foundation Models for Language Agents
Write PyTorch code at the level of individual examples, then run it efficiently on minibatches.
Original Python version of Intel® Nervana™ Graph