Stars
An agent for CUDA compute-communication kernel co-design
A high-throughput and memory-efficient inference and serving engine for LLMs
Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training
Semantic cache for LLMs. Fully integrated with LangChain and llama_index.
Measure and optimize the energy consumption of your AI applications!
An SSH command runner with a focus on simplicity
[NeurIPS 2022] A Fast Post-Training Pruning Framework for Transformers
Python packaging and dependency management made easy
Multi-DNN Inference Engine for Heterogeneous Mobile Processors
Visualizer for neural network, deep learning and machine learning models
Parses C math library functions and shows their mathematical formulas on hover