LLM-Dev-BB

GPU-Benchmarks-on-LLM-Inference Public

Multiple NVIDIA GPUs or Apple Silicon for Large Language Model Inference?

Jupyter Notebook

llama.cpp Public

LLM inference in C/C++

C++

ollama Public

Get up and running with Llama 3, Mistral, Gemma 2, and other large language models.

Go

llm-awq Public

[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

Python

netron Public

Visualizer for neural network, deep learning and machine learning models

JavaScript

pytorch Public

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Python
