Accelerating LLM inference frameworks: making LLMs fly
Updated May 10, 2024 - Python
Bench360 is a modular benchmarking suite for local LLM deployments. It offers a full-stack, extensible pipeline to evaluate the latency, throughput, quality, and cost of LLM inference on consumer and enterprise GPUs. Bench360 supports flexible backends, tasks, and scenarios, enabling fair and reproducible comparisons for researchers and practitioners.
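Latency and throughput measurements like those Bench360 reports can be approximated with a small timing loop. The sketch below is illustrative only: the `generate` callable is a hypothetical stand-in for a real inference client, and the metric names are our own, not Bench360's API.

```python
import time
import statistics

def benchmark(generate, prompts):
    """Measure per-request latency and aggregate throughput.

    `generate` is a hypothetical placeholder for any LLM inference
    call; a real suite would wrap an actual backend client here.
    """
    latencies = []
    start = time.perf_counter()
    for prompt in prompts:
        t0 = time.perf_counter()
        generate(prompt)  # the call being measured
        latencies.append(time.perf_counter() - t0)
    total = time.perf_counter() - start
    return {
        "mean_latency_s": statistics.mean(latencies),
        "p95_latency_s": sorted(latencies)[int(0.95 * (len(latencies) - 1))],
        "throughput_rps": len(prompts) / total,
    }

# Usage with a stub "model" standing in for a real inference backend:
stats = benchmark(lambda p: p.upper(), ["hello"] * 20)
```

Wall-clock timing of the full request loop captures queueing and batching effects that per-token timers miss, which is why throughput is derived from total elapsed time rather than summed latencies.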
An LLM inference performance harness.
A lightweight HTML form with a Python Flask app and accompanying scripts for quickly testing interactions with the SEA-LION family of LLMs.