Tensor parallelism is all you need. Run LLMs on an AI cluster at home using any device. Distribute the workload, divide RAM usage, and increase inference speed.
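As a minimal sketch of the tensor-parallel idea described above, the snippet below splits a linear layer's weight matrix column-wise across worker threads (standing in for networked devices): each shard holds roughly 1/n of the weights, computes its slice of the output, and the slices are gathered at the end. All names, shapes, and values here are illustrative assumptions, not code from any project listed on this page.

```cpp
// Tensor parallelism for one linear layer, y = x * W:
// W (in_dim x out_dim) is split column-wise into shards, one per "device".
// Threads stand in for devices; a real cluster would all-gather over the network.
#include <cstddef>
#include <cstdio>
#include <thread>
#include <vector>

std::vector<float> tensor_parallel_matmul(
    const std::vector<float>& x,                    // input, length in_dim
    const std::vector<std::vector<float>>& shards,  // each in_dim x shard_out, row-major
    std::size_t in_dim, std::size_t shard_out) {
  std::vector<float> y(shards.size() * shard_out, 0.0f);
  std::vector<std::thread> workers;
  for (std::size_t s = 0; s < shards.size(); ++s) {
    workers.emplace_back([&, s] {
      // Each "device" owns only its shard of the weights, so per-device
      // RAM usage is roughly 1/n of the full matrix.
      for (std::size_t j = 0; j < shard_out; ++j) {
        float acc = 0.0f;
        for (std::size_t i = 0; i < in_dim; ++i)
          acc += x[i] * shards[s][i * shard_out + j];
        y[s * shard_out + j] = acc;  // disjoint output slice per shard
      }
    });
  }
  for (auto& w : workers) w.join();  // stand-in for the all-gather step
  return y;
}

int main() {
  const std::size_t in_dim = 4, shard_out = 2;
  std::vector<float> x = {1, 2, 3, 4};
  // Two column shards of a 4x4 identity matrix (illustrative values).
  std::vector<std::vector<float>> shards = {
      {1, 0, 0, 1, 0, 0, 0, 0},   // columns 0..1
      {0, 0, 0, 0, 1, 0, 0, 1}};  // columns 2..3
  auto y = tensor_parallel_matmul(x, shards, in_dim, shard_out);
  for (float v : y) std::printf("%g ", v);  // prints: 1 2 3 4
  std::printf("\n");
}
```

Because the shards never need each other's weights, the same pattern divides both the memory footprint and the per-token compute across however many devices join the cluster.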
Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).
A great hands-on project for campus recruiting (autumn/spring hiring and internships): build, from scratch, a large-model inference framework that supports LLama2/3 and Qwen2.5.
LLaVA server (llama.cpp).
Runs LLaMA with extremely high speed.
Explores LLM deployment on AXera's AI chips.
Llama causal LM fully recreated in LibTorch, designed for use in Unreal Engine 5.
Super easy-to-use library for working with LLaMA/GPT-J! - Mirror of: https://gitlab.com/niansa/libjustlm
Multi-model and multi-tasking LLaMA Discord bot - Mirror of: https://gitlab.com/niansa/discord_llama