🚀 Build high-performance AI applications with this C++ engine for Retrieval Augmented Generation (RAG) and efficient memory management.
Updated Mar 27, 2026 - C++
Compress context data to optimize memory and performance in C++ large language model applications within the llm-cpp toolkit.
Explore LLM deployment on AXera's AI chips
Run generative AI models on sophgo BM1684X/BM1688
Distributed LLM inference: connect home devices into a cluster to accelerate inference; adding more devices increases throughput.
A high-performance inference system for large language models, designed for production environments.
A great project for campus recruiting, autumn/spring hiring, and internships: build an LLM inference framework from scratch, supporting LLama2/3 and Qwen2.5.
llama.cpp 🦙 LLM inference in TypeScript