Stars
A high-throughput and memory-efficient inference and serving engine for LLMs
Ongoing research on training transformer models at scale
Community maintained hardware plugin for vLLM on Ascend
Apache Spark - A unified analytics engine for large-scale data processing
The Moby Project - a collaborative project for the container ecosystem to assemble container-based systems
For developers building real-time data-driven applications, Redis is the preferred, fastest, and most feature-rich cache, data structure server, and document and vector query engine.
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal domains, for both inference and training.
Unified Communication X (mailing list - https://elist.ornl.gov/mailman/listinfo/ucx-group)
Cost-efficient and pluggable infrastructure components for GenAI inference
Chinese edition of the new Google book Agentic Design Patterns, continuously updated. Includes online reading plus PDF and EPUB ebook downloads.
tiktoken is a fast BPE tokeniser for use with OpenAI's models.
Advanced data structures and algorithms for system design (the algorithms you need to know for system design)
Efficient and easy multi-instance LLM serving
Awesome-LLM-KV-Cache: A curated list of 📙Awesome LLM KV Cache Papers with Codes.
A new data structure for accurate on-line accumulation of rank-based statistics such as quantiles and trimmed means
libco is a coroutine library widely used in WeChat back-end services. It has been running on tens of thousands of machines since 2013.