Stars
UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)
DSPy: The framework for programming—not prompting—language models
USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long Context Transformers Model Training and Inference
Transformer-related optimization, including BERT and GPT
Makes ARM NEON documentation accessible (with examples)
Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow
TNN: developed by Tencent Youtu Lab and Guangying Lab, a uniform deep learning inference framework for mobile, desktop, and server. TNN is distinguished by several outstanding features, including its…
MACE is a deep learning inference framework optimized for mobile heterogeneous computing platforms.
MNN: A blazing-fast, lightweight inference engine battle-tested by Alibaba, powering high-performance on-device LLMs and Edge AI.
ncnn is a high-performance neural network inference framework optimized for the mobile platform
A PyTorch Implementation of Single Shot MultiBox Detector
🚀 Awesome System for Machine Learning ⚡️ AI System Papers and Industry Practice. ⚡️ System for Machine Learning, LLM (Large Language Model), GenAI (Generative AI). 🍻 OSDI, NSDI, SIGCOMM, SoCC, MLSy…
scikit-learn: machine learning in Python
Notebooks for a single-day DL crash course in Chinese
⚡️ Optimizing einsum functions in NumPy, TensorFlow, Dask, and more with contraction order optimization.
A high-performance, generic framework for distributed DNN training
An open-access book on numpy vectorization techniques, Nicolas P. Rougier, 2017
Bringing Characters to Life with Computer Brains in Unity
State-of-the-art 2D and 3D Face Analysis Project
The fundamental package for scientific computing with Python.