Starred repositories
An Open Source Machine Learning Framework for Everyone
GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.
A library for efficient similarity search and clustering of dense vectors.
PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (the "PaddlePaddle" core framework: high-performance single-machine and distributed training and cross-platform deployment for deep learning & machine learning)
Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with a Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, JavaScript and more
A small build system with a focus on speed
FlashMLA: Efficient Multi-head Latent Attention Kernels
Real-Time SLAM for Monocular, Stereo and RGB-D Cameras, with Loop Detection and Relocalization Capabilities
OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.
CUDA Templates and Python DSLs for High-Performance Linear Algebra
Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning
Transformer related optimization, including BERT, GPT
MindSpore is a new open source deep learning training/inference framework that could be used for mobile, edge and cloud scenarios.
Optimized primitives for collective multi-GPU communication
Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training
A fast and user-friendly runtime for transformer inference (Bert, Albert, GPT2, Decoders, etc.) on CPU and GPU
Collective communications library with various primitives for multi-machine training.
C++-based high-performance parallel environment execution engine (vectorized env) for general RL environments.
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)
[MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention
Fast Matrix Multiplications for Lookup Table-Quantized LLMs
PMLS-Caffe: Distributed Deep Learning Framework for Parallel ML System
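Several of the starred projects above (Faiss in particular) are built around nearest-neighbor search over dense vectors. As a point of reference for what those libraries accelerate, here is a minimal brute-force sketch of the same operation in plain Python; the function names and the toy data are illustrative, not part of any of the libraries listed.

```python
# Brute-force k-nearest-neighbor search over dense vectors: the baseline
# computation that libraries such as Faiss optimize with indexing, SIMD,
# and GPU kernels. Pure Python, no dependencies; names are illustrative.

def l2_sq(a, b):
    """Squared Euclidean (L2) distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def knn_search(database, query, k=1):
    """Return the indices of the k database vectors closest to the query."""
    order = sorted(range(len(database)), key=lambda i: l2_sq(database[i], query))
    return order[:k]

db = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
print(knn_search(db, [0.9, 1.1], k=2))  # indices of the two nearest vectors
```

This scans every vector per query (O(n·d) per lookup); specialized libraries trade exactness or memory for sub-linear search time on millions of vectors.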