Stars
An Open Source Machine Learning Framework for Everyone
The new Windows Terminal and the original Windows console host, all in the same place!
Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
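PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (PaddlePaddle (飞桨) core framework: high-performance single-machine and distributed training, and cross-platform deployment, for deep learning and machine learning)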
ncnn is a high-performance neural network inference framework optimized for the mobile platform
Karabiner-Elements is a powerful tool for customizing keyboards on macOS
🚀 Coroutine-based concurrency library for PHP
A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning …
brpc is an industrial-grade RPC framework written in C++, often used in high-performance systems such as search, storage, machine learning, advertisement, and recommendation. "brpc" mea…
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…
A fast multi-producer, multi-consumer lock-free concurrent queue for C++11
Multi-Joint dynamics with Contact. A general purpose physics simulator.
A simple C++11 Thread Pool implementation
High-speed Large Language Model Serving for Local Deployment
Implementation of popular deep learning networks with TensorRT network definition API
TattieBogle Xbox 360 Driver (with improvements)
Go engine with no human-provided knowledge, modeled after the AlphaGo Zero paper.
OpenSpiel is a collection of environments and algorithms for research in general reinforcement learning and search/planning in games.
Lightning fast C++/CUDA neural network framework
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
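fastllm is a high-performance large-model inference library with no backend dependencies. It supports both tensor-parallel inference for dense models and hybrid-mode inference for MoE models; any GPU with 10 GB or more of memory can run the full-size DeepSeek model. A dual-socket 9004/9005 server with a single GPU can deploy the original full-size, full-precision DeepSeek model at 20 tps for a single request stream; the INT4-quantized model reaches 30 tps for a single stream and 60+ tps with multiple concurrent streams.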
C++ library based on TensorRT integration
C++ Implementation of PyTorch Tutorials for Everyone
C++-based high-performance parallel environment execution engine (vectorized env) for general RL environments.