Starred repositories
A distributed approximate nearest neighbor (ANN) search library that provides high-quality vector index build, search, and distributed online serving toolkits for large scale vector search sc…
The BusTub Relational Database Management System (Educational)
C++ Insights - See your source code with the eyes of a compiler
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
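fastllm is a high-performance large-model inference library with no backend dependencies. It supports both tensor-parallel inference for dense models and hybrid-mode inference for MoE models; any GPU with 10 GB or more of memory can run the full DeepSeek model. A dual-socket 9004/9005 server with a single GPU can deploy the original full-precision DeepSeek model at 20 tps with a single concurrent request; the INT4-quantized model reaches 30 tps with a single concurrent request and 60+ tps with multiple concurrent requests.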
C++ implementation of the Python Numpy library
A retargetable MLIR-based machine learning compiler and runtime toolkit.
A good project for campus, autumn, and spring recruiting and for internships! Implement a high-performance deep learning inference library from scratch, step by step, with support for inference of models such as llama2, Unet, Yolov5, and Resnet.
A family of header-only, very fast and memory-friendly hashmap and btree containers.
Modern concurrency for C++. Tasks, executors, timers and C++20 coroutines to rule them all
CV-CUDA™ is an open-source, GPU accelerated library for cloud-scale image processing and computer vision.
Nameof operator for modern C++, simply obtain the name of a variable, type, function, macro, and enum
Cista is a simple, high-performance, zero-copy C++ serialization & reflection library.
Simple, light-weight and easy-to-use asynchronous components
Mirage Persistent Kernel: Compiling LLMs into a MegaKernel
Playing around "Less Slow" coding practices in C++ 20, C, CUDA, PTX, & Assembly, from numerics & SIMD to coroutines, ranges, exception handling, networking and user-space IO
Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training
LLM deployment project based on MNN. This project has been merged into MNN.
Graph-structured Indices for Scalable, Fast, Fresh and Filtered Approximate Nearest Neighbor Search
GPGPU-Sim provides a detailed simulation model of contemporary NVIDIA GPUs running CUDA and/or OpenCL workloads. It includes support for features such as TensorCores and CUDA Dynamic Parallelism as…
Portable header-only C++ low level SIMD library
A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology
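workspace is a lightweight asynchronous execution framework based on C++11. It supports general-purpose asynchronous concurrent task execution, priority task scheduling, an adaptive dynamic thread pool, an efficient static thread pool, and an exception handling mechanism.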