Stars
Fast CUDA matrix multiplication from scratch
A series on GPU optimization topics that explains in detail how to optimize CUDA kernels. It introduces several basic kernel optimizations, including: elementwise, reduce, s…
Optimizing SGEMM kernel functions on NVIDIA GPUs to a close-to-cuBLAS performance.
A simple high performance CUDA GEMM implementation.
Sample code for my CUDA programming book
oneAPI Threading Building Blocks (oneTBB)
Solutions and notes for the labs of Computer Systems: A Programmer's Perspective, 3rd Edition (lab files, solutions, and notes)
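A personally compiled Chinese e-book edition of Computer Systems: A Programmer's Perspective (3rd edition of the original), with lab materials: https://hansimov.gitbook.io/csapp/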
💻 Computer Systems: A Programmer's Perspective, Lab Assignments Solutions
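The official, unmodified source code for the MIT 6.S081 labs, plus a ready-to-use lab environment I built for it (once deployed, the labs can be done in a web-based VS Code). Mirrored from the official MIT repository (git clone git://g.csail.mit.edu/xv6-labs-2020). The 2020 version of the MIT 6.S081 lab source code repository had not been published on GitHub, so I mirrored it here to make it easy for others to fork and for my own use.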
🎶 MIT 6.S081 Operating System Engineering (Now known as 6.1810)
FastLog is a high-performance logging system built on the modern C++23 standard, offering two output modes: console logging and file logging.
A generic cross-platform C library that includes many commonly used components and frameworks, and a new scripting language interpreter. It currently supports C99 and Aspect-Oriented Programming (…
G3log is an asynchronous, "crash safe", logger that is easy to use with default logging sinks or you can add your own. G3log is made with plain C++14 (C++11 support up to release 1.3.2) with no ext…