Starred repositories
ncnn implementation of Z-Image image generator
The most powerful and modular diffusion model GUI, API and backend with a graph/nodes interface.
Official code repo for the O'Reilly Book - "Hands-On Large Language Models"
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
🤖FFPA: Extend FlashAttention-2 with Split-D, ~O(1) SRAM complexity for large headdim, 1.8x~3x↑🎉 vs SDPA EA.
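LLM notes, including model inference, transformer model structure, and LLM framework code analysis notes.
Latest compilation for 2021: C++ learning resources, including new features of C++11 / 14 / 17 / 20 / 23, introductory tutorials, recommended books, quality articles, study notes, instructional videos, and more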
📚 Modern C++ Tutorial: C++11/14/17/20 On the Fly | https://changkun.de/modern-cpp/
Beyond a Gaussian Denoiser: Residual Learning of Deep CNN for Image Denoising (TIP, 2017)
Unsloth Studio is a web UI for training and running open models like Qwen, DeepSeek, gpt-oss and Gemma locally.
Source Code for 'Modern Arm Assembly Language Programming' by Daniel Kusswurm
Code for Command-Line Rust (O'Reilly, 2024, ISBN 9781098109417)
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
Source Code for 'Foundations of ARM64 Linux Debugging, Disassembling, and Reversing' by Dmitry Vostokov
Learn CUDA Programming, published by Packt
Stepwise optimizations of DGEMM on CPU, eventually outperforming Intel MKL, even under multithreading.
CV-CUDA™ is an open-source, GPU accelerated library for cloud-scale image processing and computer vision.
https://wavespeed.ai/ Best inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.
A converter for llama2.c legacy models to ncnn models.
📚 A summary of basic knowledge for C/C++ technical interviews, covering languages, libraries, data structures, algorithms, systems, networking, and linking and loading of libraries, plus interview experience, recruitment, and referral information. This repository is a summary of the basic knowledge of recruiting job seekers and beginners in the direction of C/C++ technology, in…
Parsing gigabytes of JSON per second: used by Facebook/Meta Velox, the Node.js runtime, ClickHouse, WatermelonDB, Apache Doris, Milvus, StarRocks
A torchless C++ RWKV implementation using 8-bit quantization, written in CUDA/HIP/Vulkan for maximum compatibility and minimum dependencies
This is the official code for the MobileSAM project, which makes SAM lightweight for mobile applications and beyond!