Stars
CUDA Tile IR is an MLIR-based intermediate representation and compiler infrastructure for CUDA kernel optimization, focusing on tile-based computation patterns and optimizations targeting NVIDIA te…
cuTile is a programming model for writing parallel kernels for NVIDIA GPUs
Helpful kernel tutorials and examples for tile-based GPU programming
How to optimize some algorithms in CUDA.
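⭐AI-driven public opinion & trend monitor with multi-platform aggregation, RSS, and smart alerts.🎯 Say goodbye to information overload: your AI public opinion monitoring assistant and trending-topic filter! Aggregates trending topics from multiple platforms plus RSS subscriptions, with precise keyword filtering. AI translation and AI analysis briefings are pushed straight to your phone, and it also supports integration with the MCP architecture, powering AI natural language…
微舆: a multi-agent public opinion analysis assistant that anyone can use. It breaks through information cocoons, restores the true picture of public opinion, predicts future trends, and supports decision-making! Built from scratch, with no dependence on any framework.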
A beta Dota2 bot script that aims to provide a better bot game experience
CUDA Matrix Multiplication Optimization
This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, s…
Unofficial description of the CUDA assembly (SASS) instruction sets.
An unofficial CUDA assembler, for all generations of SASS, hopefully :)
Tensor Core Multiplication at the Speed of CuBLAS in Three Simple Steps
📚 A curated list of awesome matrix-matrix multiplication (A * B = C) frameworks, libraries and software
Awesome curated RSS feed links related to Machine Learning, Artificial Intelligence, Reinforcement Learning
Awesome RSS feeds - A curated list of RSS feeds (and OPML files) used in the Recommended Feeds and local news sections of Plenary, an RSS reader, article downloader, and podcast player app for Android
AI Crash Course to help busy builders catch up to the public frontier of AI research in 2 weeks
A curated list for Efficient Large Language Models
Dynamic Memory Management for Serving LLMs without PagedAttention
Analyze the inference of Large Language Models (LLMs). Analyze aspects like computation, storage, transmission, and hardware roofline model in a user-friendly interface.
A curated list of foundation models for vision and language tasks
A generic cross-platform C library that includes many commonly used components and frameworks, and a new scripting language interpreter. It currently supports C99 and Aspect-Oriented Programming (…
Optimizing SGEMM kernel functions on NVIDIA GPUs to a close-to-cuBLAS performance.