Chtholly-Boss 🫡
  • HITSZ
  • GuangDong Shenzhen

Highlights

  • Pro

Starred repositories

Open ABI and FFI for Machine Learning Systems

C++ 258 43 Updated Dec 23, 2025

A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.

Python 2,238 190 Updated Dec 23, 2025

A community-maintained Python framework for creating mathematical animations.

Python 36,111 2,579 Updated Dec 22, 2025

Unofficial description of the CUDA assembly (SASS) instruction sets.

Python 183 18 Updated Jul 18, 2025

Assembler and Decompiler for NVIDIA (Maxwell, Pascal, Volta, Turing, Ampere) GPUs.

Python 95 10 Updated Feb 23, 2023

Third party assembler and GEMM library for NVIDIA Kepler GPU

CSS 85 21 Updated Oct 8, 2019

Nvidia Instruction Set Specification Generator

Python 305 17 Updated Jul 9, 2024

Helpful kernel tutorials and examples for tile-based GPU programming

Python 475 26 Updated Dec 23, 2025

cuTile is a programming model for writing parallel kernels for NVIDIA GPUs

Python 1,659 86 Updated Dec 20, 2025

Collective communications library with various primitives for multi-machine training.

C++ 1,380 340 Updated Dec 2, 2025

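🎯 Say goodbye to information overload: AI helps you make sense of trending news, with simple public-opinion monitoring and analysis. A multi-platform trending-topic aggregator plus an MCP-based AI analysis tool. Monitors 35 platforms (Douyin, Zhihu, Bilibili, Wallstreetcn, Cailian Press, etc.), with smart filtering, automatic push, and AI conversational analysis (dig into news in natural language with 13 tools such as trend tracking, sentiment analysis, and similarity search). Supports push via WeCom/personal WeChat/Feishu/DingTalk/Telegram/email/ntfy/bark/slack, phone notifications within 1 minute, no need…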

Python 40,200 20,837 Updated Dec 23, 2025

cuVS - a library for vector search and clustering on the GPU

Cuda 598 150 Updated Dec 23, 2025

Efficient Triton Kernels for LLM Training

Python 5,974 454 Updated Dec 23, 2025

An unofficial cuda assembler, for all generations of SASS, hopefully :)

Python 562 96 Updated Apr 20, 2023

🍒 Cherry Studio is a desktop client that supports multiple LLM providers.

TypeScript 36,877 3,388 Updated Dec 23, 2025

Optimized primitives for collective multi-GPU communication

C++ 4,327 1,096 Updated Dec 2, 2025

Fast and memory-efficient exact attention

Python 21,262 2,244 Updated Dec 23, 2025

The Modular Platform (includes MAX & Mojo)

Mojo 25,377 2,745 Updated Dec 23, 2025

DeepEP: an efficient expert-parallel communication library

Cuda 8,827 1,036 Updated Dec 23, 2025

Collection of benchmarks to measure basic GPU capabilities

C++ 476 72 Updated Oct 24, 2025

A Quirky Assortment of CuTe Kernels

Python 714 64 Updated Dec 23, 2025

A markup-based typesetting system that is powerful and easy to learn.

Rust 49,810 1,375 Updated Dec 22, 2025

ASCIIFlow

TypeScript 5,420 396 Updated Oct 27, 2024

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Python 7,423 637 Updated Dec 23, 2025

Low-bit LLM inference on CPU/NPU with lookup table

C++ 902 74 Updated Jun 5, 2025

FlashInfer: Kernel Library for LLM Serving

Cuda 4,341 614 Updated Dec 23, 2025

Efficient Top-K implementation on the GPU

Cuda 191 24 Updated Apr 9, 2019

A collection of my personal dotfiles

Lua 637 96 Updated Dec 17, 2025