-
sageattention-autotune Public
SageAttention with autotuned block sizes
-
ComfyUI-FeatherOps Public
Fast fp16-fp8 mixed precision matmul on RDNA3/3.5 GPUs without native fp8
-
SageAttention Public
Forked from thu-ml/SageAttention
Fork of SageAttention for Windows wheels and easy installation
-
rocm-systems Public
Forked from ROCm/rocm-systems
Fork of rocm-systems (rocr-runtime and rocprofiler-sdk) for PC sampling and thread tracing on gfx1151 (Strix Halo)
C++ Updated Jun 7, 2026 -
SpargeAttn Public
Forked from thu-ml/SpargeAttn
Fork of SpargeAttention (SparseSageAttention) for Windows wheels and easy installation
-
mathstudio-apk-modernized Public
MathStudio Android APK modernized for Android >= 7
-
safetensors Public
Forked from safetensors/safetensors
Python Apache License 2.0 Updated May 1, 2026 -
tinycc Public
Forked from TinyCC/tinycc
Fork of TinyCC for use in triton-windows
C GNU Lesser General Public License v2.1 Updated Apr 20, 2026 -
upload-artifact-repro Public
Reproduce a zip created by the upload-artifact action given the file modified time
JavaScript MIT License Updated Apr 5, 2026 -
causal-conv1d Public
Forked from Dao-AILab/causal-conv1d
Fork of causal-conv1d for Python and libtorch stable ABI
Cuda BSD 3-Clause "New" or "Revised" License Updated Mar 29, 2026 -
project-asteria Public
Project Asteria: A Naïve Introduction to Advanced Mathematics and Theoretical Physics for Gaokao Students
-
typeset Public
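Automatically fixes full-width/half-width characters, spacing, and similar issues in text that mixes Chinese, English, and code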
-
typeset-rs Public
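Automatically fixes full-width/half-width characters, spacing, and similar issues in text that mixes Chinese, English, and code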
-
linux-amdgpu-driver Public
Forked from torvalds/linux
Fork of amdgpu mainline driver for PC sampling and thread tracing on gfx1151 (Strix Halo)
C Other Updated Mar 19, 2026 -
ComfyUI_frontend Public
Forked from Comfy-Org/ComfyUI_frontend
TypeScript GNU General Public License v3.0 Updated Mar 13, 2026 -
rdna35-isa-markdown Public
AMD GPU RDNA3.5 (Strix Halo) instruction set architecture manual in Markdown for AI retrieval
3 Updated Mar 5, 2026 -
amdgpu Public
Forked from ROCm/amdgpu
(Outdated) Fork of amdgpu non-mainline dkms driver for PC sampling on gfx1151 (Strix Halo)
C Other Updated Mar 4, 2026 -
transformers-qwen3-moe-fused Public
Fused Qwen3 MoE layer for faster training, compatible with Transformers, LoRA, bnb 4-bit quantization, and Unsloth. Can also train LoRA over GGUF
-
triton-windows Public archive
Forked from triton-lang/triton
Fork of the Triton language and compiler for Windows support and easy installation
-
ACE-Step Public
Forked from ace-step/ACE-Step
Fork of ACE-Step v1.0 for LoRA training with < 10 GB VRAM
-
ComfyUI Public
Forked from Comfy-Org/ComfyUI -
ComfyUI-RadialAttn Public
RadialAttention in ComfyUI native workflow
-
stability-ComfyUI-nodes Public
Forked from Stability-AI/stability-ComfyUI-nodes -