cudnn_frontend provides a c++ wrapper for the cudnn backend API and samples on how to use it
-
Updated
May 8, 2026 - Python
cudnn_frontend provides a c++ wrapper for the cudnn backend API and samples on how to use it
An open-source interface to use the multiple-precision solver SDPA-GMP with YALMIP
Memory-optimized SongGeneration (v2 Large) for 16GB VRAM GPUs. Features 8-bit µ-law KV-caching, fused layers, and SDPA/Triton integration.
PyTorch implementation of YOLOv12 with Scaled Dot-Product Attention (SDPA) optimized by FlashAttention for fast and efficient object detection.
A 66M parameter decoder-only transformer language model implemented from scratch in PyTorch. Features a custom SentencePiece tokenizer, RoPE positional embeddings, SwiGLU feed-forward network, per-layer KV cache for efficient autoregressive inference, and a Svelte-based streaming chat interface.
Add a description, image, and links to the sdpa topic page so that developers can more easily learn about it.
To associate your repository with the sdpa topic, visit your repo's landing page and select "manage topics."