Tencent
- ShenZhen
Stars
KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels
[CVPR 2026] STCDiT for Real-World Video Enhancement and AIGC Enhancement. It achieves temporally stable and structurally faithful restoration even under complex motions.
💻 vibe coding 2026 | Your first modern coding course for beginners to master step by step.
A self-learning tutorial for CUDA High Performance Programming.
Development repository for the Triton language and compiler
TurboQuant: Near-optimal KV cache quantization for LLM inference (3-bit keys, 2-bit values) with Triton kernels + vLLM integration
[ICML2025, NeurIPS2025 Spotlight] Sparse VideoGen 1 & 2: Accelerating Video Diffusion Transformers with Sparse Attention
A feed-forward 3D foundation model for reconstructing scenes from streaming data
Introduction to Parallel Programming class code
FlashInfer: Kernel Library for LLM Serving
zihaomu / SageAttention-int4
Forked from thu-ml/SageAttention
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.
SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse–Linear Attention
[ICML2025] SpargeAttention: A training-free sparse attention that accelerates any model inference.
BEVFormer inference on TensorRT, including INT8 Quantization and Custom TensorRT Plugins (float/half/half2/int8).
SGLang is a high-performance serving framework for large language models and multimodal models.
[TPAMI 2026] Adaptive Sparse Self-Attention for Efficient Image Super-resolution and Beyond
A high-quality speech analysis, manipulation and synthesis system
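OpenClaw official Chinese skill library | Translated from the official Clawdbot skills, organized by scenario, with support for invocation in natural-language Chinese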
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
adefossez / demucs
Forked from facebookresearch/demucs
Code for the paper Hybrid Spectrogram and Waveform Source Separation
Code for the paper Hybrid Spectrogram and Waveform Source Separation
An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.
Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
📚 A curated list of Awesome Diffusion Inference Papers with Codes: Sampling, Cache, Quantization, Parallelism, etc. 🎉
[CVPR 2025] Official code repository for "Pixel-level and Semantic-level Adjustable Super-resolution: A Dual-LoRA Approach"
📚 LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners 🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA. 🎉
A PyTorch-native inference engine with cache, parallelism, quantization and CPU offload for DiTs.
https://wavespeed.ai/ Context parallel attention that accelerates DiT model inference with dynamic caching