cslvjt
  • Tencent
  • ShenZhen

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

Python 27 2 Updated Jun 1, 2026

[CVPR 2026]

Python 56 1 Updated May 29, 2026

[CVPR 2026] STCDiT for Real-World Video Enhancement and AIGC Enhancement. It achieves temporally stable and structurally faithful restoration even under complex motions.

Python 65 1 Updated Jun 4, 2026

💻 vibe coding 2026 | Your first modern coding course for beginners to master step by step.

JavaScript 17,187 1,624 Updated Jun 17, 2026

A self-learning tutorial for CUDA High Performance Programming.

JavaScript 1,022 103 Updated Jan 14, 2026

Development repository for the Triton language and compiler

MLIR 19,494 2,952 Updated Jun 21, 2026

TurboQuant: Near-optimal KV cache quantization for LLM inference (3-bit keys, 2-bit values) with Triton kernels + vLLM integration

Python 1,597 182 Updated Mar 27, 2026

[ICML2025, NeurIPS2025 Spotlight] Sparse VideoGen 1 & 2: Accelerating Video Diffusion Transformers with Sparse Attention

Python 683 46 Updated Mar 6, 2026

A feed-forward 3D foundation model for reconstructing scenes from streaming data

Python 7,296 717 Updated Jun 17, 2026

Introduction to Parallel Programming class code

Cuda 1,354 1,145 Updated Jun 27, 2022

FlashInfer: Kernel Library for LLM Serving

Python 5,834 1,066 Updated Jun 22, 2026

[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.

Cuda 1 Updated Apr 15, 2026

SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse–Linear Attention

Python 313 19 Updated Feb 24, 2026

[ICML2025] SpargeAttention: A training-free sparse attention that accelerates any model inference.

Cuda 1,003 95 Updated Feb 25, 2026

BEVFormer inference on TensorRT, including INT8 Quantization and Custom TensorRT Plugins (float/half/half2/int8).

Python 574 103 Updated Nov 20, 2023

SGLang is a high-performance serving framework for large language models and multimodal models.

Python 29,509 6,645 Updated Jun 22, 2026

[TPAMI 2026] Adaptive Sparse Self-Attention for Efficient Image Super-resolution and Beyond

Python 33 1 Updated Jun 2, 2026

A high-quality speech analysis, manipulation and synthesis system

C++ 1,321 265 Updated Feb 18, 2026

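OpenClaw official Chinese skill library | Translated from the official Clawdbot skills, organized by scenario, with support for invocation in Chinese natural language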

4,135 400 Updated May 26, 2026

Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞

TypeScript 379,845 79,517 Updated Jun 22, 2026

Code for the paper Hybrid Spectrogram and Waveform Source Separation

Python 2,834 266 Updated May 28, 2026

Code for the paper Hybrid Spectrogram and Waveform Source Separation

Python 10,232 1,519 Updated Apr 24, 2024

An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.

Python 4,252 350 Updated Aug 14, 2025

Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.

Python 21,770 2,508 Updated May 25, 2026

📚A curated list of Awesome Diffusion Inference Papers with Codes: Sampling, Cache, Quantization, Parallelism, etc.🎉

Python 569 27 Updated Jun 13, 2026

[CVPR 2025] Official code repository for "Pixel-level and Semantic-level Adjustable Super-resolution: A Dual-LoRA Approach"

Python 339 11 Updated Mar 15, 2026

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda 11,298 1,163 Updated Jun 21, 2026

A PyTorch-native inference engine with cache, parallelism, quantization and cpu offload for DiTs.

Python 1,206 75 Updated Jun 22, 2026

https://wavespeed.ai/ Context parallel attention that accelerates DiT model inference with dynamic caching

Python 428 47 Updated Jul 5, 2025