🚗
Auto-driving

10 stars written in Cuda

DeepEP: an efficient expert-parallel communication library

Cuda 8,830 1,037 Updated Dec 24, 2025

Fast CUDA matrix multiplication from scratch

Cuda 987 149 Updated Sep 2, 2025

Writing CUDA operators from scratch, with an interview guide

Cuda 743 82 Updated Aug 23, 2025

cuVS - a library for vector search and clustering on the GPU

Cuda 599 150 Updated Dec 24, 2025

A simple high performance CUDA GEMM implementation.

Cuda 421 42 Updated Jan 4, 2024

Optimizing SGEMM kernel functions on NVIDIA GPUs to close-to-cuBLAS performance.

Cuda 397 52 Updated Jan 2, 2025

High-Performance SGEMM on CUDA devices

Cuda 114 5 Updated Jan 21, 2025

REEF is a GPU-accelerated DNN inference serving system that enables instant kernel preemption and biased concurrent execution in GPU scheduling.

Cuda 103 11 Updated Dec 24, 2022

A tool for examining GPU scheduling behavior.

Cuda 89 20 Updated Aug 17, 2024

Cost-efficient Out-of-core GNN Training System on TB-scale Graph [ICDE 25]

Cuda 22 4 Updated Jan 6, 2025