# Heejun Lee

Deep-learning-based AI code bot. Actually, I am the ingredient of the AI.

- Anyang, Korea
Highlights
- Pro
Starred repositories (5, written in CUDA)
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves a 2-5x speedup over FlashAttention without losing end-to-end metrics across language, image, and video models.
Study of parallel programming - CUDA, OpenMP, MPI, Pthreads