Principal AI/ML Research Engineer at HPE Labs, interested in LLM Agents, High Performance Computing, Deep Learning, and Computer Architecture
Hewlett Packard Enterprise
Palo Alto, CA
https://scholar.google.com/citations?user=B8WA2XsAAAAJ
Stars (10), filtered to repositories written in CUDA:
Flash Attention in ~100 lines of CUDA (forward pass only)
Efficient GPU kernels for block-sparse matrix multiplication and convolution
Training materials associated with NVIDIA's CUDA Training Series (www.olcf.ornl.gov/cuda-training-series/)
A minimal cuDNN deep learning training code sample using LeNet.
Deep neural network framework for multiple GPUs
A GPU performance prediction toolkit for CUDA programs