Stars
2 stars written in CUDA
Flash Attention in ~100 lines of CUDA (forward pass only)
This is a Tensor Train based library for compressing the sparse embedding tables used in large-scale machine learning models, such as recommendation and natural language processing models. We showed th…