Starred repositories
A massively parallel, optimal functional runtime in Rust
DeepEP: an efficient expert-parallel communication library
Flash Attention in ~100 lines of CUDA (forward pass only)
RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing …
Code from the "CUDA Crash Course" YouTube series by CoffeeBeforeArch
cuVS - a library for vector search and clustering on the GPU
Fast k nearest neighbor search using GPU
State-of-the-art sorting and segmented sorting, including OneSweep. Implemented in CUDA, D3D12, and Unity-style compute shaders. Theoretically portable to all wave/warp/subgroup sizes.
A simple GPU hash table implemented in CUDA using lock-free techniques
GPU implementation of a fast generalized ANS (asymmetric numeral system) entropy encoder and decoder, with extensions for lossless compression of numerical and other data types in HPC/ML applications.
HierarchicalKV is a part of NVIDIA Merlin and provides hierarchical key-value storage to meet RecSys requirements. The key capability of HierarchicalKV is to store key-value feature-embeddings on h…
Harmonia is an algorithm for parallelizing operations on B+ trees. As part of my GPU project, I implemented the Harmonia paper, published in 2019, in CUDA.
Comparison of regression line calculation in CUDA GPU code vs. AVX-512 code