Stars
Instant neural graphics primitives: lightning-fast NeRF and more
A massively parallel, optimal functional runtime in Rust
Code and data for paper "Deep Painterly Harmonization": https://arxiv.org/abs/1804.03189
[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl
FSA/FST algorithms, differentiable, with PyTorch compatibility.
Efficient GPU kernels for block-sparse matrix multiplication and convolution
State-of-the-art sorting and segmented sorting, including OneSweep. Implemented in CUDA, D3D12, and Unity-style compute shaders. Theoretically portable to all wave/warp/subgroup sizes.
CUDA-accelerated Fully Homomorphic Encryption Library
A fast and highly scalable GPU dynamic memory allocator
A GPU algorithm for sparse matrix-matrix multiplication
CUDA implementation of parallel radix sort using Blelloch scan
Library of common noise functions for CUDA kernels
A simple, library-less CUDA implementation of the OneSweep sorting algorithm.