- France
Starred repositories
- A massively parallel, optimal functional runtime in Rust.
- DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling.
- State-of-the-art sorting and segmented sorting, including OneSweep. Implemented in CUDA, D3D12, and Unity-style compute shaders. Theoretically portable to all wave/warp/subgroup sizes.
- Parrot is a C++ library for fused array operations using CUDA/Thrust. It provides efficient GPU-accelerated operations with lazy evaluation semantics, allowing for chaining of operations without un…
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.
- FlashFFTStencil: Bridging Fast Fourier Transforms to Memory-Efficient Stencil Computations on Tensor Core Units (PPoPP'25).