Starred repositories
A massively parallel, optimal functional runtime in Rust
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
DeepEP: an efficient expert-parallel communication library
This package contains the original 2012 AlexNet code.
Introduction to Parallel Programming class code
Learn CUDA Programming, published by Packt
A series of GPU optimization topics covering how to optimize CUDA kernels in detail, including several basic kernel optimizations: elementwise, reduce, s…
RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing …
Training materials associated with NVIDIA's CUDA Training Series (www.olcf.ornl.gov/cuda-training-series/)
Source code that accompanies The CUDA Handbook.
Static suckless single batch CUDA-only qwen3-0.6B mini inference engine
TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.
A curated set of C++ examples for optimization-based elastodynamic contact simulation using CUDA, emphasizing algorithmic convergence, penetration-free, and inversion-free conditions. Designed for …
Code samples for the CUDA tutorial "CUDA and Applications to Task-based Programming"
Neural network from scratch in CUDA/C++