- Activeeon
- Paris, France
- https://andrewssobral.pages.dev
- @andrewssobral
- in/andrewssobral
Starred repositories
This package contains the original 2012 AlexNet code.
How to optimize some algorithms in CUDA (a shared-memory reduction in this spirit is sketched after this list).
Sample codes for my CUDA programming book
[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl
RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing …
Examples demonstrating available options to program multiple GPUs in a single node or a cluster
Automatically exported from code.google.com/p/cuda-convnet2
Causal depthwise conv1d in CUDA, with a PyTorch interface (a minimal kernel sketch of this operation also follows the list).
A GPU implementation of Convolutional Neural Nets in C++
Unsupervised Learning of Video Representations using LSTMs
llama3.cuda is a pure C/CUDA implementation of the Llama 3 model.
Alex Krizhevsky's original code from Google Code
CUDA Matrix Factorization Library with Alternating Least Squares (ALS)
gevtushenko / llm.c
Forked from karpathy/llm.c: LLM training in simple, raw C/CUDA
This project optimizes multi-GPU parallelism for machine-learning training, accelerating multi-GPU communication with fused gradient buffers, NCCL AllReduce, and CUDA C kernel-level optimizations including me… (the fused-buffer AllReduce pattern is sketched at the end of the list).
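The CUDA optimization references above (the algorithm-optimization notes and the programming-book samples) revolve around patterns like the block-level shared-memory reduction below. This is a minimal, hedged illustration of that standard pattern, not code from either repository; the sizes and names are my own.

```cuda
// A block-level sum reduction using shared memory: a classic first CUDA
// optimization exercise. Illustrative sketch; sizes are arbitrary.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void block_sum(const float* in, float* out, int n) {
    extern __shared__ float sdata[];
    unsigned tid = threadIdx.x;
    unsigned i = blockIdx.x * blockDim.x + tid;
    sdata[tid] = (i < n) ? in[i] : 0.0f;   // load with bounds check
    __syncthreads();
    // Tree reduction in shared memory: halve the active threads each step.
    for (unsigned s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) sdata[tid] += sdata[tid + s];
        __syncthreads();
    }
    if (tid == 0) out[blockIdx.x] = sdata[0];  // one partial sum per block
}

int main() {
    const int n = 1 << 20, threads = 256, blocks = (n + threads - 1) / threads;
    float *din, *dout;
    cudaMalloc(&din, n * sizeof(float));
    cudaMalloc(&dout, blocks * sizeof(float));
    float* h = new float[n];
    for (int i = 0; i < n; ++i) h[i] = 1.0f;   // all ones: expected sum is n
    cudaMemcpy(din, h, n * sizeof(float), cudaMemcpyHostToDevice);
    block_sum<<<blocks, threads, threads * sizeof(float)>>>(din, dout, n);
    float* hp = new float[blocks];
    cudaMemcpy(hp, dout, blocks * sizeof(float), cudaMemcpyDeviceToHost);
    double total = 0;
    for (int i = 0; i < blocks; ++i) total += hp[i];  // finish on the host
    printf("sum = %.0f (expected %d)\n", total, n);
    cudaFree(din); cudaFree(dout); delete[] h; delete[] hp;
    return 0;
}
```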
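A causal depthwise conv1d, as in the causal-conv1d entry above, convolves each channel independently and looks only at current and past timesteps. The kernel below is a minimal sketch of that operation, assuming a [batch, channels, time] float layout and implicit zero left-padding; it is not the repository's optimized implementation or its API.

```cuda
// Minimal causal depthwise conv1d sketch. One thread per (batch, channel,
// time) output element; shapes and launch config are illustrative.
#include <cuda_runtime.h>
#include <cstdio>

// x: [B, C, T], w: [C, K], y: [B, C, T]; y[t] depends on x[t-K+1 .. t] only.
__global__ void causal_depthwise_conv1d(const float* x, const float* w,
                                        float* y, int B, int C, int T, int K) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= B * C * T) return;
    int t = idx % T;
    int c = (idx / T) % C;
    // idx = ((b*C)+c)*T + t, so x[idx] is x[b][c][t].
    float acc = 0.0f;
    for (int k = 0; k < K; ++k) {
        int src = t - (K - 1) + k;                 // past positions only
        if (src >= 0) acc += w[c * K + k] * x[idx + (src - t)];
    }
    y[idx] = acc;
}

int main() {
    const int B = 1, C = 2, T = 8, K = 4, N = B * C * T;
    float hx[N], hw[C * K], hy[N];
    for (int i = 0; i < N; ++i) hx[i] = 1.0f;       // constant input
    for (int i = 0; i < C * K; ++i) hw[i] = 1.0f;   // box filter
    float *dx, *dw, *dy;
    cudaMalloc(&dx, N * sizeof(float));
    cudaMalloc(&dw, C * K * sizeof(float));
    cudaMalloc(&dy, N * sizeof(float));
    cudaMemcpy(dx, hx, N * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dw, hw, C * K * sizeof(float), cudaMemcpyHostToDevice);
    causal_depthwise_conv1d<<<(N + 255) / 256, 256>>>(dx, dw, dy, B, C, T, K);
    cudaMemcpy(hy, dy, N * sizeof(float), cudaMemcpyDeviceToHost);
    // The ramp-up at the start shows the causal (left-only) padding:
    for (int t = 0; t < T; ++t) printf("%.0f ", hy[t]);  // 1 2 3 4 4 4 4 4
    printf("\n");
    cudaFree(dx); cudaFree(dw); cudaFree(dy);
    return 0;
}
```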
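The last entry names a common multi-GPU pattern: packing per-tensor gradients into one contiguous buffer and issuing a single NCCL AllReduce instead of many small ones. The sketch below shows only that communication step, under assumed conditions (two visible GPUs, a made-up fused buffer size, single-process communicators); real training code would first write each gradient tensor into its slice of the buffer.

```cuda
// Hedged sketch of fused-buffer gradient synchronization with NCCL.
// Assumes at least two visible GPUs; buffer sizes are illustrative.
#include <nccl.h>
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    const int nDev = 2;                    // assumption: >= 2 GPUs visible
    const size_t fused = 1024 + 4096;      // two "gradients" packed together
    int devs[nDev] = {0, 1};
    ncclComm_t comms[nDev];
    ncclCommInitAll(comms, nDev, devs);    // single-process multi-GPU comms

    float* buf[nDev];
    cudaStream_t streams[nDev];
    for (int i = 0; i < nDev; ++i) {
        cudaSetDevice(i);
        cudaMalloc(&buf[i], fused * sizeof(float));
        cudaStreamCreate(&streams[i]);
        // In training, both gradient tensors would be copied back to back
        // into buf[i] here, before the single reduction below.
    }

    // One AllReduce over the fused buffer instead of one call per tensor.
    ncclGroupStart();
    for (int i = 0; i < nDev; ++i)
        ncclAllReduce(buf[i], buf[i], fused, ncclFloat, ncclSum,
                      comms[i], streams[i]);
    ncclGroupEnd();

    for (int i = 0; i < nDev; ++i) {
        cudaSetDevice(i);
        cudaStreamSynchronize(streams[i]);
        cudaFree(buf[i]);
        ncclCommDestroy(comms[i]);
    }
    printf("fused AllReduce complete\n");
    return 0;
}
```

Fusing amortizes per-call launch and protocol overhead, which is why frameworks bucket small gradients before reducing them.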