- Tsukuba, Japan
- http://twitter.com/bongole
Stars
DeepEP: an efficient expert-parallel communication library
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves a 2-5x speedup over FlashAttention without losing end-to-end metrics across language, image, and video models.
GPU Accelerated t-SNE for CUDA with Python bindings
CUDA-accelerated GIS and spatiotemporal algorithms
Additional utils and helpers to extend TensorFlow when building recommendation systems, contributed and maintained by SIG Recommenders.
bycloudai / instant-ngp-Windows
Forked from NVlabs/instant-ngp
Instant neural graphics primitives: lightning fast NeRF and more
FP64 equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme
Notes on "Programming Massively Parallel Processors" by Hwu, Kirk, and Hajj (4th ed.)