- NVIDIA
- Tokyo
Stars
Incubator repo for the CUDA-TileIR backend
TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels
MSLK (Meta Superintelligence Labs Kernels) is a collection of PyTorch GPU operator libraries that are designed and optimized for GenAI training and inference, such as FP8 row-wise quantization and …
slime is an LLM post-training framework for RL Scaling.
A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.
Parrot is a C++ library for fused array operations using CUDA/Thrust. It provides efficient GPU-accelerated operations with lazy evaluation semantics, allowing for chaining of operations without un…
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer
Triton-based Symmetric Memory operators and examples
An experimental implementation of compiler-driven automatic sharding of models across a given device mesh.
NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process communication and coordination overheads by allowing programmer…
Ship correct and fast LLM kernels to PyTorch
Minimalistic 4D-parallelism distributed training framework for educational purposes
This repository contains the source code for a static website that provides documentation for each "Graph Break" identified by a Graph Break ID (GBID).
Distributed Compiler based on Triton for Parallel Systems
Manages Unified Access to Generative AI Services built on Envoy Gateway
Dynolog is a telemetry daemon for performance monitoring and tracing. It exports metrics from different components in the system, such as the Linux kernel, CPU, disks, Intel PT, and GPUs. Dynolog also …
A syntax-highlighting pager for git, diff, grep, and blame output