NVIDIA
Tokyo (UTC +09:00)
Stars
A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.
Parrot is a C++ library for fused array operations using CUDA/Thrust. It provides efficient GPU-accelerated operations with lazy evaluation semantics, allowing operations to be chained without un…
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer
fanshiqing / grouped_gemm
Forked from tgale96/grouped_gemm. PyTorch bindings for CUTLASS grouped GEMM.
Triton-based Symmetric Memory operators and examples
An experimental implementation of compiler-driven automatic sharding of models across a given device mesh.
NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process communication and coordination overheads by allowing programmer…
Ship correct and fast LLM kernels to PyTorch
Minimalistic 4D-parallelism distributed training framework for educational purposes
This repository contains the source code for a static website that provides documentation for each "Graph Break" identified by a Graph Break ID (GBID).
Distributed Compiler based on Triton for Parallel Systems
Manages Unified Access to Generative AI Services built on Envoy Gateway
Dynolog is a telemetry daemon for performance monitoring and tracing. It exports metrics from different components in the system, such as the Linux kernel, CPU, disks, Intel PT, and GPUs. Dynolog also …
A syntax-highlighting pager for git, diff, grep, and blame output
TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels
A fast type checker and language server for Python
A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.
Universal LLM Deployment Engine with ML Compilation