- Nanjing
Starred repositories
Build NVIDIA® CUDA™ code for OpenCL™ 1.2 devices
Documentation of NVIDIA chip/hardware interfaces
Go RPC framework with high-performance and strong-extensibility for building micro-services.
A high-performance non-blocking I/O networking framework focusing on RPC scenarios.
A do everything Redfish, KVM, GUI, and DBus webserver for OpenBMC
NVIDIA / bmcweb
Forked from openbmc/bmcweb
A do everything Redfish, KVM, GUI, and DBus webserver for OpenBMC
KAI Scheduler is an open source Kubernetes Native scheduler for AI workloads at large scale
🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
GPU Admin Tools. Includes Confidential Computing controls for H100, and other functionality
Ancillary open source software to support confidential computing on NVIDIA GPUs
Resource scheduling and cluster management for AI
Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels
Tile-Based Runtime for Ultra-Low-Latency LLM Inference
Tool to collect debug logs from NVIDIA server components, in band and out-of-band.
A command line utility to manage the configuration of a system's high performance network interfaces for RoCE deployments
Translation of C++ Core Guidelines [https://github.com/isocpp/CppCoreGuidelines] into Simplified Chinese.
slime is an LLM post-training framework for RL Scaling.
An NCCL extension library, designed to efficiently offload GPU memory allocated by the NCCL communication library.
torchcomms: a modern PyTorch communications API
DLRover: An Automatic Distributed Deep Learning System
Source code examples from the Parallel Forall Blog