akoumpa
🎯 Focusing

Organizations

@NVIDIA

TORCH_TRACE parser for PT2

Rust 80 26 Updated Mar 17, 2026

DGXC Benchmarking provides recipes in ready-to-use templates for evaluating performance of specific AI use cases across hardware and software combinations.

Python 73 22 Updated Mar 24, 2026

A library for exporting models including NeMo and Hugging Face to optimized inference backends, and deploying them for efficient querying

Python 32 9 Updated Mar 24, 2026

An experimental implementation of compiler-driven automatic sharding of models across a given device mesh.

Python 58 17 Updated Mar 28, 2026

FlagGems is an operator library for large language models implemented in the Triton Language.

Python 934 298 Updated Mar 29, 2026

Ship correct and fast LLM kernels to PyTorch

Python 147 17 Updated Jan 14, 2026

PyTorch Single Controller

Rust 1,001 156 Updated Mar 28, 2026

Dion optimizer algorithm

Python 457 53 Updated Mar 28, 2026

CI/CD templates for NeMo-FW libraries

Python 5 5 Updated Mar 25, 2026

A Quirky Assortment of CuTe Kernels

Python 865 100 Updated Mar 28, 2026

PyTorch Distributed native training library for LLMs/VLMs with OOTB Hugging Face support

Python 395 102 Updated Mar 28, 2026

A logging tool for deep learning.

Python 65 20 Updated Mar 31, 2025

Scalable toolkit for efficient model reinforcement

Python 1,481 307 Updated Mar 28, 2026

Distributed Compiler based on Triton for Parallel Systems

Python 1,398 135 Updated Mar 11, 2026

The NVIDIA NeMo Agent toolkit is an open-source library for efficiently connecting and optimizing teams of AI agents.

Python 2,107 588 Updated Mar 25, 2026

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ 9,505 1,756 Updated Mar 24, 2026

TorchOpt is an efficient library for differentiable optimization built upon PyTorch.

Python 626 42 Updated Mar 2, 2026

Run PyTorch LLMs locally on servers, desktop and mobile

Python 3,622 246 Updated Sep 10, 2025

verl: Volcano Engine Reinforcement Learning for LLMs

Python 20,278 3,528 Updated Mar 28, 2026

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

Python 41,861 7,391 Updated Mar 28, 2026

The tool facilitates debugging convergence issues and testing new algorithms and recipes for training LLMs using NVIDIA libraries such as Transformer Engine, Megatron-LM, and NeMo.

Python 19 8 Updated Sep 17, 2025

Safe code refactoring for modern Python.

Python 1,612 136 Updated Jun 21, 2024

Custom recipes for NVIDIA Nsight Systems post-collection analysis.

Python 16 1 Updated Nov 7, 2025

Python 109 28 Updated Mar 12, 2026

extensible collectives library in triton

Python 98 6 Updated Mar 31, 2025

A lightweight library for PyTorch training tools and utilities

Python 1,718 300 Updated Mar 27, 2026

Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.

Python 6,185 571 Updated Aug 22, 2025

[ICML2024 (Oral)] Official PyTorch implementation of DoRA: Weight-Decomposed Low-Rank Adaptation

Python 952 63 Updated Mar 24, 2026

MIG Partition Editor for NVIDIA GPUs

Go 243 57 Updated Mar 22, 2026

pytest plugin for distributed testing and loop-on-failures testing modes.

Python 1,818 255 Updated Mar 23, 2026