akoumpa
🎯
Focusing

Organizations

@NVIDIA


TORCH_TRACE parser for PT2

Rust 80 27 Updated Mar 17, 2026

DGXC Benchmarking provides recipes in ready-to-use templates for evaluating performance of specific AI use cases across hardware and software combinations.

Python 75 23 Updated Mar 24, 2026

A library for exporting models including NeMo and Hugging Face to optimized inference backends, and deploying them for efficient querying

Python 33 9 Updated Apr 2, 2026

An experimental implementation of compiler-driven automatic sharding of models across a given device mesh.

Python 59 17 Updated Apr 2, 2026

FlagGems is an operator library for large language models implemented in the Triton Language.

Python 944 307 Updated Apr 2, 2026

Ship correct and fast LLM kernels to PyTorch

Python 147 17 Updated Jan 14, 2026

PyTorch Single Controller

Rust 1,003 156 Updated Apr 2, 2026

Dion optimizer algorithm

Python 457 52 Updated Apr 2, 2026

CI/CD templates for NeMo-FW libraries

Python 5 6 Updated Apr 1, 2026

A Quirky Assortment of CuTe Kernels

Python 891 101 Updated Apr 2, 2026

PyTorch Distributed native training library for LLMs/VLMs with OOTB Hugging Face support

Python 405 108 Updated Apr 2, 2026

A logging tool for deep learning.

Python 65 20 Updated Mar 31, 2025

Scalable toolkit for efficient model reinforcement

Python 1,495 316 Updated Apr 2, 2026

Distributed Compiler based on Triton for Parallel Systems

Python 1,400 136 Updated Mar 11, 2026

The NVIDIA NeMo Agent toolkit is an open-source library for efficiently connecting and optimizing teams of AI agents.

Python 2,132 593 Updated Apr 2, 2026

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ 9,524 1,768 Updated Apr 2, 2026

TorchOpt is an efficient library for differentiable optimization built upon PyTorch.

Python 628 42 Updated Mar 2, 2026

Run PyTorch LLMs locally on servers, desktop and mobile

Python 3,619 247 Updated Sep 10, 2025

verl: Volcano Engine Reinforcement Learning for LLMs

Python 20,392 3,556 Updated Apr 2, 2026

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

Python 41,917 7,401 Updated Apr 2, 2026

The tool facilitates debugging convergence issues and testing new algorithms and recipes for training LLMs using NVIDIA libraries such as Transformer Engine, Megatron-LM, and NeMo.

Python 19 8 Updated Sep 17, 2025

Safe code refactoring for modern Python.

Python 1,612 136 Updated Jun 21, 2024

Custom recipes for NVIDIA Nsight Systems post-collection analysis.

Python 16 1 Updated Nov 7, 2025

Python 109 28 Updated Mar 12, 2026

Extensible collectives library in Triton

Python 98 6 Updated Mar 31, 2025

A lightweight library for PyTorch training tools and utilities

Python 1,719 300 Updated Apr 1, 2026

Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.

Python 6,190 571 Updated Aug 22, 2025

[ICML2024 (Oral)] Official PyTorch implementation of DoRA: Weight-Decomposed Low-Rank Adaptation

Python 955 63 Updated Mar 24, 2026

MIG Partition Editor for NVIDIA GPUs

Go 243 57 Updated Apr 1, 2026

pytest plugin for distributed testing and loop-on-failures testing modes.

Python 1,820 256 Updated Mar 30, 2026