Stars
DGXC Benchmarking provides ready-to-use recipe templates for evaluating the performance of specific AI use cases across hardware and software combinations.
A library for exporting NeMo and Hugging Face models to optimized inference backends and deploying them for efficient querying.
An experimental implementation of compiler-driven automatic sharding of models across a given device mesh.
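To make the idea concrete, here is a minimal sketch of sharding a tensor across a device mesh using PyTorch's DTensor API; this illustrates the general mechanism, not the project's own interface, and the module paths assume a recent PyTorch release.

```python
# Illustrative sketch of device-mesh sharding with PyTorch DTensor (assumption:
# PyTorch >= 2.5, where torch.distributed.tensor is public). Not this project's API.
# Run with: torchrun --nproc-per-node=2 shard_demo.py
import torch
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor import distribute_tensor, Shard

mesh = init_device_mesh("cpu", (2,))        # 1-D mesh over 2 ranks ("cuda" on GPUs)
weight = torch.randn(8, 4)
# Shard dim 0 across the mesh: each rank materializes a 4x4 slice.
dweight = distribute_tensor(weight, mesh, [Shard(0)])
print(dweight.to_local().shape)             # torch.Size([4, 4]) on each rank
```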
FlagGems is an operator library for large language models implemented in the Triton Language.
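For a flavor of what such an operator looks like, here is a minimal elementwise-add kernel in the standard Triton tutorial style; it is an illustrative sketch, not code taken from FlagGems.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice of the inputs.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                      # guard the ragged tail block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Inputs must be contiguous CUDA tensors of equal shape.
    out = torch.empty_like(x)
    n = out.numel()
    grid = (triton.cdiv(n, 1024),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```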
Ship correct and fast LLM kernels to PyTorch
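One documented way to expose a kernel to PyTorch is torch.library.custom_op (PyTorch 2.4+); the sketch below registers a toy op under a made-up "demo" namespace and is illustrative, not this project's specific mechanism.

```python
import torch

# Register a toy custom operator (assumption: PyTorch >= 2.4, which provides
# torch.library.custom_op; the "demo" namespace and op are hypothetical).
@torch.library.custom_op("demo::scale", mutates_args=())
def scale(x: torch.Tensor, s: float) -> torch.Tensor:
    return x * s

# A "fake" (meta) implementation so the op can be traced, e.g. by torch.compile.
@scale.register_fake
def _(x, s):
    return torch.empty_like(x)

print(scale(torch.arange(3.0), 2.0))  # tensor([0., 2., 4.])
```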
PyTorch Distributed-native training library for LLMs/VLMs with out-of-the-box Hugging Face support
Scalable toolkit for efficient model reinforcement learning
Distributed Compiler based on Triton for Parallel Systems
The NVIDIA NeMo Agent toolkit is an open-source library for efficiently connecting and optimizing teams of AI agents.
CUDA Templates and Python DSLs for High-Performance Linear Algebra
TorchOpt is an efficient library for differentiable optimization built upon PyTorch.
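A minimal sketch of the functional, Optax-style init/update/apply loop that TorchOpt provides; the call names follow its documented pattern, but treat exact signatures as assumptions for your installed version.

```python
import torch
import torchopt

# Minimize ||w - 1||^2 with TorchOpt's functional Adam (Optax-style API).
w = torch.zeros(3, requires_grad=True)

optimizer = torchopt.adam(lr=0.1)
opt_state = optimizer.init((w,))                 # optimizer state for the params

for _ in range(200):
    loss = ((w - 1.0) ** 2).sum()
    grads = torch.autograd.grad(loss, (w,))
    updates, opt_state = optimizer.update(grads, opt_state)
    (w,) = torchopt.apply_updates((w,), updates)

print(w)  # approaches tensor([1., 1., 1.])
```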
Run PyTorch LLMs locally on servers, desktop and mobile
verl: Volcano Engine Reinforcement Learning for LLMs
Ray is an AI compute engine. It consists of a core distributed runtime and a set of AI libraries for accelerating ML workloads.
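A minimal example of Ray's core task API, following its documented remote-function pattern:

```python
import ray

ray.init()  # start (or attach to) a local Ray runtime

@ray.remote
def square(x: int) -> int:
    # Executes as a task in a Ray worker process, possibly on another node.
    return x * x

# .remote() returns futures (object refs); ray.get() blocks for the results.
futures = [square.remote(i) for i in range(4)]
print(ray.get(futures))  # [0, 1, 4, 9]
```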
This tool facilitates debugging convergence issues and testing new algorithms and recipes for training LLMs with NVIDIA libraries such as Transformer Engine, Megatron-LM, and NeMo.
Safe code refactoring for modern Python.
Custom recipes for post-collection analysis of NVIDIA Nsight Systems profiles.
A lightweight library for PyTorch training tools and utilities
Simple and efficient PyTorch-native transformer text generation in under 1000 lines of Python.
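At its core, this style of PyTorch-native generation is a plain autoregressive decode loop; the generic sketch below assumes a hypothetical `model` callable that maps token ids to logits and is not code from the repository.

```python
import torch

@torch.no_grad()
def greedy_generate(model, tokens: torch.Tensor, max_new_tokens: int) -> torch.Tensor:
    # tokens: (1, seq_len) prompt ids; model(tokens) -> logits (1, seq_len, vocab).
    for _ in range(max_new_tokens):
        logits = model(tokens)                     # forward over the whole prefix
        next_id = logits[:, -1, :].argmax(dim=-1)  # greedy choice of the next token
        tokens = torch.cat([tokens, next_id[:, None]], dim=1)
    return tokens
```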
[ICML2024 (Oral)] Official PyTorch implementation of DoRA: Weight-Decomposed Low-Rank Adaptation
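For reference, DoRA reparameterizes a frozen weight W0 as a learnable magnitude m times a LoRA-updated, renormalized direction, W' = m · (W0 + BA)/‖W0 + BA‖. The sketch below is a hedged reading of that update rule (norms taken per output row here, and DoRALinear is a hypothetical wrapper), not the official implementation.

```python
import torch
import torch.nn as nn

class DoRALinear(nn.Module):
    """Hypothetical sketch of DoRA's update rule; not the official code."""
    def __init__(self, base: nn.Linear, r: int = 8):
        super().__init__()
        self.weight = nn.Parameter(base.weight.detach().clone(), requires_grad=False)
        self.bias = base.bias
        out_f, in_f = self.weight.shape
        self.A = nn.Parameter(torch.randn(r, in_f) * 0.01)   # low-rank factors
        self.B = nn.Parameter(torch.zeros(out_f, r))         # B = 0, so BA starts at 0
        # Learnable magnitude, initialized to the per-row norm of the frozen weight.
        self.m = nn.Parameter(self.weight.norm(dim=1, keepdim=True))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight + self.B @ self.A                    # direction update: W0 + BA
        w = self.m * (w / w.norm(dim=1, keepdim=True))       # renormalize, rescale by m
        return nn.functional.linear(x, w, self.bias)
```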
pytest plugin for distributed testing and loop-on-failures testing modes.
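Usage requires no changes to the tests themselves; the CLI flags in the comments are pytest-xdist's documented options.

```python
# test_math.py: ordinary pytest tests; pytest-xdist parallelizes them unchanged.
import pytest

@pytest.mark.parametrize("n", range(8))
def test_square_nonnegative(n):
    assert n * n >= 0

# Run on 4 worker processes:        pytest -n 4 test_math.py
# Auto-size the worker count:       pytest -n auto test_math.py
# Loop on failures as files change: pytest --looponfail test_math.py
```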