BKitor


Run Slurm on Kubernetes. A Slinky project.

Go 317 89 Updated Jun 22, 2026
6 2 Updated Apr 27, 2026

Modular RDMA Interface

C++ 139 52 Updated Jun 23, 2026

Tenstorrent Topology (TT-Topology) is a command line utility used to flash multiple NB cards on a system to use specific eth routing configurations.

Python 16 15 Updated Jun 11, 2026

Declarative RKE2 Kubernetes cluster bootstrap and lifecycle management with AMD GPU and ROCm support

Go 8 3 Updated Jun 16, 2026

vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization

Python 2,422 423 Updated Jun 23, 2026

ScalarLM - a unified training and inference stack

Python 113 19 Updated Jun 3, 2026

Moved to Codeberg

Zig 43,200 3,060 Updated Nov 27, 2025

UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)

C++ 1,423 159 Updated Jun 23, 2026

An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & VLM & TIS & vLLM & Ray & Async RL)

Python 9,675 973 Updated Jun 17, 2026

Agent Reinforcement Trainer: train multi-step agents for real-world tasks using GRPO. Give your agents on-the-job training. Reinforcement learning for Qwen3.6, GPT-OSS, Llama, and more!

Python 10,158 907 Updated Jun 23, 2026

NVIDIA NCCL Tests for Distributed Training

Shell 147 32 Updated Jun 22, 2026

Achieve state of the art inference performance with modern accelerators on Kubernetes

Shell 3,428 540 Updated Jun 23, 2026

QuickReduce is a performant all-reduce library designed for AMD ROCm that supports inline compression.

C++ 38 8 Updated Aug 29, 2025

Machine Learning Engineering Open Book

Python 18,160 1,152 Updated May 18, 2026

The future home for CnC Tests and Framework Libraries

C 59 6 Updated Jan 24, 2026

A fast MoE impl for PyTorch

Python 1,857 206 Updated Feb 10, 2025

Benchmark suite for LLMs from Fireworks.ai

Python 107 38 Updated Jun 22, 2026

Any model. Any hardware. Zero compromise. Built with @ziglang / @openxla / MLIR / @bazelbuild

Zig 3,676 156 Updated Jun 23, 2026

Open Fabric Interfaces

C 802 508 Updated Jun 23, 2026

A PyTorch native platform for training generative AI models

Python 5,456 868 Updated Jun 23, 2026

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Python 72,408 8,859 Updated Jun 23, 2026

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 83,625 18,358 Updated Jun 23, 2026

i3wm multiple monitors auto configuration

Go 148 13 Updated Nov 29, 2024

A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology

C 1,391 189 Updated Jun 15, 2026

A BUDE virtual-screening benchmark, in many programming models

C++ 31 17 Updated Oct 15, 2024

MPI Partitioned Microbenchmarks

C 3 4 Updated Sep 13, 2022

Official MPICH Repository

C 680 325 Updated Jun 22, 2026

Open MPI main development repository

C 2,605 964 Updated Jun 22, 2026

A benchmark suite to evaluate CPU and GPU communication efficiency of MPI using different communication patterns

C 3 1 Updated May 11, 2016