Benjamin Kitor BKitor

Computer Engineer

9 followers · 8 following

https://www.linkedin.com/in/bkitor/

Achievements

Stars

SlinkyProject / slurm-operator

Run Slurm on Kubernetes. A Slinky project.

Go 317 89 Updated Jun 22, 2026

ROCm / amd-nhc

6 2 Updated Apr 27, 2026

ROCm / mori

Modular RDMA Interface

C++ 139 52 Updated Jun 23, 2026

tenstorrent / tt-topology

Tenstorrent Topology (TT-Topology) is a command line utility used to flash multiple NB cards on a system to use specific eth routing configurations.

Python 16 15 Updated Jun 11, 2026

silogen / cluster-bloom

Declarative RKE2 Kubernetes cluster bootstrap and lifecycle management with AMD GPU and ROCm support

Go 8 3 Updated Jun 16, 2026

vllm-project / production-stack

vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization

Python 2,422 423 Updated Jun 23, 2026

tensorwavecloud / ScalarLM

ScalarLM - a unified training and inference stack

Python 113 19 Updated Jun 3, 2026

ziglang / zig

Moved to Codeberg

Zig 43,200 3,060 Updated Nov 27, 2025

uccl-project / uccl

UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)

C++ 1,423 159 Updated Jun 23, 2026

OpenRLHF / OpenRLHF

An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & VLM & TIS & vLLM & Ray & Async RL)

Python 9,675 973 Updated Jun 17, 2026

OpenPipe / ART

Agent Reinforcement Trainer: train multi-step agents for real-world tasks using GRPO. Give your agents on-the-job training. Reinforcement learning for Qwen3.6, GPT-OSS, Llama, and more!

Python 10,158 907 Updated Jun 23, 2026

coreweave / nccl-tests

NVIDIA NCCL Tests for Distributed Training

Shell 147 32 Updated Jun 22, 2026

llm-d / llm-d

Achieve state of the art inference performance with modern accelerators on Kubernetes

Shell 3,428 540 Updated Jun 23, 2026

mk1-project / quickreduce

QuickReduce is a performant all-reduce library designed for AMD ROCm that supports inline compression.

C++ 38 8 Updated Aug 29, 2025

stas00 / ml-engineering

Machine Learning Engineering Open Book

Python 18,160 1,152 Updated May 18, 2026

ChipsandCheese / CnC-Tools

The future home for CnC Tests and Framework Libraries

C 59 6 Updated Jan 24, 2026

laekov / fastmoe

A fast MoE impl for PyTorch

Python 1,857 206 Updated Feb 10, 2025

fw-ai / benchmark

Benchmark suite for LLMs from Fireworks.ai

Python 107 38 Updated Jun 22, 2026

zml / zml

Any model. Any hardware. Zero compromise. Built with @ziglang / @openxla / MLIR / @bazelbuild

Zig 3,676 156 Updated Jun 23, 2026

ofiwg / libfabric

Open Fabric Interfaces

C 802 508 Updated Jun 23, 2026

pytorch / torchtitan

A PyTorch native platform for training generative AI models

Python 5,456 868 Updated Jun 23, 2026

hiyouga / LlamaFactory

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Python 72,408 8,859 Updated Jun 23, 2026

vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 83,625 18,358 Updated Jun 23, 2026

lpicanco / i3-autodisplay

i3wm multiple monitors auto configuration

Go 148 13 Updated Nov 29, 2024

NVIDIA / gdrcopy

A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology

C 1,391 189 Updated Jun 15, 2026

UoB-HPC / miniBUDE

A BUDE virtual-screening benchmark, in many programming models

C++ 31 17 Updated Oct 15, 2024

Yiltan / MPI-Partitioned-Microbenchmarks

MPI Partitioned Microbenchmarks

C 3 4 Updated Sep 13, 2022

pmodels / mpich

Official MPICH Repository

C 680 325 Updated Jun 22, 2026

open-mpi / ompi

Open MPI main development repository

C 2,605 964 Updated Jun 22, 2026

imanfaraji / MPI-ACC

A benchmark suite to evaluate CPU and GPU communication efficiency of MPI using different communication patterns

C 3 1 Updated May 11, 2016