maxidl
  • Hanover, Germany

Pinned

  1. ellamind/inference-hive Public

    inference-hive is a toolkit for running distributed LLM inference on SLURM clusters. Configure a few cluster, inference server, and data settings, then scale your inference workload across thousands of GPUs.

    Python 1 2

  2. openreviewer Public

    Generate high-quality peer reviews of machine learning and AI conference papers.

    Python 3

  3. SLURM PyTorch NCCL Multi-Node Test Script: A SLURM batch script that tests PyTorch's NCCL functionality across multiple GPU nodes. The script sets up a distributed PyTorch environment using torchrun and runs a comprehensive test that verifies NCCL initialization, inter-process communication barriers, and proper cleanup. Includes diagnostic output for troubleshooting multi-node GPU communication issues in HPC environments.
    #!/bin/bash
    #SBATCH --job-name=pytorch-nccl-test
    #SBATCH --partition=
    #SBATCH --account=
    #SBATCH --qos=
  4. re-shard parquet dataset
    from pathlib import Path

    import pyarrow.parquet as pq
    import pyarrow.dataset as pads