
🐉 HydraMPP

Lightweight, many-headed parallel processing, from your laptop to the whole cluster.


⚡ Ray-like API • 🧵 multiprocessing core • 🌐 Multi-node • 🧮 Native SLURM • 🪶 Featherweight


🔬 What is HydraMPP?

HydraMPP (Hydra Massive Parallel Processing) is a high-performance Python library for distributed parallel processing. It scales the same code seamlessly from:

  • 💻 A single laptop
  • 🖥️ A multi-core workstation
  • 🌐 A multi-node HPC cluster

Built for simplicity, portability, and near-zero dependencies, HydraMPP gives you a familiar @remote / .remote() programming model without dragging in a heavyweight runtime: psutil and the Python standard library are all it needs.


✨ Features

🧵 Parallel Execution

  • Familiar @hydraMPP.remote decorator
  • Non-blocking .remote() task submission
  • Per-call .options(num_cpus=N)
  • Automatic CPU detection via psutil
  • CPU-aware queueing and dispatch
  • Pure-Python worker spawning
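
To make "CPU-aware queueing and dispatch" concrete, here is a minimal sketch of one possible policy: start queued jobs in FIFO order for as long as the next job's CPU request fits in the free CPU budget. The `dispatch` function and the `(job_id, num_cpus)` tuples are hypothetical names for illustration; this is not taken from HydraMPP's scheduler, which may order or pack jobs differently.

```python
from collections import deque


def dispatch(queue, free_cpus):
    """Start queued jobs in FIFO order while enough CPUs are free.

    `queue` is a deque of (job_id, num_cpus) pairs (hypothetical layout).
    Assumed policy: a job that does not fit blocks the jobs behind it.
    Returns the ids that were started and the CPUs still free.
    """
    started = []
    while queue and queue[0][1] <= free_cpus:
        job_id, num_cpus = queue.popleft()
        free_cpus -= num_cpus          # reserve CPUs for the started job
        started.append(job_id)
    return started, free_cpus


queue = deque([("a", 1), ("b", 4), ("c", 2), ("d", 1)])
started, free = dispatch(queue, free_cpus=6)
print(started, free)  # ['a', 'b'] 1
```

With six free CPUs, jobs `a` and `b` start, and `c` waits because it asks for two CPUs when only one is left. A strict FIFO policy like this keeps large jobs from starving, at the cost of leaving a CPU idle while `d` could have run.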

🌐 Distributed Computing

  • Single machine → multi-node cluster
  • TCP socket transport with length framing
  • pickle-based object passing
  • Native SLURM auto-configuration
  • Live UDP status monitor
  • Three modes: local • host • client
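
The combination of "TCP socket transport with length framing" and "pickle-based object passing" can be sketched as follows. This is an illustration of the general technique, not HydraMPP's wire format: the 4-byte big-endian length header and the helper names `send_obj`, `recv_exact`, and `recv_obj` are assumptions made for this example.

```python
import pickle
import socket
import struct

# Assumed header layout: one unsigned 32-bit big-endian payload length.
HEADER = struct.Struct("!I")


def send_obj(sock, obj):
    """Pickle `obj` and send it prefixed with its length."""
    payload = pickle.dumps(obj)
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_exact(sock, n):
    """Read exactly `n` bytes; recv() alone may return fewer."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        buf.extend(chunk)
    return bytes(buf)


def recv_obj(sock):
    """Read one length-prefixed message and unpickle it."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return pickle.loads(recv_exact(sock, length))


# Demonstrate a round trip over a connected socket pair.
left, right = socket.socketpair()
send_obj(left, {"job": 7, "args": (3,)})
message = recv_obj(right)
left.close()
right.close()
print(message)
```

The length prefix is what lets the receiver find message boundaries in a TCP byte stream. Note that unpickling data from an untrusted peer can execute arbitrary code, so a transport of this kind is only appropriate on a trusted network.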

⚡ Why HydraMPP?

| Feature | HydraMPP |
| --- | --- |
| 🪶 Featherweight (psutil + stdlib only) | ✅ |
| 🧬 Ray-style API (@remote / .remote) | ✅ |
| 💻 Laptop → 🌐 cluster, same code | ✅ |
| 🧮 Native SLURM auto-wiring | ✅ |
| 📡 Built-in live status monitor | ✅ |
| 📦 PyPI and conda-forge | ✅ |
| 🔍 Small, readable, auditable codebase | ✅ |
| 🐍 Pure Python, no native build step | ✅ |

🧱 Architecture

flowchart LR
    A["Your Python Script"] -->|"remote tasks"| B["HydraMPP Driver"]
    B --> C["Shared Job Queue"]
    C --> D["Scheduler (main_loop)"]
    D -->|"local"| E["Worker Processes"]
    D -->|"TCP + pickle"| F["Client Node 1"]
    D -->|"TCP + pickle"| G["Client Node N"]
    H["hydra-status.py"] -. "UDP" .-> D

🐉 Tech Stack

| Component | Technology |
| --- | --- |
| Language | Python ≥ 3.6 |
| Parallelism | multiprocessing |
| Resource detection | psutil |
| Networking | TCP sockets (struct framing) |
| Serialization | pickle |
| Cluster integration | SLURM (scontrol / srun) |
| Status monitor | UDP datagram |
| CLI helper | argparse |

🚀 Installation

1️⃣ From PyPI

pip install hydraMPP

No admin rights? Install into your home folder:

pip install --user hydraMPP

2️⃣ From conda-forge

conda install -c conda-forge hydraMPP

3️⃣ From source

git clone https://github.com/raw-lab/HydraMPP
cd HydraMPP
pip install .

⚡ Quick Start

import time
import hydraMPP


# 1️⃣ Tag any function you want to run in parallel
@hydraMPP.remote
def slow_square(x, seconds=1):
    time.sleep(seconds)
    return x * x


def main():
    # 2️⃣ Initialize (local mode, auto-detect CPUs)
    hydraMPP.init()

    # 3️⃣ Fire off jobs; submission is non-blocking and returns a job id
    jobs = [slow_square.remote(i) for i in range(20)]

    # 4️⃣ Request more resources for a heavier task
    jobs.append(slow_square.options(num_cpus=4).remote(99))

    # 5️⃣ Drain results as they finish. Each batch is handled right after
    #    wait(), so the last batch is not skipped when pending becomes empty.
    pending = jobs
    while pending:
        ready, pending = hydraMPP.wait(pending)
        for job in ready:
            result = hydraMPP.get(job)
            print(f"{result[1]} -> {result[2]}")   # func name -> return value

    hydraMPP.shutdown()


if __name__ == "__main__":
    main()

🌐 Running Modes

HydraMPP runs in three modes, selected by the address argument to init():

| Mode | How to start | Role |
| --- | --- | --- |
| 💻 local | hydraMPP.init() | Single machine, all CPUs local |
| 🖥️ host | hydraMPP.init(address="host", port=24515) | Coordinator that accepts clients |
| 🛰️ client | hydraMPP.init(address="10.0.0.5", port=24515) | Worker node that joins a host |

# Coordinator (host) โ€” waits for clients to connect
hydraMPP.init(address="host", port=24515, timeout=10)

# Worker (client) โ€” connects to the host and offers its CPUs
hydraMPP.init(address="10.0.0.5", port=24515, num_cpus=36)
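
When one script has to start in any of the three modes, the choice can be reduced to building the right keyword arguments. The helper below is a hypothetical convenience, not part of HydraMPP; it only uses the `address`, `port`, and `num_cpus` arguments shown above and does not import the library, so the mapping can be checked on its own.

```python
def init_kwargs(role, host_ip=None, port=24515, num_cpus=None):
    """Build keyword arguments for hydraMPP.init() from a role name.

    Hypothetical helper: `role` is "local", "host", or "client".
    """
    if role == "local":
        kwargs = {}
    elif role == "host":
        kwargs = {"address": "host", "port": port}
    elif role == "client":
        if not host_ip:
            raise ValueError("client mode needs the host's IP address")
        kwargs = {"address": host_ip, "port": port}
    else:
        raise ValueError(f"unknown role: {role!r}")
    if num_cpus:
        kwargs["num_cpus"] = num_cpus  # omit to let HydraMPP detect CPUs
    return kwargs


print(init_kwargs("client", host_ip="10.0.0.5", num_cpus=36))
# Intended use in a real program (requires hydraMPP to be installed):
#   hydraMPP.init(**init_kwargs("client", host_ip="10.0.0.5", num_cpus=36))
```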

📚 API Reference

| Function | Description |
| --- | --- |
| hydraMPP.init(address, num_cpus, ...) | Start HydraMPP in local / host / client mode |
| @hydraMPP.remote | Tag a function for parallel execution |
| func.remote(*args, **kwargs) | Queue a tagged function; returns a job id |
| func.options(num_cpus=N).remote(...) | Set per-call resource options |
| hydraMPP.wait(queue, timeout, max) | Split a job list into (ready, pending) |
| hydraMPP.get(id) | Retrieve a finished job's result record |
| hydraMPP.put(name, obj) | Place an object into the queue as a finished result |
| hydraMPP.nodes() | List the connected nodes |
| hydraMPP.shutdown() | Tear down workers and the manager |

📋 get() result record

hydraMPP.get(id) returns a record describing the job. Fields are accessible by index:

| Index | Field | Meaning |
| --- | --- | --- |
| 0 | finished | bool: whether the job has completed |
| 1 | func_name | name of the executed function |
| 2 | ret | the function's return value |
| 3 | num_cpus | number of CPUs the job used |
| 4 | runtime | wall-clock run time, in seconds |
| 5 | hostname | node the job ran on |
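
Index-based access is easy to get wrong, so it can help to wrap a record in a named tuple whose fields follow the table above. `JobResult` and `as_job_result` are hypothetical names for this sketch, and the sample record is made up; the only assumption about HydraMPP is that `get()` returns a sequence with these six positions.

```python
from collections import namedtuple

# Field order mirrors the index table for the get() result record.
JobResult = namedtuple(
    "JobResult", "finished func_name ret num_cpus runtime hostname"
)


def as_job_result(record):
    """Wrap the first six items of a result record in a JobResult."""
    return JobResult(*record[:6])


# Made-up record standing in for the value returned by hydraMPP.get(job).
sample = (True, "slow_square", 81, 1, 1.002, "node01")
result = as_job_result(sample)
print(f"{result.func_name} -> {result.ret} on {result.hostname}")
# slow_square -> 81 on node01
```

Because a named tuple is still a tuple, `result[1]` and `result[2]` keep working alongside `result.func_name` and `result.ret`.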

📡 Status Monitor

A helper script is included to inspect a running HydraMPP cluster over UDP.

usage: hydra-status.py [-h] [address] [port]

positional arguments:
  address     Address of the HydraMPP server [127.0.0.1]
  port        Port to connect to [24515]

options:
  -h, --help  show this help message and exit

It prints connected clients, available CPUs, and queued jobs, then exits. For continuous monitoring, wrap it with watch:

watch -n1 hydra-status.py localhost
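
On systems without watch, or when the snapshots should be kept for later inspection, the same polling can be done from Python. The sketch below assumes only that the status script prints its report and exits, as described above; `poll_status` is a hypothetical helper, and the command in the trailing comment assumes hydra-status.py is on your PATH.

```python
import subprocess
import time


def poll_status(command, interval=1.0, rounds=3):
    """Run `command` `rounds` times, `interval` seconds apart.

    Returns the captured output of each run as a list of strings.
    """
    outputs = []
    for i in range(rounds):
        done = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,   # keep error text with the report
            universal_newlines=True,    # decode bytes to str
        )
        outputs.append(done.stdout)
        if i < rounds - 1:
            time.sleep(interval)
    return outputs


# Example (requires a running HydraMPP host):
#   for report in poll_status(["hydra-status.py", "localhost", "24515"]):
#       print(report)
```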

🧮 SLURM Integration

HydraMPP can auto-configure host and client nodes inside a SLURM allocation. Just add --hydraMPP-slurm $SLURM_JOB_NODELIST to your program's invocation and HydraMPP wires up the cluster for you.

💡 Call hydraMPP.init() after all functions have been tagged with @hydraMPP.remote. Use --hydraMPP-cpus to set the number of CPUs per node; pass 0 or omit the flag to let HydraMPP guess.

#!/bin/bash
#SBATCH --job-name=My_Slurm_Job
#SBATCH --nodes=3
#SBATCH --tasks-per-node=1
#SBATCH --cpus-per-task=36
#SBATCH --mem=100G
#SBATCH --time=1-0
#SBATCH -o slurm-%x-%j.out

echo "====================================================="
echo "Start Time  : $(date)"
echo "Submit Dir  : $SLURM_SUBMIT_DIR"
echo "Job ID/Name : $SLURM_JOBID / $SLURM_JOB_NAME"
echo "Node List   : $SLURM_JOB_NODELIST"
echo "Num Tasks   : $SLURM_NTASKS total [$SLURM_NNODES nodes @ $SLURM_CPUS_ON_NODE CPUs/node]"
echo "====================================================="

path/to/program.py --custom-args \
    --hydraMPP-slurm $SLURM_JOB_NODELIST --hydraMPP-cpus $SLURM_CPUS_ON_NODE
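
For readers curious what wiring up a cluster from $SLURM_JOB_NODELIST involves, the sketch below expands the compact node list with `scontrol show hostnames` and splits the result into one host and the remaining clients. This illustrates the general approach under stated assumptions and is not HydraMPP's code: `expand_nodelist` and `pick_roles` are hypothetical names, and treating the first node as the host is an assumed convention.

```python
import subprocess


def expand_nodelist(nodelist, run=subprocess.check_output):
    """Expand a compact SLURM node list such as "node[01-03]".

    `run` is injectable so the function can be exercised without SLURM.
    """
    out = run(
        ["scontrol", "show", "hostnames", nodelist],
        universal_newlines=True,
    )
    return out.split()


def pick_roles(hostnames):
    """Assumed convention: the first node hosts, the rest are clients."""
    return hostnames[0], hostnames[1:]


# Inside a SLURM allocation one might write:
#   import os
#   nodes = expand_nodelist(os.environ["SLURM_JOB_NODELIST"])
#   host, clients = pick_roles(nodes)
```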

📄 License

Creative Commons Attribution-NonCommercial (CC BY-NC 4.0)

See the LICENSE file for details.


📖 Citation

If you use HydraMPP in published work, please cite:

Figueroa III JL, White III RA. 2026.
HydraMPP: A lightweight library for distributed massive parallel processing in Python - threading at scale. bioRxiv.

🤝 Contributing

We welcome:

  • 🧵 Scheduling and dispatch improvements
  • 🌐 Networking and fault-tolerance enhancements
  • 🔐 Authentication / transport security
  • 📡 Status-monitor features
  • 🧪 Tests and benchmarks

Pull requests and issues are encouraged.


📞 Support


🐉 HydraMPP

Lightweight. Distributed. Parallel.

Built with ❤️ in Python.
