🐉 HydraMPP

Lightweight, many-headed parallel processing — from your laptop to the whole cluster.

⚡ Ray-like API • 🧵 multiprocessing core • 🌐 Multi-node • 🧮 Native SLURM • 🪶 Featherweight

🔬 What is HydraMPP?

HydraMPP (Hydra Massive Parallel Processing) is a high-performance Python library for distributed parallel processing. The same code runs unchanged on:

💻 A single laptop
🖥️ A multi-core workstation
🌍 A multi-node HPC cluster

Built for simplicity, portability, and near-zero dependencies, HydraMPP gives you a familiar @remote / .remote() programming model without dragging in a heavyweight runtime — psutil and the Python standard library are all it needs.

✨ Features

🧵 Parallel Execution

Familiar @hydraMPP.remote decorator
Non-blocking .remote() task submission
Per-call .options(num_cpus=N)
Automatic CPU detection via psutil
CPU-aware queueing and dispatch
Pure-Python worker spawning
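
To make "CPU-aware queueing and dispatch" concrete, here is a minimal simulation of one way such a policy could work: a FIFO queue whose head job starts only when enough CPUs are idle. The function `dispatch_order` and its tuple layout are hypothetical and are not taken from HydraMPP's source.

```python
from collections import deque


def dispatch_order(jobs, total_cpus):
    """Simulate CPU-aware FIFO dispatch (illustrative only).

    jobs: list of (name, num_cpus, duration) tuples in submission order.
    Returns a list of (start_time, name) in the order jobs were started.
    """
    queue = deque(jobs)
    running = []      # (finish_time, num_cpus) for jobs currently in flight
    free = total_cpus
    now = 0
    started = []
    while queue:
        name, need, duration = queue[0]
        if need > total_cpus:
            raise ValueError(f"{name} requests {need} CPUs, only {total_cpus} exist")
        if need <= free:
            # Enough CPUs are idle: start the job at the head of the queue.
            queue.popleft()
            free -= need
            running.append((now + duration, need))
            started.append((now, name))
        else:
            # Not enough CPUs: jump to the next completion and reclaim its CPUs.
            running.sort()
            finish, released = running.pop(0)
            now = finish
            free += released
    return started
```

With four CPUs, two 2-CPU jobs start immediately, while a later 4-CPU job has to wait until both have finished.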

🌐 Distributed Computing

Single machine → multi-node cluster
TCP socket transport with length framing
pickle-based object passing
Native SLURM auto-configuration
Live UDP status monitor
Three modes: local • host • client
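
The transport bullets above (TCP sockets, length framing, pickle) describe a common pattern that can be sketched with the standard library alone. The 4-byte big-endian header and the helper names `send_obj` and `recv_obj` are assumptions made for illustration; HydraMPP's actual wire format may differ.

```python
import pickle
import socket
import struct

# Assumed header layout: one 4-byte unsigned length in network byte order.
HEADER = struct.Struct("!I")


def send_obj(sock, obj):
    """Pickle obj and send it prefixed with its byte length."""
    payload = pickle.dumps(obj)
    sock.sendall(HEADER.pack(len(payload)) + payload)


def _recv_exact(sock, n):
    """Read exactly n bytes; recv() may return fewer than requested."""
    chunks = []
    while n:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def recv_obj(sock):
    """Read one length-prefixed message and unpickle it."""
    (length,) = HEADER.unpack(_recv_exact(sock, HEADER.size))
    return pickle.loads(_recv_exact(sock, length))
```

The length prefix is what lets the receiver find message boundaries on a byte stream. Note that unpickling executes arbitrary code, so a scheme like this is only appropriate between trusted nodes.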

⚡ Why HydraMPP?

Feature	HydraMPP
🪶 Featherweight (`psutil` + stdlib only)	✅
🧬 Ray-style API (`@remote` / `.remote`)	✅
💻 Laptop → 🌍 cluster, same code	✅
🧮 Native SLURM auto-wiring	✅
📡 Built-in live status monitor	✅
📦 PyPI and conda-forge	✅
🔍 Small, readable, auditable codebase	✅
🐍 Pure Python, no native build step	✅

🧱 Architecture

flowchart LR
    A["Your Python Script"] -->|"remote tasks"| B["HydraMPP Driver"]
    B --> C["Shared Job Queue"]
    C --> D["Scheduler (main_loop)"]
    D -->|"local"| E["Worker Processes"]
    D -->|"TCP + pickle"| F["Client Node 1"]
    D -->|"TCP + pickle"| G["Client Node N"]
    H["hydra-status.py"] -. "UDP" .-> D

🐉 Tech Stack

Component	Technology
Language	Python ≥ 3.6
Parallelism	`multiprocessing`
Resource detection	`psutil`
Networking	TCP sockets (`struct` framing)
Serialization	`pickle`
Cluster integration	SLURM (`scontrol` / `srun`)
Status monitor	UDP datagram
CLI helper	`argparse`

🚀 Installation

1️⃣ From PyPI

pip install hydraMPP

No admin rights? Install into your home folder:

pip install --user hydraMPP

2️⃣ From conda-forge

conda install -c conda-forge hydraMPP

3️⃣ From source

git clone https://github.com/raw-lab/HydraMPP
cd HydraMPP
pip install .

⚡ Quick Start

import time
import hydraMPP


# 1️⃣ Tag any function you want to run in parallel
@hydraMPP.remote
def slow_square(x, seconds=1):
    time.sleep(seconds)
    return x * x


def main():
    # 2️⃣ Initialize (local mode, auto-detect CPUs)
    hydraMPP.init()

    # 3️⃣ Fire off jobs — submission is non-blocking, returns a job id
    jobs = [slow_square.remote(i) for i in range(20)]

    # 4️⃣ Request more resources for a heavier task
    jobs.append(slow_square.options(num_cpus=4).remote(99))

    # 5️⃣ Drain results as they finish
    pending = jobs
    while pending:
        ready, pending = hydraMPP.wait(pending)
        for job in ready:
            result = hydraMPP.get(job)
            print(f"{result[1]} -> {result[2]}")   # func name -> return value

    hydraMPP.shutdown()


if __name__ == "__main__":
    main()

🌐 Running Modes

HydraMPP runs in three modes, selected by the address argument to init():

Mode	How to start	Role
💻 local	`hydraMPP.init()`	Single machine, all CPUs local
🖥️ host	`hydraMPP.init(address="host", port=24515)`	Coordinator that accepts clients
🛰️ client	`hydraMPP.init(address="10.0.0.5", port=24515)`	Worker node that joins a host

# Coordinator (host) — waits for clients to connect
hydraMPP.init(address="host", port=24515, timeout=10)

# Worker (client) — connects to the host and offers its CPUs
hydraMPP.init(address="10.0.0.5", port=24515, num_cpus=36)
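
The rule in the table can be summarized as a small function. This is a hypothetical helper written for illustration; in particular, treating `None` or `"local"` as the default value of `address` is an assumption, since the default is not stated here.

```python
def select_mode(address=None):
    """Map an init() address argument to a mode name (illustrative only)."""
    if address is None or address == "local":
        return "local"     # single machine, all CPUs local
    if address == "host":
        return "host"      # coordinator that accepts clients
    return "client"        # any other value is treated as the host's address
```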

📚 API Reference

Function	Description
`hydraMPP.init(address, num_cpus, ...)`	Start HydraMPP in local / host / client mode
`@hydraMPP.remote`	Tag a function for parallel execution
`func.remote(*args, **kwargs)`	Queue a tagged function; returns a job id
`func.options(num_cpus=N).remote(...)`	Set per-call resource options
`hydraMPP.wait(queue, timeout, max)`	Split a job list into `(ready, pending)`
`hydraMPP.get(id)`	Retrieve a finished job's result record
`hydraMPP.put(name, obj)`	Place an object into the queue as a finished result
`hydraMPP.nodes()`	List the connected nodes
`hydraMPP.shutdown()`	Tear down workers and the manager

📋 `get()` result record

hydraMPP.get(id) returns a record describing the job. Fields are accessible by index:

Index	Field	Meaning
0	`finished`	`bool` — whether the job has completed
1	`func_name`	name of the executed function
2	`ret`	the function's return value
3	`num_cpus`	number of CPUs the job used
4	`runtime`	wall-clock run time, in seconds
5	`hostname`	node the job ran on
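
Index access is compact but easy to misread. One option is to wrap the record in a `namedtuple` whose field names follow the table above. The wrapper `JobRecord` and the helper `summarize` are hypothetical conveniences, not part of HydraMPP, and the sketch assumes the record is a six-element sequence in the order listed.

```python
from collections import namedtuple

# Field order mirrors the index table: 0=finished ... 5=hostname.
JobRecord = namedtuple(
    "JobRecord",
    ["finished", "func_name", "ret", "num_cpus", "runtime", "hostname"],
)


def summarize(record):
    """Format a result record as a one-line summary."""
    rec = JobRecord(*record)
    return (
        f"{rec.func_name} -> {rec.ret!r} "
        f"({rec.num_cpus} CPU, {rec.runtime:.2f}s on {rec.hostname})"
    )
```

In a real script the argument would be the value returned by `hydraMPP.get(job)`.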

📡 Status Monitor

A helper script is included to inspect a running HydraMPP cluster over UDP.

usage: hydra-status.py [-h] [address] [port]

positional arguments:
  address     Address of the HydraMPP server [127.0.0.1]
  port        Port to connect to [24515]

options:
  -h, --help  show this help message and exit

It prints connected clients, available CPUs, and queued jobs, then exits. For continuous monitoring, wrap it with watch:

watch -n1 hydra-status.py localhost

🧮 SLURM Integration

HydraMPP can auto-configure host and client nodes inside a SLURM allocation. Just add --hydraMPP-slurm $SLURM_JOB_NODELIST to your program's invocation and HydraMPP wires up the cluster for you.

💡 Call hydraMPP.init() only after every function has been tagged with @hydraMPP.remote. Use --hydraMPP-cpus to set the number of CPUs per node; pass 0 or omit the flag to let HydraMPP choose the count itself.

#!/bin/bash
#SBATCH --job-name=My_Slurm_Job
#SBATCH --nodes=3
#SBATCH --tasks-per-node=1
#SBATCH --cpus-per-task=36
#SBATCH --mem=100G
#SBATCH --time=1-0
#SBATCH -o slurm-%x-%j.out

echo "====================================================="
echo "Start Time  : $(date)"
echo "Submit Dir  : $SLURM_SUBMIT_DIR"
echo "Job ID/Name : $SLURM_JOBID / $SLURM_JOB_NAME"
echo "Node List   : $SLURM_JOB_NODELIST"
echo "Num Tasks   : $SLURM_NTASKS total [$SLURM_NNODES nodes @ $SLURM_CPUS_ON_NODE CPUs/node]"
echo "====================================================="

path/to/program.py --custom-args \
    --hydraMPP-slurm $SLURM_JOB_NODELIST --hydraMPP-cpus $SLURM_CPUS_ON_NODE
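
Because the HydraMPP flags share the command line with your program's own arguments, it can help to see how the two sets might be separated. The sketch below uses `argparse.parse_known_args` to pull out `--hydraMPP-slurm` and `--hydraMPP-cpus` and pass everything else through. How HydraMPP itself reads these flags is not shown in this README, so `split_hydra_args` is purely a hypothetical illustration.

```python
import argparse


def split_hydra_args(argv):
    """Separate HydraMPP flags from the program's own arguments.

    Returns (hydra_namespace, remaining_argv).
    """
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--hydraMPP-slurm", dest="nodelist", default=None)
    # 0 means "not specified", matching the convention described above.
    parser.add_argument("--hydraMPP-cpus", dest="cpus", type=int, default=0)
    return parser.parse_known_args(argv)
```

A program could call `split_hydra_args(sys.argv[1:])` and hand the remaining list to its own parser.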

📄 License

Creative Commons Attribution-NonCommercial (CC BY-NC 4.0)

See the LICENSE file for details.

📖 Citation

If you use HydraMPP in published work, please cite:

Figueroa III JL, White III RA. 2026
HydraMPP: A lightweight library for distributed massive parallel processing in Python - threading at scale. bioRxiv.

🤝 Contributing

We welcome:

🧵 Scheduling and dispatch improvements
🌐 Networking and fault-tolerance enhancements
🔐 Authentication / transport security
📡 Status-monitor features
🧪 Tests and benchmarks

Pull requests and issues are encouraged.

📞 Support

🐛 Issues: https://github.com/raw-lab/HydraMPP/issues
📧 Contact:
- Dr. Richard Allen White III
- Jose Luis Figueroa III
If you have any questions or feedback, please feel free to get in touch by email.

🐉 HydraMPP

Lightweight. Distributed. Parallel.

Built with ❤️ in Python.
