
Kurama622/PicoPebble



Introduction

PicoPebble is a lightweight distributed machine learning training framework for beginners. It uses MPI to pass parameters and synchronize gradient updates across multiple machines, and it can also train on a single machine. PicoPebble currently supports:

  • Synchronous training
  • Asynchronous training
  • Data parallelism
  • Pipeline model parallelism

Several more features are planned:

  • Tensor model parallelism
  • Passing parameters through Gloo
  • Disaster recovery

Dependencies

PicoPebble currently relies on MPI for parameter synchronization, so you need to install OpenMPI. Do not install OpenMPI and MPICH on the same system at the same time.
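Before installing, it can help to check which MPI implementation, if any, is already on your PATH. The following is a minimal sketch and not part of PicoPebble: `detect_mpi` is a hypothetical helper, and it assumes that OpenMPI's `mpirun --version` output contains "Open MPI" while MPICH's Hydra launcher prints "HYDRA build details". Verify those strings against your own installation.

```shell
# Hypothetical helper: classify an MPI implementation from the text
# printed by `mpirun --version`.
# Assumption: OpenMPI prints "mpirun (Open MPI) x.y.z";
# MPICH (Hydra launcher) prints "HYDRA build details:".
detect_mpi() {
  case "$1" in
    *"Open MPI"*)    echo "openmpi" ;;
    *HYDRA*|*MPICH*) echo "mpich" ;;
    *)               echo "unknown" ;;
  esac
}

# Report what is currently installed, if anything.
if command -v mpirun >/dev/null 2>&1; then
  detect_mpi "$(mpirun --version 2>&1)"
else
  echo "mpirun not found"
fi
```

If this reports `mpich`, you may want to remove MPICH before installing OpenMPI with one of the commands below.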

CentOS 8

sudo yum install openmpi-devel -y

Ubuntu

sudo apt install openmpi-bin libopenmpi-dev

Arch Linux

sudo pacman -S openmpi

Docker

docker build -t picopebble -f Dockerfile .

# for podman
# podman build -t picopebble -f Dockerfile .

Build && run

Single-node (single machine)

# Usage: ./build_run.sh <node num>
./build_run.sh 1

Multi-node

./build_run.sh 3
