Run many jobs on multiple GPUs, so I can enjoy my weekends while GPUs work :P
Currently, we use a Producer-Consumer pattern: the producer reads a text file of commands, one command per line, and feeds them into a queue. There are at least n consumer processes, one per GPU, which read from the queue and run each command on the corresponding GPU. For each job, the STDOUT and STDERR outputs are stored in a folder named after the job_id.
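The sketch below illustrates this pattern; it is not hog's actual source, and the file name, GPU list, and the use of `CUDA_VISIBLE_DEVICES` to pin a job to a GPU are assumptions for illustration:

```python
# Sketch of the producer-consumer pattern described above, assuming
# jobs are pinned to GPUs via CUDA_VISIBLE_DEVICES.
import os
import subprocess
from multiprocessing import Process, Queue

def producer(job_file, queue, n_consumers):
    # Read one command per line and feed it into the queue.
    with open(job_file) as f:
        for line in f:
            command = line.strip()
            if command and not command.startswith("#"):
                queue.put(command)
    # One sentinel per consumer signals that no jobs remain.
    for _ in range(n_consumers):
        queue.put(None)

def consumer(gpu_id, queue):
    # Pull commands off the queue and run them on this consumer's GPU.
    while True:
        command = queue.get()
        if command is None:
            break
        env = {**os.environ, "CUDA_VISIBLE_DEVICES": str(gpu_id)}
        subprocess.run(command, shell=True, env=env)

if __name__ == "__main__":
    queue = Queue()
    consumers = [Process(target=consumer, args=(g, queue)) for g in (1, 2, 3)]
    for c in consumers:
        c.start()
    producer("foo.txt", queue, len(consumers))
    for c in consumers:
        c.join()
```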
Currently, we support only a static number of GPUs; the set of available GPUs cannot change while hog is running.
Currently, hog is not hosted on PyPI (hopefully it will be soon).
The alternative is to clone this repo, add it to your PATH variable, and call hog from there.
We currently support running hog as a CLI command
A basic use case would be:

```
hog --job_file foo.txt --gpus 1,2,3
```
This would read jobs from foo.txt and run them on GPUs 1, 2 and 3 concurrently.
See the section on Format of Job File for more information on how to write the job file.
| Flag | Description |
|------|-------------|
| `job_file` | File to read jobs from |
| `job_yielder` | File with a `yielder` method that generates jobs programmatically |
| `gpus` | Comma-separated IDs of the GPUs to use. Can run more than one concurrent job per GPU |
| `output_dir` | Directory to store outputs from runs. Defaults to `hog_run` |
| `prefix` | Prefix to attach to each per-job folder name. Defaults to `job_`, so you will have folders named `job_0`, `job_1`, ... under `output_dir` |
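Putting these flags together, an invocation might look like the following (the flag values here are illustrative):

```
hog --job_file jobs.txt --gpus 0,0,1 --output_dir my_runs --prefix exp_
```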
We have the --job_yielder flag that allows users to define their own method to generate jobs instead of using a job_file. To use this, define a method named yielder in another file, say test.py, and call hog as below:
```
hog --job_yielder test.py ...other flags
```
hog will now run the yielder method from test.py to generate the jobs to put into the queue.
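For instance, a minimal test.py might look like the sketch below. The training command and hyperparameters are placeholders, and we assume here that yielder yields one command string per job, analogous to one line of a job file:

```python
# test.py -- an illustrative yielder; hog calls this method to
# generate jobs instead of reading them from a job file.
def yielder():
    # Yield one command string per job (placeholder commands).
    for lr in [0.1, 0.01, 0.001]:
        yield f"python train.py --lr {lr}"
```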
We do not restrict how many concurrent jobs can run on the same GPU. Note that in some cases it might be better to run only one job on a GPU at any given point; in other cases, for example when running multiple TensorFlow instances, it might be possible to run several concurrent sessions on the same GPU. It is up to the user to decide which is better suited for their use case.
To run multiple concurrent programs on the same GPU, repeat that GPU's ID when setting the --gpus flag.
For example, `--gpus 0,0,1,2,2,2` will run two concurrent jobs on GPU 0, one on GPU 1, and three on GPU 2.
Inside output_dir, there is one folder per job, named according to the flags passed. Say we have job_0; inside it are the following files:
| File | Description |
|------|-------------|
| `INFO` | Basic information about the job, such as the job name, the command, and the GPU the command was run on |
| `job_0.ERR` | Captured `STDERR` output of the job |
| `job_0.OUT` | Captured `STDOUT` output of the job |
| `SUCCESS`/`FAILURE` | Empty file showing whether the job succeeded or not |
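For illustration, with the default output_dir and prefix, a run with two jobs might produce a layout like the following (the exact tree is hypothetical):

```
hog_run/
├── job_0/
│   ├── INFO
│   ├── job_0.OUT
│   ├── job_0.ERR
│   └── SUCCESS
└── job_1/
    └── ...
```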
- A job is a bash command or a `&&`-separated sequence of commands to be executed
- Specify one job per line
- Lines starting with `#` and empty lines are ignored
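For example, a job file following these rules might look like this (the commands themselves are placeholders):

```
# hyperparameter sweep (this comment line is ignored)
python train.py --lr 0.1
python train.py --lr 0.01

python train.py --lr 0.001 && python eval.py --ckpt latest
```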
- Incorporate using `hog` as a decorator to make it more flexible
- Allow users to override the default task to be done for each job
- Hooks, both pre-run and post-run (for things like email alerts, logging to a DB, etc.)
- Use `multiprocessing.logging` instead of `print` statements
- Allow changing the GPUs available at runtime through a `gpu_file` argument
- Have a test suite