
HPC Cluster Job Submission

Indian Institute of Information Technology, Allahabad

By

Netweb Technologies India Pvt. Ltd.

Plot No-H1, Pocket- 9,
Faridabad Industrial Town (FIT)
Sector- 57, Faridabad, Ballabgarh,
State – Haryana- 121004, India
Table of Contents

1. Login to HPC (named Surya) cluster
2. Job submission
3. Sample job scripts
   a. General script structure
   b. MPI jobs
   c. TensorFlow jobs
4. Job scheduler commands
1. Login to HPC (named Surya) cluster

(a) Log in to the Surya cluster with your username and password in either of the following two ways:

i. ssh username@surya.iiita.ac.in from your terminal

ii. Use PuTTY and enter surya.iiita.ac.in (or 172.20.70.12)

Note: If you use surya.iiita.ac.in instead of the IP address for login, your primary DNS
should be 172.31.1.21 (the IIITA DNS server IP address).

Enter your username and password when prompted.

The scheduler used to schedule jobs on the cluster is PBSPro.


2. Job submission

Write the job scheduler script (a shell script; a sample is given in Section 3), load the required
modules, and submit the job.

Load the module (for example, the anaconda module if you want to use TensorFlow) using the steps
below.

(a) Check the available modules ($ is your command prompt and should not be typed):

$ module avail

(b) Load the module:

$ module load <module_name>

(c) Check the loaded modules with the command below:

$ module list

Then submit the job with qsub -V <script_name>, for example:

$ qsub -V job1.sh
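
As a complete illustration, a typical session might look like the following (the module name matches
the one used in Section 3(c); the job ID printed by qsub is an example, and the server-name suffix
may differ on your installation):

$ module avail
$ module load utils/anaconda3.5
$ module list
$ qsub -V job1.sh
1234.surya

qsub prints the job ID assigned by PBSPro; keep it at hand for the qstat -f and qdel commands in
Section 4.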
(d) Available queues in the cluster (testing phase only):

(1) prerunl : unlimited wall time with 160 cores
(2) preruns : 4 hours wall time with 160 cores
(3) prerungl : unlimited wall time with 2 GPUs and 40 cores, or 1 GPU and 20 cores
(4) prerungs : 4 hours wall time with 1 GPU, 20 cores, and 190 GB memory
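
A queue can be selected either inside the job script with a #PBS -q directive (see Section 3) or on
the command line at submission time. For example, to submit job1.sh to the preruns queue:

$ qsub -V -q preruns job1.sh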

3. Sample Job Scripts

(a) General script structure. Save the script as <NameOfScript>.sh, for example job1.sh:

#!/bin/bash
##name of the job
#PBS -N jobname
##job output log
#PBS -o out.log
##job/application error logs
#PBS -e error.log
##requesting number of nodes and resources
#PBS -l nodes=4:ppn=40
##selecting queue
#PBS -q preruns
cd $PBS_O_WORKDIR
#job command without hash

Note: Lines above starting with two hash symbols (##) are comments, while lines starting with #PBS
are scheduler directives that set properties of the job. For example, #PBS -l nodes=4:ppn=40
requests the scheduler to assign 4 nodes and 40 processors per node (ppn) to the job. Your actual
job command goes at the end of the script, without any hash.
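
Other standard PBS directives can be added in the same way. The sketch below adds a wall-clock
limit with #PBS -l walltime (a standard PBSPro option; the 2-hour value is only an example and must
fit within the limit of the chosen queue, and my_program is a placeholder for your executable):

#!/bin/bash
#PBS -N myjob
##request a 2-hour wall-clock limit
#PBS -l walltime=02:00:00
#PBS -l nodes=1:ppn=20
#PBS -q preruns
cd $PBS_O_WORKDIR
##job command without hash
./my_program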
=====================================================================
Command to submit the job:
$ qsub -V job1.sh

(b) MPI jobs

#!/bin/bash
#PBS -N cpi
#PBS -o out.log
#PBS -e error.log
#PBS -l nodes=4:ppn=40
#PBS -q preruns
cd $PBS_O_WORKDIR
mpiexec.hydra -machinefile $PBS_NODEFILE -np 160 ./a.out

Note: ./a.out is the executable. If you have your own MPI C code, for example Helloworld.c (shown
below), first compile it with mpicc to get an executable, and then use that executable at the end
of the mpiexec.hydra command above.
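
For example, to compile Helloworld.c into an executable named helloworld (the name is just an
example):

$ mpicc Helloworld.c -o helloworld

and then replace ./a.out with ./helloworld on the mpiexec.hydra line of the job script.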

//Helloworld.c (using MPI)

#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    // Initialize the MPI environment
    MPI_Init(NULL, NULL);

    // Get the number of processes
    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    // Get the rank of the process
    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    // Get the name of the processor
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int name_len;
    MPI_Get_processor_name(processor_name, &name_len);

    // Print off a hello world message
    printf("Hello world from processor %s, rank %d out of %d processors\n",
           processor_name, world_rank, world_size);

    // Finalize the MPI environment
    MPI_Finalize();
    return 0;
}
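
When the job runs with -np 160, each rank prints one line; the output order is nondeterministic and
the processor names below are placeholders for the actual node hostnames:

Hello world from processor node01, rank 0 out of 160 processors
Hello world from processor node02, rank 53 out of 160 processors
...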
---------------------------------------------------------------------------------------------------------------------

(c) TensorFlow jobs

Step 1: Load the anaconda module using the command below:

$ module load utils/anaconda3.5

Step 2: Write your TensorFlow program. A basic hello-TensorFlow program (tensortest.py) is given
below:

import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))
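
Note: tf.Session() is the TensorFlow 1.x API. If the anaconda module on your system instead
provides TensorFlow 2.x (check with python -c "import tensorflow as tf; print(tf.__version__)"),
the equivalent program uses eager execution, as in the sketch below:

import tensorflow as tf

# TensorFlow 2.x executes eagerly; no Session is needed
hello = tf.constant('Hello, TensorFlow!')
print(hello.numpy())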

Step 3: Write the job script (tensor_jobscript.sh) as shown below (modify the name of the file
accordingly):
#!/bin/bash
#PBS -N tensorflow
#PBS -l nodes=1:ppn=1
#PBS -o outlog
#PBS -e errorlog
cd $PBS_O_WORKDIR
python tensortest.py

Step 4: Submit the job using the following command:

$ qsub -V tensor_jobscript.sh

Your output will be in the outlog file.
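
Once the job finishes, the result can be read from the output log. With Python 3 and TensorFlow
1.x, sess.run(hello) returns a bytes object, so the printed line looks like this:

$ cat outlog
b'Hello, TensorFlow!'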


4. Job scheduler commands

(a) Submit a job to the scheduler: $ qsub -V <script_name>

(b) Check the status of your jobs: $ qstat

(c) Check where the jobs are running: $ qstat -n

(d) Check the full information of a job: $ qstat -f <job_id>

(e) Delete a job from the queue: $ qdel <job_id> (it may take 5 to 10 seconds)

(f) Check the queue information: $ qstat -q
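
A typical sequence, with illustrative output (the job ID, user name, and exact column layout are
placeholders; the format depends on the PBSPro version):

$ qsub -V job1.sh
1234.surya
$ qstat
Job id            Name       User      Time Use S Queue
----------------  ---------  --------  -------- - -------
1234.surya        jobname    username  00:00:05 R preruns
$ qdel 1234.surya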
