HPC Cluster Job Submission
Indian Institute of Information
Technology, Allahabad
By
Netweb Technologies India Pvt. Ltd.
Plot No-H1, Pocket- 9,
Faridabad Industrial Town (FIT)
Sector- 57, Faridabad, Ballabgarh,
State – Haryana- 121004, India
Table of Contents
1. Login to HPC (named Surya) cluster
2. Job Submission
3. Sample job scripts
a. General script structure
b. MPI jobs
c. TensorFlow
4. Job scheduler commands
1. Login to HPC (named Surya) cluster
(a) Log in to the Surya cluster with your username and password in either of the following two ways:
i. Run ssh username@surya.iiita.ac.in from your terminal
ii. Use PuTTY and enter surya.iiita.ac.in (or 172.20.70.12)
Note: If you use surya.iiita.ac.in instead of the IP address to log in, your primary DNS
should be set to 172.31.1.21 (the IIITA DNS server IP address).
Enter your username and password when prompted.
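If you log in often, an ssh client alias can shorten the command. A minimal ~/.ssh/config entry might look like the sketch below (the alias name surya and the placeholder your_username are illustrative, not part of the cluster setup):

```
Host surya
    HostName surya.iiita.ac.in
    User your_username
```

With this entry in place, ssh surya is equivalent to ssh your_username@surya.iiita.ac.in.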
The scheduler used to schedule jobs on the cluster is PBS Pro.
2. Job Submission
Write the job scheduler script (a shell script; samples are given in Section 3), load the
required modules, and submit the job.
Load the required module (for example, the anaconda module if you want to use TensorFlow) using the steps below:
(a) Check available module ($ is your command prompt and should not be written)
$ module avail
(b) Load the module
$ module load <module_name>
(c) Check the loaded modules with the command below
$ module list
Then submit the job with qsub -V <script_name>. For example:
$ qsub -V job1.sh
(d) Available queues in the cluster (testing phase only):
(1) prerunl : unlimited walltime, 160 cores
(2) preruns : 4 hours walltime, 160 cores
(3) prerungl : unlimited walltime, 2 GPUs with 40 cores or 1 GPU with 20 cores
(4) prerungs : 4 hours walltime, 1 GPU with 20 cores and 190 GB memory
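A queue is selected in the script with #PBS -q (or on the qsub command line with -q <queue>). As a sketch, a script header targeting the short GPU queue might begin as below; the job name gpu_test is illustrative, and any additional GPU-specific resource selectors depend on the site's PBS Pro configuration:

```
#!/bin/bash
#PBS -N gpu_test
#PBS -q prerungs
#PBS -l nodes=1:ppn=20
#PBS -o out.log
#PBS -e error.log
cd $PBS_O_WORKDIR
# application command goes here
```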
3. Sample Job Scripts
(a) General script structure. Save the script as <NameOfScript>.sh, for example job1.sh:
#!/bin/bash
##name of the job
#PBS -N jobname
##job output log
#PBS -o out.log
##job/application error logs
#PBS -e error.log
##requesting number of nodes and resources
#PBS -l nodes=4:ppn=40
##selecting queue
#PBS -q preruns
cd $PBS_O_WORKDIR
#job command without hash
Note: Lines above starting with two hash symbols (##) are comments, while lines starting
with #PBS are scheduler directives. #PBS directives set properties of the job; for example,
#PBS -l nodes=4:ppn=40 requests that the scheduler assign 4 nodes with 40 processors per
node (ppn) to the job.
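Since bash treats #PBS lines as ordinary comments, a job script also runs as a normal shell script outside the scheduler; only PBS interprets the directives. A quick local illustration (the file name demo.sh is arbitrary):

```shell
# Create a script containing a #PBS directive, then run it with bash:
cat > demo.sh <<'EOF'
#!/bin/bash
#PBS -N demo
echo "bash ignored the #PBS line"
EOF
bash demo.sh   # prints: bash ignored the #PBS line
```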
=====================================================================
Command to submit the job (the -V flag exports your current environment variables,
including loaded modules, to the job):
$ qsub -V job1.sh
(b) MPI jobs
#!/bin/bash
#PBS -N cpi
#PBS -o out.log
#PBS -e error.log
#PBS -l nodes=4:ppn=40
#PBS -q preruns
cd $PBS_O_WORKDIR
mpiexec.hydra -machinefile $PBS_NODEFILE -np 160 ./a.out
Note: ./a.out is the executable. If you have your own MPI C code, for example Helloworld.c
(shown below), first compile it with mpicc to get an executable, then use that executable
at the end of the mpiexec.hydra command above.
// Helloworld.c (using MPI)
#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    // Initialize the MPI environment
    MPI_Init(NULL, NULL);

    // Get the number of processes
    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    // Get the rank of the process
    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    // Get the name of the processor
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int name_len;
    MPI_Get_processor_name(processor_name, &name_len);

    // Print a hello world message
    printf("Hello world from processor %s, rank %d out of %d processors\n",
           processor_name, world_rank, world_size);

    // Finalize the MPI environment
    MPI_Finalize();
    return 0;
}
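On the cluster login node, the compile-and-submit sequence described above would be mpicc Helloworld.c -o helloworld followed by qsub -V on a job script that launches ./helloworld. A sketch of generating such a script is shown below (mpi_job.sh and helloworld are illustrative names; mpicc and qsub themselves are available only on the cluster, so they appear here as comments):

```shell
# On the cluster you would first run:  mpicc Helloworld.c -o helloworld
# and then submit with:                qsub -V mpi_job.sh
# Generate a job script that launches the compiled MPI binary:
cat > mpi_job.sh <<'EOF'
#!/bin/bash
#PBS -N helloworld
#PBS -o out.log
#PBS -e error.log
#PBS -l nodes=4:ppn=40
#PBS -q preruns
cd $PBS_O_WORKDIR
mpiexec.hydra -machinefile $PBS_NODEFILE -np 160 ./helloworld
EOF
grep 'mpiexec' mpi_job.sh   # shows the launch line
```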
---------------------------------------------------------------------------------------------------------------------
(c) TensorFlow jobs
Step 1: Load the anaconda module using the command below:
$ module load utils/anaconda3.5
Step 2: Write your TensorFlow program; a basic "Hello, TensorFlow" example (tensortest.py) is given below:
# TensorFlow 1.x API (tf.Session was removed in TensorFlow 2.x)
import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))   # prints b'Hello, TensorFlow!'
Step 3: Write the job script (tensor_jobscipt.sh) as shown below (modify the file name
accordingly):
#!/bin/bash
#PBS -N tensorflow
#PBS -l nodes=1:ppn=1
#PBS -o outlog
#PBS -e errorlog
cd $PBS_O_WORKDIR
python tensortest.py
Step 4: Submit the job using the following command:
$ qsub -V tensor_jobscipt.sh
Your output will be in the outlog file.
A screenshot of the running job is shown below.
4. Job scheduler commands
(a) Submit a job to the scheduler: $ qsub -V <script_name>
(b) Check job status: $ qstat
(c) Check where jobs are running: $ qstat -n
(d) Show full information about a job: $ qstat -f <job_id>
(only part of the information is shown in the screenshot)
(e) Delete a job from the queue: $ qdel <job_id> (it may take 5 to 10 seconds)
(f) Check queue information: $ qstat -q