Installation Manual
Revision: f502c1099
Date: Wed May 28 2025
©2024 NVIDIA Corporation & affiliates. All Rights Reserved. This manual or parts thereof may not be
reproduced in any form unless permitted by contract or by written permission of NVIDIA Corporation.
Trademarks
Linux is a registered trademark of Linus Torvalds. PathScale is a registered trademark of Cray, Inc.
Red Hat and all Red Hat-based trademarks are trademarks or registered trademarks of Red Hat, Inc.
SUSE is a registered trademark of SUSE LLC. NVIDIA, CUDA, GPUDirect, HPC SDK, NVIDIA DGX,
NVIDIA Nsight, and NVLink are registered trademarks of NVIDIA Corporation. FLEXlm is a registered
trademark of Flexera Software, Inc. PBS Professional, and Green Provisioning are trademarks of Altair
Engineering, Inc. All other trademarks are the property of their respective owners.
2 Introduction 11
2.1 What Is NVIDIA Bright Cluster Manager? . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.1.1 What OS Platforms Is It Available For? . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.1.2 What Architectures Does It Run On? . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.1.3 What Features Are Supported Per OS And Architecture? . . . . . . . . . . . . . . . 11
2.1.4 What OS Platforms Can It Be Managed From? . . . . . . . . . . . . . . . . . . . . . 12
2.2 Cluster Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
8 Burning Nodes 93
8.1 Test Scripts Deployment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
8.2 Burn Configurations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
8.2.1 Mail Tag . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
8.2.2 Pre-install And Post-install . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
8.2.3 Post-burn Install Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
8.2.4 Phases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
8.2.5 Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
8.3 Running A Burn Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
8.3.1 Burn Configuration And Execution In cmsh . . . . . . . . . . . . . . . . . . . . . . . 95
8.3.2 Writing A Test Script . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
8.3.3 Burn Failures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
8.4 Relocating The Burn Logs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
8.4.1 Configuring The Relocation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
• If the cluster has already been installed, tested, and configured, but only needs to be configured
now for a new network, then the administrator should only need to look at Chapter 6. Chapter 6
lays out how to carry out the most common configuration changes that usually need to be done to
make the cluster work in the new network.
• For administrators who are unfamiliar with clusters, reading the introduction (Chapter 2) and then the more detailed installation walkthrough for a bare metal installation (Chapter 3, sections 3.1, 3.2, and 3.3) is recommended. Having carried out the head node installation, the administrator can then return to this quickstart chapter (Chapter 1), and continue onward with the quickstart process of regular node installation (section 1.3).
• The configuration and administration of the cluster after it has been installed is covered in the
cluster manager Administrator Manual. The Administrator Manual should be consulted for further
background information as well as guidance on cluster administration tasks, after the introduction
(Chapter 2) of the Installation Manual has been read.
1. The BIOS of the head node should have the local time set.
2. The head node should be booted from the cluster manager DVD.
3. The option: Install NVIDIA Bright Cluster Manager (Graphical), or Install NVIDIA
Bright Cluster Manager (Text mode), should be selected in the text boot menu. The Graphical
installation is recommended, and brings up the GUI installation Welcome screen. The Text mode installation provides a minimal, ncurses-based version of the GUI installation.
Only the GUI installation is discussed in the rest of this quickstart for convenience.
• At the Bright Computing Software License screen, the acceptance checkbox should be
ticked. Next should then be clicked.
• At the Linux base distribution screen, the acceptance checkbox should be ticked. Next should
then be clicked.
7. At the Hardware Info screen, the detected hardware should be reviewed. If additional kernel
modules are required, then the administrator should go back to the Kernel Modules screen. Once
all the relevant hardware (Ethernet interfaces, hard drive and DVD drive) is detected, Next should
be clicked.
8. At the Installation source screen, the DVD drive containing the cluster manager DVD should
be selected, then Next clicked.
9. At the General cluster settings screen, one or more nameservers and one or more domains
can be set, if they have not already been automatically filled. The remaining settings can usually
be left as is.
10. At the Workload management screen, an HPC workload manager can be selected. The choice can
be made later on too, after the cluster manager has been installed.
11. For the Network topology screen, a Type 1 network is the most common.
12. For the Head node settings screen, the head is given a name and a password.
13. For the Compute nodes settings screen, the naming and numbering settings for the compute nodes can be reviewed and adjusted.
14. For the BMC configuration screen, the use of IPMI/iLO/DRAC/CIMC/Redfish BMCs is configured. Adding an IPMI/iLO/DRAC/CIMC/Redfish network is needed to configure IPMI/iLO/DRAC/CIMC/Redfish interfaces in a different IP subnet, and is recommended.
15. At the Networks screen, the network parameters for the head node should be entered for the inter-
face facing the network named externalnet:
• If using DHCP on that interface, the parameters for IP Address, Netmask and Gateway as
suggested by the DHCP server on the external network can be accepted.
• If not using DHCP on that interface, static values are put in instead.
The network parameters for externalnet that can be set include the IP address, netmask, gateway address, and domain name.
The network externalnet corresponds to the site network that the cluster resides in (for example,
a corporate or campus network). The IP address details are therefore the details of the head node
for a type 1 externalnet network (figure 3.11). A domain name should be entered to suit the local
requirements.
16. For the Head node interfaces screen, the head node network interfaces are assigned networks
and IP addresses. The assigned values can be reviewed and changed.
17. At the Compute node interfaces screen, the compute node interfaces are assigned networks and
IP addresses. The assigned values can be reviewed and changed.
18. At the Disk Partitioning and Layouts screen, a drive should be selected for the head node. The
installation will be done onto this drive, overwriting all its previous content.
The administrator can modify the disk layout for the head node by selecting a pre-defined layout.
For hard drives that have less than about 500GB of space, the XML file master-one-big-partition.xml is used by default. For hard drives that have about 500GB or more of space, the XML file master-standard.xml is used by default.
These layouts may be fine-tuned by editing the XML partitioning definition during this stage. The “max” setting in the XML file means that the rest of the drive space is used up for the associated partition, whatever the leftover space is.
There are also other layout templates available from a menu.
19. At the Additional software screen, extra software options can be chosen for installation if these
were selected for the installation ISO. The extra software options are:
• CUDA
• Ceph
• OFED stack
20. The Summary screen should be reviewed. A wrong entry can still be fixed at this point. The Next
button then starts the installation.
21. The Deployment screen should eventually complete. Clicking on Reboot reboots the head node.
2. Once the machine is fully booted, a login should be done as root with the password that was
entered during installation.
3. A check should be done to confirm that the machine is visible on the external network. Also, it
should be checked that the second NIC (i.e. eth1) is physically connected to the external network.
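For example, a quick connectivity check could look like the following (a minimal sketch; the interface name and the external address to ping are site-specific):
[root@bright92 ~]# ip addr show eth1
[root@bright92 ~]# ping -c 3 <external host or gateway>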
4. If the parent distribution for the cluster manager is RHEL or SUSE, then registration (Chapter 5) should usually be done.
6. The head node software should be updated via its package manager (yum, dnf, apt, zypper) so that it has the latest packages (sections 11.2-11.3 of the Administrator Manual).
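For example, on a RHEL-derived head node this could be carried out as follows (a sketch; the package manager matching the base distribution should be used):
[root@bright92 ~]# yum update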
2. The BIOS of regular nodes should be configured to boot from the network. The regular nodes
should then be booted. No operating system is expected to be on the regular nodes already. If
there is an operating system there already, then by default, it is overwritten by a default image
provided by the head node during the next stages.
3. If everything goes well, the node-installer component starts on each regular node and a certificate
request is sent to the head node.
If a regular node does not make it to the node-installer stage, then it is possible that additional
kernel modules are needed. Section 5.8 of the Administrator Manual contains more information on
how to diagnose problems during the regular node booting process.
4. To identify the regular nodes (that is, to assign a host name to each physical node), several options
are available. Which option is most convenient depends mostly on the number of regular nodes
and whether a (configured) managed Ethernet switch is present.
Rather than identifying nodes based on their MAC address, it is often beneficial (especially in
larger clusters) to identify nodes based on the Ethernet switch port that they are connected to. To
allow nodes to be identified based on Ethernet switch ports, section 3.8 of the Administrator Manual
should be consulted.
If a node is unidentified, then its node console displays an ncurses message to indicate it is an unknown node, and the net boot keeps retrying its identification attempts. Any one of the following methods may be used to assign node identities when nodes start up as unidentified nodes:
a. Identifying each node on the node console: To manually identify each node, the “Manually select node” option is selected for each node. The node is then identified manually by selecting a node-entry from the list and choosing the Accept option. This option is easiest when there are not many nodes. It requires being able to view the console of each node and keyboard entry to the console.
b. Identifying nodes using cmsh: In cmsh the newnodes command in device mode (page 230,
section 5.4.2 of the Administrator Manual) can be used to assign identities to nodes from the
command line. When called without parameters, the newnodes command can be used to
verify that all nodes have booted into the node-installer and are all waiting to be assigned an
identity.
c. Identifying nodes using Bright View: The node identification resource (page 234, section 5.4.2
of the Administrator Manual) in Bright View automates the process of assigning identities so
that manual identification of nodes at the console is not required.
Once all regular nodes have been booted in the proper order, the order of their appearance on the network can be used to assign node identities. The newnodes command can first be used to verify that all regular nodes have booted into the node-installer; identities node001 through node032 can then be assigned to the first 32 nodes that were booted.
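A hedged sketch of such a cmsh session follows (the -s and -n options for saving the assignment to a node range are assumptions based on the newnodes description in section 5.4.2 of the Administrator Manual):
Example
[root@bright92 ~]# cmsh
[bright92]% device
[bright92->device]% newnodes
(the nodes waiting to be identified are listed by MAC address)
[bright92->device]% newnodes -s -n node001..node032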
5. Each regular node is now provisioned and eventually fully boots. In case of problems, section 5.8
of the Administrator Manual should be consulted.
6. Optional: To configure power management, Chapter 4 of the Administrator Manual should be consulted.
7. To update the software on the nodes, a package manager is used to install to the node image
filesystem that is on the head node.
The node image filesystem should be updated via its package manager (yum, dnf, apt, zypper) so that it has the latest packages (sections 11.4-11.5 of the Administrator Manual).
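For example, using yum for the default image (a sketch following the --installroot pattern shown in the GPU quickstart in section 1.4):
[root@bright92 ~]# yum --installroot=/cm/images/default-image update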
(a) NVIDIA GPU hardware should be detected on the nodes that use it. This is true for NVIDIA
GPU units (separate from the nodes) as well as for on-board NVIDIA GPUs. The lspci
command can be used for detection. For example, for a GPU used by node001:
Example
[root@bright92 ~]# ssh node001 lspci | grep NVIDIA
00:07.0 3D controller: NVIDIA Corporation GK110BGL [Tesla K40c] (rev a1)
(b) For AMD CPUs that have a GPU integrated with the CPU, the chip can similarly be identified with lscpu:
Example
[root@bright92 ~]# ssh node001 lscpu | grep "Model name:"
Model name: AMD Ryzen 5 2500U with Radeon Vega Mobile Gfx
The AMD chips can then be checked against the list of AMD chips with AMD GPUs, as listed
at https://www.amd.com/en/support/kb/release-notes/rn-prorad-lin-18-20
(a) Details of AMD GPU software installation are given in section 7.5.
(b) For NVIDIA GPUs, assuming the GPU is on the regular node node001, and that the hardware
is supported by CUDA 11.7, then software installation is carried out at the head node as
follows:
i. The software components are installed for the head node itself with:
[root@bright92 ~]# yum install cuda11.7-toolkit cuda11.7-sdk
ii. Components are installed into the image used by the nodes that have the GPUs, for example the image default-image, with:
[root@bright92 ~]# yum --installroot=/cm/images/default-image install cuda-driver cuda-dcgm
iii. The nodes with GPUs can then simply be rebooted to compile the CUDA drivers as the
node boots, and to start the CUDA driver up:
[root@bright92 ~]# cmsh -c 'device; reboot -n node001..node015'
Further details on the basic installation of CUDA for NVIDIA GPUs are given in section 7.4.
This starts up an ncurses-based configuration. An NVIDIA GPU can be configured for Slurm using the Setup (Step by Step) option for Slurm (section 7.3.2 of the Administrator Manual).
After configuring the WLM server, WLM submission and WLM client roles for the nodes of the
cluster, a screen that asks if GPU resources should be configured is displayed (figure 1.1):
Following through brings up a GPU device settings configuration screen (figure 1.2):
Figure 1.2: Slurm With cm-wlm-setup: GPU Device Settings Configuration Screen
The help text option in the screen gives hints based on the descriptions at https://slurm.schedmd.com/gres.conf.html, and also as seen in Slurm’s man 5 gres.conf.
Figure 1.2 shows 2 physical GPUs on the node being configured. The type is an arbitrary string for the GPU, and each GPU device can be allocated CPU cores.
The next screen (figure 1.3) allows the NVIDIA CUDA MPS (Multi-Process Service) to be configured:
Figure 1.3: Configuring An NVIDIA GPU For Slurm With cm-wlm-setup: MPS Settings Configuration
Screen
The help text for this screen gives hints on how the fields can be filled in. The number of GPU
cores (figure 1.3) for a GPU device can be set.
The rest of the cm-wlm-setup procedure can then be completed.
The regular nodes that had a role change during cm-wlm-setup can then be rebooted to pick up the
workload manager (WLM) services. A check via the cmsh command ds should show what nodes
need a restart.
If, for example, the range from node001 to node015 needs to be restarted to get the WLM services
running, then it could be carried out with:
Example
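The same reboot command form as used in section 1.4 applies here:
[root@bright92 ~]# cmsh -c 'device; reboot -n node001..node015'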
More on these attributes can be found in the man pages (man 5 gres.conf).
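As an illustration, gres.conf entries for the two GPUs of figure 1.2 might look like the following (a hedged sketch; the Type string, device files, and core ranges are site-specific assumptions):
Name=gpu Type=tesla File=/dev/nvidia0 Cores=0-7
Name=gpu Type=tesla File=/dev/nvidia1 Cores=8-15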
NVIDIA configuration for Slurm and other workload managers is described in further detail in section 7.5 of the Administrator Manual.
The “Hello World” helloworld.cu script from section 8.5.4 of the User Manual can be saved in a user’s directory, and then compiled for a GPU with nvcc:
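For example (a sketch; the module name for the CUDA toolkit depends on the installed version):
[auser@bright92 ~]$ module load cuda11.7/toolkit
[auser@bright92 ~]$ nvcc -o helloworld helloworld.cu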
The output from submission to a node with a GPU can then be seen:
More about Slurm batch scripts and GPU compilation can be found in Chapter 8 of the User Manual.
Users can use the modules command to switch the environment to the appropriate Python version.
For example, to switch to Python 3.9:
[root@bright92 ~]# python -V
Python 3.6.8
[root@bright92 ~]# module load python39
[root@bright92 ~]# python -V
Python 3.9.10
If such a change is carried out, then support is not available for Python-related bugs, but remains available for the cluster manager-related features.
• This manual, the Installation Manual, has more details and background on the installation of
the cluster in the next chapters.
• The Upgrade Manual describes upgrading from earlier versions of NVIDIA Bright Cluster Manager.
• The User Manual describes the user environment and how to submit jobs for the end user.
• The Cloudbursting Manual describes how to deploy the cloud capabilities of the cluster.
• The Developer Manual has useful information for developers who would like to program with
the cluster manager.
• The Machine Learning Manual describes how to install and configure machine learning capabilities with the cluster manager.
• The Containerization Manual describes how to manage containers with the cluster manager.
2 Introduction
This chapter introduces some features of NVIDIA Bright Cluster Manager and describes a basic cluster
in terms of its hardware.
• Bright View (section 2.4 of the Administrator Manual): a GUI which conveniently runs on modern
desktop web browsers, and therefore on all operating system versions that support a modern
browser. This includes Microsoft Windows, MacOS and iOS, and Linux.
• cmsh (section 2.5 of the Administrator Manual): an interactive shell front end that can be accessed from any computing device with secured SSH terminal access.
The head node is the most important machine within a cluster because it controls all other devices,
such as compute nodes, switches and power distribution units. Furthermore, the head node is also the
host that all users (including the administrator) log in to in a default cluster. The head node is typically
the only machine that is connected directly to the external network and is usually the only machine in a
cluster that is equipped with a monitor and keyboard. The head node provides several vital services to
the rest of the cluster, such as central data storage, workload management, user management, DNS and
DHCP service. The head node in a cluster is also frequently referred to as the master node.
Often, the head node is replicated to a second head node, frequently called a passive head node. If
the active head node fails, the passive head node can become active and take over. This is known as
a high availability setup, and is a typical configuration (Chapter 17 of the Administrator Manual) in the
cluster manager.
A cluster normally contains a considerable number of non-head, or regular nodes, also referred to
simply as nodes. The head node, not surprisingly, manages these regular nodes over the network.
Most of the regular nodes are compute nodes. Compute nodes are the machines that will do the
heavy work when a cluster is being used for large computations. In addition to compute nodes, larger
clusters may have other types of nodes as well (e.g. storage nodes and login nodes). Nodes typically
install automatically through the (network bootable) node provisioning system that is included with the
cluster manager. Every time a compute node is started, the software installed on its local hard drive
is synchronized automatically against a software image which resides on the head node. This ensures
that a node can always be brought back to a “known state”. The node provisioning system greatly eases
compute node administration and makes it trivial to replace an entire node in the event of hardware
failure. Software changes need to be carried out only once (in the software image), and can easily be
undone. In general, there will rarely be a need to log on to a compute node directly.
In most cases, a cluster has a private internal network, which is usually built from one or multiple managed Gigabit Ethernet switches, or made up of an InfiniBand or Omni-Path fabric. The internal network connects all nodes to the head node and to each other. Compute nodes use the internal network for
booting, data storage and interprocess communication. In more advanced cluster setups, there may be
several dedicated networks. It should be noted that the external network—which could be a university
campus network, company network or the Internet—is not normally directly connected to the internal
network. Instead, only the head node is connected to the external network.
Figure 2.1 illustrates a typical cluster network setup.
Most clusters are equipped with one or more power distribution units. These units supply power to
all compute nodes and are also connected to the internal cluster network. The head node in a cluster can
use the power control units to switch compute nodes on or off. From the head node, it is straightforward
to power on/off a large number of compute nodes with a single command.
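For example, from cmsh (a hedged sketch; the power command and its options are detailed in Chapter 4 of the Administrator Manual):
[root@bright92 ~]# cmsh -c 'device; power on -n node001..node032'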
3 Installing NVIDIA Bright Cluster Manager
This chapter describes in detail the installation of NVIDIA Bright Cluster Manager onto the head node
of a cluster. Sections 3.1 and 3.2 list hardware requirements and supported hardware. Section 3.3 gives
step-by-step instructions on installing the cluster manager from a DVD or USB drive onto a head node
that has no operating system running on it initially, while section 3.4 gives instructions on installing
onto a head node that already has an operating system running on it.
Once the head node is installed, the other, regular, nodes can (network) boot off the head node
and provision themselves from it with a default image, without requiring a Linux distribution DVD
or USB drive themselves. Regular nodes normally have any existing data wiped during the process of
provisioning from the head node, which means that a faulty drive can normally simply be replaced by
taking the regular node offline, replacing its drive, and then bringing the node back online, without
special reconfiguration. The details of the network boot and provisioning process for the regular nodes
are described in Chapter 5 of the Administrator Manual.
The installation of software on an already-configured cluster running the cluster manager is described in Chapter 11 of the Administrator Manual.
• 80GB of disk space
• 2 Gigabit Ethernet NICs (for the most common Type 1 topology (section 3.3.9))
Recommended hardware requirements for larger clusters are discussed in detail in Appendix B.
• Atos
• Cavium
• Cisco
• Cray
• Dell EMC
• Fujitsu
• Huawei
• IBM
• Lenovo
• NVIDIA DGX
• Oracle
• SGI (ICE X)
• SuperMicro
Other brands are also expected to work, even if not explicitly supported.
• Dell
• Huawei
• Netgear
• Nortel
• SuperMicro
Other brands are also expected to work, although not explicitly supported.
Other brands with the same SNMP MIB mappings are also expected to work, although not explicitly
supported.
• iDRAC
• IPMI 1.5/2.0
• CIMC
• Redfish v1
3.2.6 GPUs
• AMD Radeon GPUs, as listed at https://support.amd.com/en-us/kb-articles/Pages/
Radeon-Software-for-Linux-Release-Notes.aspx
• NVIDIA Tesla with latest recommended drivers
• NVIDIA GeForce and other older generations are mostly supported. Bright Computing can be
consulted for details.
• NVIDIA DGX servers and workstations are supported for Ubuntu 20.04 at the time of writing of
this section (March 2023).
3.2.7 MICs
• Xeon Phi: All Xeon Phi processor versions from Knights Landing onward. PCI-e coprocessor
versions of Xeon Phi do not have direct integration with the cluster manager from NVIDIA Bright
Cluster Manager version 8.2 onward.
3.2.8 RAID
Software and hardware RAID are supported. Fake RAID is not regarded as a serious production option
and is supported accordingly.
are dealt with correctly. Details on installing the cluster manager onto virtual instances can be found in
the cluster manager Knowledge Base at http://kb.brightcomputing.com.
To start a physical bare metal installation, the time in the BIOS of the head node is set to local time.
The head node is then made to boot from DVD or USB, which can typically be done by appropriate
keystrokes when the head node boots, or via a BIOS configuration.
Special steps for installation from a bootable USB device: If a bootable USB device is to be used, then
the instructions within the Bright ISO, in the file README.BRIGHTUSB should be followed to copy the ISO
image over to the USB device. After copying the ISO image, the MD5 checksum should be validated
to verify that the copied ISO is not corrupt. This is important, because corruption is possible in subtle
ways that may affect operations later on, and in ways that are difficult to uncover.
The ISO Boot menu offers a default option of booting from the hard drive, with a countdown to
starting the hard drive boot. To install the cluster manager, the countdown should be interrupted by
selecting the option of “Install NVIDIA Bright Cluster Manager (Graphical)” instead.
Selecting the option allows kernel parameter options to be provided to the installer.
Default kernel parameter options are provided so that the administrator can simply press the enter
key to go straight on to start the installer, and bring up the welcome screen (section 3.3.2).
1. a setting for the external network interface that is to be used. For example: eth0 or eth1.
2. a setting for the network configuration of the external network, to be explained soon. The network
configuration option can be built either using static IP addressing or with DHCP.
3. a setting for the password, for example secretpass, for the login to the cluster manager that is
about to be installed.
A remote installation can alternatively be carried out later on without setting netconf, by using the text mode installer to set up networking (section 3.5), or by using the GUI mode installer Continue remotely option (figure 3.4).
An administrator who would like to simply start installation can click on the Start installation
button at the left side of the screen.
A similar screen after that asks the user to agree to the Base Distribution EULA. This is the end
user license agreement for the distribution that is to be used as the base upon which the cluster manager
is to run.
• Load config: allows an existing configuration file to be loaded and used by the installation. This
option is available only during the first few screens.
• Show config: allows any already loaded configuration file to be displayed. A default configuration is loaded, with values that may suit the cluster already. However, the defaults are not expected to be optimal, and may not even work for the actual physical configuration.
• Continue remotely: allows the administrator to leave the console and access the cluster from a
remote location. This can be useful for administrators who prefer to avoid working inside a noisy
cold data center. If Continue remotely is selected, then addresses are displayed on the console
screen, for use with a web browser or SSH, and the console installation screen is locked.
• Back: if not grayed out, allows the administrator to go back a step in the installation.
Changes to the modules to be loaded can be entered by reordering the loading order of modules, by
removing modules, and adding new modules. Clicking the + button opens an input box for adding a
module name and optional module parameters (figure 3.6). The module can be selected from a built-in list; it can be automatically extracted from a .deb or .rpm package; or it can simply be selected by picking an available .ko kernel module file from the filesystem.
A module can also be blacklisted, which means it is prevented from being used, by clicking on the
button. This can be useful when replacing one module with another.
Clicking Next then leads to the “Hardware info” overview screen, described next.
Figure 3.7: Hardware Overview Based On Hardware Detection Used For Loading Kernel Modules
Clicking Next in the Hardware Info screen leads to the Installation source configuration screen,
described next.
The administrator must select the correct device to continue the installation.
Optionally, a media integrity check can be set.
Clicking on the Next button starts the media integrity check, if it was set. The media integrity check
can take about a minute to run. If all is well, then the “Cluster settings” setup screen is displayed, as
described next.
• Cluster name
• Administrator email: Where mail to the administrator goes. This need not be local.
• Time zone
• Time servers: The defaults are pool.ntp.org servers. A time server is recommended to avoid
problems due to time discrepancies between nodes.
• Environment modules: Traditional Tcl modules are set by default. Lmod is an alternative.
If no workload manager is selected here, then it can be installed later on, after the cluster installation
without the workload manager has been done. Details on installing a workload manager later on are
given in Chapter 7 on workload management of the Administrator Manual.
The default client slot number that is set depends on the workload manager chosen.
• If PBS Professional or OpenPBS is selected as a workload management system, then the number of client slots defaults to 1. After the installation is completed, the administrator should update the value in the pbsproclient role to the desired number of slots for the compute nodes (a sketch of how to do this follows after this list).
• For all other workload management systems, the number of client slots is determined automatically.
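A hedged cmsh sketch of updating the slots value after installation, assuming the pbsproclient role is assigned at the default node category (prompts abbreviated):
[root@bright92 ~]# cmsh
[bright92]% category use default
[bright92->category[default]]% roles
[bright92->category[default]->roles]% use pbsproclient
[bright92->category[default]->roles[pbsproclient]]% set slots 8
[bright92->category[default]->roles[pbsproclient*]]% commit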
The head node can also be selected for use as a compute node, which can be a sensible choice on
small clusters.
Clicking Next on this screen leads to the Network topology screen.
A type 1 network: has its nodes connected on a private internal network. This is the default net-
work setup. In this topology, a network packet from a head or regular node destined for any
external network that the cluster is attached to, by default called Externalnet, can only reach the
external network by being routed and forwarded at the head node itself. The packet routing for
Externalnet is configured at the head node.
A type 2 network: has its nodes connected via a router to a public network. In this topology, a
network packet from a regular node destined for outside the cluster does not go via the head node,
but uses the router to reach a public network. Packets destined for the head node however still
go directly to the head node. Any routing for beyond the router is configured on the router, and
not on the cluster or its parts. Care should be taken to avoid DHCP conflicts between the DHCP
server on the head node and any existing DHCP server on the internal network if the cluster is
being placed within an existing corporate network that is also part of Internalnet (there is no
Externalnet in this topology). Typically, in the case where the cluster becomes part of an existing
network, there is another router configured and placed between the regular corporate machines
and the cluster nodes to shield them from effects on each other.
A type 3 network: has its nodes connected on a routed public network. In this topology, a network
packet from a regular node, destined for another network, uses a router to get to it. The head node,
being on another network, can only be reached via a router too. The network the regular nodes are
on is called Internalnet by default, and the network the head node is on is called Managementnet
by default. Any routing configuration for beyond the routers that are attached to the Internalnet
and Managementnet networks is configured on the routers, and not on the cluster or its parts.
A consequence of using a router in the type 3 configuration is that the communication between
the head node and the regular nodes is via OSI layer 3. OSI layer 2 used by DHCP is not directly
supported. However, DHCP packets still need to be exchanged between the head and regular
nodes. The usual way to relay the packets is using a DHCP relay agent. Configuration of a DHCP
relay agent is outside the scope of Bright configuration, and is typically done by the network
administrator or the router vendor.
Selecting the network topology helps decide the predefined networks on the Networks settings
screen later (figure 3.17). Clicking Next here leads to the Head node settings screen, described next.
• the hostname
• the password
Clicking Next leads to the Compute node settings screen, described next.
By default therefore, the first compute node takes the name node001, the second compute node
takes the name node002, and so on.
If the administrator confirms that the nodes are to use BMCs (Baseboard Management Controllers)
that are compatible with IPMI, iLO, CIMC, iDRAC, or Redfish, then the BMC network options appear.
By default, for the compute nodes, the BMC is automatically configured.
For a Type 1 network, the head node BMC is often connected to an ethernet segment that has the
external network running on it, while the BMCs on the compute nodes are normally connected to an
ethernet segment that has the internal network on it.
Once a network associated with the ethernet segment is chosen, it means that further BMC-related
networking values can be set for the BMCs.
A new Layer 3 IP subnet can be created for BMC interfaces.
The BMC interface can be configured as a shared physical interface with an already existing network
interface. However this can in some cases cause problems during early system BIOS checks. A dedicated
physical BMC interface is therefore recommended.
If a BMC is configured, then the BMC password is set to a random value. Retrieving and changing a
BMC password is covered in section 3.7.2 of the Administrator Manual. BMC configuration is discussed
further in section 3.7 of the Administrator Manual.
Clicking Next leads to the Networks screen, described next.
3.3.13 Networks
The Networks configuration screen (figure 3.17) displays the predefined list of networks, based on the
selection of network topology and BMC networks made in the earlier screens.
The Networks configuration screen allows the parameters of the network interfaces to be configured
via tabs for each network. In addition to any BMC networks:
For a type 1 setup, an external network and an internal network are always defined.
For a type 2 setup, an internal network is defined but no external network is defined.
For a type 3 setup, an internal network and a management network are defined.
Thus, for a type 1 network, for example, the networking details:
• for externalnet correspond to the details of the head node external network interface.
• for internalnet correspond to the details of how the compute nodes are configured.
• for a BMC network correspond to the details of how the BMC is connected.
Additional custom networks can be added in the Networks configuration screen by clicking on the
+ button.
Clicking Next in this screen validates all network settings. Invalid settings for any of the defined
networks cause an alert to be displayed, explaining the error. A correction is then needed to proceed
further. Settings may of course be valid, but incorrect—the validation is merely a sanity check. It may
be wise for the cluster administrator to check with the network specialist that the networks that have
been configured are set up as really intended.
If all settings are valid, then the Next button brings the installation on to the Head node interfaces
screen, described next.
If a BMC network is to be shared with a regular network, then an alias interface is shown too. In
figure 3.18 an alias interface, ens3:ipmi, is shown.
Interfaces can be created or removed.
Dropdown selection allows the proposed values to be changed. It is possible to swap network interfaces with dropdown selection.
Clicking the Next button brings the installation on to the Compute node interfaces screen, described next.
The boot interface BOOTIF is used to pick up the image for the node via node provisioning.
The IP offset is used to calculate the IP address assigned to a regular node interface. The nodes are
conveniently numbered in a sequence, so their interfaces are typically also given a network IP address
that is in a sequence on a selected network. In the cluster manager, interfaces by default have their IP
addresses assigned to them sequentially, in steps of 1, starting after the network base address.
The default IP offset is 0.0.0.0, which means that the node interfaces by default start their range at
the usual default values in their network.
With a modified IP offset, the point at which addressing starts is altered. For example, a different
offset might be desirable when no IPMI network has been defined, but the nodes of the cluster do have
IPMI interfaces in addition to the regular network interfaces. If a modified IP offset is not set for one of
the interfaces, then the BOOTIF and ipmi0 interfaces get IP addresses assigned on the same network by
default, which could be confusing.
However, if an offset is entered for the ipmi0 interface, then the assigned IPMI IP addresses start from the IP address specified by the offset. That is, each modified IPMI address takes the value:
address that would be assigned by default + IP offset
Example
Taking the case where BOOTIF and IPMI interfaces would have IP addresses on the same network with
the default IP offset:
Then, on a cluster of 10 nodes, a modified IPMI IP offset of 0.0.0.20 means:
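For illustration, assume the default internal network base address of 10.141.0.0/16, so that node001’s BOOTIF interface gets 10.141.0.1, node002’s gets 10.141.0.2, and so on. With the default IP offset of 0.0.0.0, the ipmi0 interfaces would be assigned addresses in the same 10.141.0.1-10.141.0.10 range. With a modified IPMI IP offset of 0.0.0.20, the ipmi0 interface of node001 instead gets 10.141.0.21, node002 gets 10.141.0.22, and so on, up to 10.141.0.30 for node010.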
Clicking the Next button brings the installation on to the Disk layout screen, described next.
• the administrator must select the drive on the head node where the cluster manager is to be installed.
• the administrator must set the disk partitioning layout for the head node and regular nodes with
the two options: Head node disk layout and Compute nodes disk layout.
– A custom partitioning layout specification file can be added with the icon.
– The partitioning layout can be edited with the icon. This brings up a screen (figure 3.21) that allows the administrator to view and change layout values within the layout’s configuration XML file:
Ticking the Enable encryption checkbox makes the LUKS configuration parameters available (figure 3.22):
The parameters can be left at their default values to set up an encrypted partition.
If setting parameters, then there are some existing fields to set the more common parameters. Settings for less-common parameters that have no existing fields can be specified and appended to the field with the Additional Parameters: setting.
The settings are automatically stored in the XML specification for the disk layout and can
be viewed there by selecting the XML Output tab.
How a cluster administrator applies this configured disk encryption to a node that is
booting up is covered in Appendix D.17 of the Administrator Manual.
Clicking Next on the Disk layout screen leads to the Additional software screen, described next.
Clicking Next on the Additional software screen leads to the Summary screen, described next.
3.3.18 Summary
The Summary screen (figure 3.24) summarizes some of the installation settings and parameters configured during the previous stages.
3.3.19 Deployment
The Deployment screen (figure 3.25) shows the progress of the deployment. It is not possible to navigate
back to previous screens once the installation has begun. The installation log can be viewed in detail by
clicking on Install log.
The Reboot button restarts the machine. Alternatively, the head node can be set to automatically
reboot when deployment is complete.
During the reboot, the BIOS boot order may need changing, or the DVD may need to be removed, in
order to boot from the hard drive on which the cluster manager has been installed.
After rebooting, the system starts and presents a login prompt. The cluster administrator can log in
as root using the password that was set during the installation procedure.
The cluster should then be updated with the latest packages (Chapter 11 of the Administrator Manual).
After the latest updates have been installed, the system is ready to be configured.
• The installation configuration may conflict with what has already been installed. The problems
that arise can always be resolved, but an administrator who is not familiar with the cluster manager should be prepared for troubleshooting.
With the release of the cluster manager version 9.2, using the head node installer Ansible collection
is the method for performing add-on installations.
Aside: Ansible can also be used with the cluster manager once NVIDIA Bright Cluster Manager is
installed. This integration is described in section 16.10 of the Administrator Manual.
• An Ansible module is code, usually in Python, that is executed by Ansible to carry out Ansible
tasks, usually on a remote node. The module returns values.
• An Ansible playbook is a YAML file. The file declares a configuration that is to be executed (“the
playbook is followed”) on selected machines. The execution is usually carried out over SSH, by
placing modules on the remote machine.
• Traditionally, official Ansible content was obtained as a part of milestone releases of Ansible Engine (the Red Hat version of Ansible for the enterprise).
• Since Ansible version 2.10, the official way to distribute content is via Ansible content collections. Collections are composed of Ansible playbooks, modules, module utilities, and plugins. A collection is a formatted set of tools used to achieve automation with Ansible.
• https://github.com/Bright-Computing/bright-installer-ansible/tree/main/playbooks
contains additional documentation and example playbooks.
3.5 Enabling Remote Browser-Based Installation Via The Text Mode Installer
When carrying out an installation as in section 3.3, the installer is normally run on the machine that is
to be the head node of the cluster. For RHEL7 and derivatives, for Ubuntu, and for SLES, a text mode
installer is presented as an alternative to the GUI installer (figures 3.1 and 3.2).
The text mode installer is a very minimal installer compared with the GUI installer. The GUI installation is therefore usually preferred.
However, in some cases the GUI installation can fail to start, for example if X is not working correctly on the head node.
A way to still run a GUI installation is then to first run the text mode installer, and use it to run the
Remote Install option from its main menu (figure 3.26):
This then sets up network connectivity, and provides the cluster administrator with a remote URL
(figure 3.27):
A browser that is on a machine with connectivity to the head node can then use the provided remote
URL. This then brings up the GUI installer within the browser.
An alternative to running the text mode installer to obtain the remote URL is to use the netconf
kernel parameter instead. Details on configuring this are given in section 3.3.1.
4 Licensing NVIDIA Bright Cluster Manager
This chapter explains how an NVIDIA Bright Cluster Manager license is viewed, verified, requested,
and installed.
A new cluster that is purchased from a reseller typically has the cluster manager already set up on it.
The cluster manager can be run with a temporary, or evaluation, license, which allows the administrator to try it out. This typically has some restrictions on the period of validity for the license, or the
number of nodes in the cluster. The evaluation license also comes with the online ISO download for the
cluster manager, which is available for product key owners via http://customer.brightcomputing.
com/Download
The other type of license is the full license, which is almost always a subscription license. Installing a full license allows the cluster to function without the restrictions of the evaluation license. The administrator therefore usually requests a full license, and installs it. This normally only requires the administrator to have the product key at hand and to run the request-license command (section 4.3).
The preceding takes care of the licensing needs for most administrators, and the rest of this chapter
can then usually conveniently be skipped.
Administrators who would like a better background understanding on how licensing is installed
and used in the cluster manager can go on to read the rest of this chapter.
CMDaemon can run only with an unexpired evaluation or unexpired full license. CMDaemon is the
engine that runs the cluster manager, and is what is normally recommended for further configuration
of the cluster. Basic CMDaemon-based cluster configuration is covered in Chapter 3 of the Administrator
Manual.
Any cluster manager installation requires a license file to be present on the head node. The license file
details the attributes under which a particular cluster manager installation has been licensed.
• the “Licensee” details, which include the name of the organization, are an attribute of the license file that specifies the condition that only the specified organization may use the software
• the “Licensed nodes” attribute specifies the maximum number of nodes that the cluster manager
may manage. Head nodes are also regarded as nodes for this attribute.
• the “Expiration date” of the license is an attribute that sets when the license expires. It is sometimes set to a date in the near future so that the cluster owner effectively has a trial period. A new license with a longer period can be requested (section 4.3) after the owner decides to continue using the cluster with the cluster manager.
A license file can only be used on the machine for which it has been generated and cannot be changed
once it has been issued. This means that to change licensing conditions, a new license file must be issued.
The license file is sometimes referred to as the cluster certificate, or head node certificate, because it is
the X509v3 certificate of the head node, and is used throughout cluster operations. Its components are
located under /cm/local/apps/cmd/etc/. Section 2.3 of the Administrator Manual has more information
on certificate-based authentication.
1 Bright View is typically accessed via a “home” URL in the form of https://<head node address>:8081/bright-view/
Example
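The following is a schematic illustration rather than verbatim CMDaemon output; the field names follow the license attributes described earlier in this chapter, and the values are invented for illustration:
[root@bright92 ~]# cmsh -c 'main; licenseinfo'
Licensee           /C=US/O=Example Org/CN=Example Cluster
Licensed nodes     100
Pay-per-use nodes  unlimited
Node count         4
MAC address        00:0C:29:xx:xx:xx
Expiration date    31 Dec 2025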
The license shown in the preceding example allows 100 nodes to be used. In addition, an unlimited number of pay-per-use nodes can be used when cloudbursting. Pay-per-use nodes are covered in
section 7.5 of the Cloudbursting Manual.
The license is tied to a specific MAC address, so it cannot simply be used elsewhere. For convenience,
the Node Count field in the output of licenseinfo shows the current number of nodes used.
If there is a license issue, then CMDaemon may fail to start up, and further information cannot be obtained using Bright View or cmsh, because these clients themselves
obtain their information from the cluster management daemon.
In such a case, the verify-license utility allows the troubleshooting of license issues.
3. Using verify-license with the verify option: checks the validity of the license:
• If the license is valid, then no output is produced and the utility exits with exit-code 0.
• If the license is invalid, then output is produced indicating what is wrong. Messages such as these
are then displayed:
4. Using verify-license with the monthsleft=<value> option: checks how much time is left before the license expires:
– if the license is due to expire in more than that number of months, then the verify-license
command returns nothing.
– if the license is due to expire in less than that number of months, then the verify-license
command returns the date of expiry
• If a number value is not set for monthsleft, then the value is set to 12 by default. In other words,
the default value means that if the license is due to expire in less than 12 months, then the date of
expiry of the license is displayed.
Example
[root@bright92 etc]# date
Wed Sep 19 14:55:16 CET 2018
[root@bright92 etc]# verify-license monthsleft
Bright Cluster Manager License expiration date: 31 Dec 2018
[root@bright92 etc]# verify-license monthsleft=3
[root@bright92 etc]# verify-license monthsleft=4
Bright Cluster Manager License expiration date: 31 Dec 2018
4.2.3 Using The versioninfo Command To Verify The Cluster Manager Version
The license version should not be confused with the cluster version. The license version is a license format version that rarely changes between cluster manager version releases. Thus a cluster can have a license with version 7.0, which was the license format introduced during NVIDIA Bright Cluster Manager 7.0, and have a cluster manager version 8.1.
The version of a cluster can be viewed using the versioninfo command, which can be run from the main mode of cmsh as follows:
Example
[root@bright92 ~]# cmsh
[bright92]% main
[bright92->main]% versioninfo
Version Information
------------------------ ----------------------------------------------------
Cluster Manager 9.2
CMDaemon 2.2
CMDaemon Build Index 151494
CMDaemon Build Hash fc86e6036f
Database Version 36249
• Evaluation product key: An evaluation license is a temporary license that can be installed via an
evaluation product key. The evaluation product key is valid for a maximum of 3 months from a
specified date, unless the account manager approves a further extension.
If a cluster has the cluster manager installed on it, then a temporary license to run the cluster
can be installed with an evaluation product key. Such a key allows the cluster to run with
defined attributes, such as a certain number of nodes and features enabled, depending on what
was agreed upon with the account manager. The temporary license is valid until the product key
expires, unless the account manager has approved further extension of the product key, and the
license has been re-installed.
DVD downloads of the cluster manager from the Bright Computing website come with a built-in
license that overrides any product key attributes. The license is valid for a maximum of 3 months
from the download date. An evaluation product key allows the user to download such a DVD,
and the built-in license then allows 2-node clusters to be tried out. Such a cluster can comprise 1
head node and 1 compute node, or comprise 2 head nodes.
• Subscription product key: A subscription license is a license that can be installed with a subscription product key. The subscription product key has some attributes that decide the subscription
length and other settings for the license. At the time of writing (September 2017), the subscription
duration is a maximum of 5 years from a specified date.
If a cluster has the cluster manager installed on it, then a subscription license to run the cluster
can be installed with a subscription product key. Such a key allows the cluster to run with defined
attributes, such as a certain number of nodes and features enabled, depending on what was agreed
upon with the account manager. The subscription license is valid until the subscription product
key expires.
• Hardware lifetime product key: This is a legacy product key that is supported for the hardware
lifetime. It is no longer issued.
If the product key has been used on the cluster already, then it can be retrieved from the CSR file (page 52) with the command:
cm-get-product-key
• to register the key using the Bright Computing customer portal (section 4.3.9) account.
The following terminology is used when talking about product keys, locking, licenses, installation, and registration:
• activating a license: A product key is obtained from any cluster manager (re)seller. It is used to
obtain and activate a license file. Activation means that Bright Computing records that the product
key has been used to obtain a license file. The license obtained by product key activation permits
the cluster to work with particular settings. For example, the subscription period, and the number
of nodes. The subscription start and end date cannot be altered for the license file associated with
the key, so an administrator normally activates the license file as soon as possible after the starting
date in order to not waste the subscription period.
• locking a product key: The administrator is normally allowed to use a product key to activate a license only once. This is because a product key is locked on activation of the license. A locked state means that the product key cannot activate a new license—it is “used up”.
An activated license only works on the hardware that the product key was used with. This could
obviously be a problem if the administrator wants to move the cluster manager to new hardware.
In such a case, the product key must be unlocked. Unlocking is possible for a subscription license
via the customer portal (section 4.3.9). Unlocking an evaluation license, or a hardware lifetime
license, is possible by sending a request to the account manager at Bright Computing to unlock
the product key. Once the product key is unlocked, then it can be used once again to activate a
new license.
• license installation: License installation occurs on the cluster after the license is activated and issued. The installation is done automatically if possible. Sometimes installation needs to be done
manually, as explained in the section on the request-license script (page 51). The license can
only work on the hardware it was specified for. After installation is complete, the cluster runs
with the activated license.
• product key registration: Product key registration occurs on the customer portal (section 4.3.9) account when the product key is associated with the account.
There are three options to use the product key to get the license:
1. Direct WWW access: If the cluster has access to the WWW port, then a successful completion of
the request-license command obtains and activates the license. It also locks the product key.
• Proxy WWW access: If the cluster uses a web-proxy, then the environment variable
http_proxy must be set before the request-license command is run. From a bash prompt
this is set with:
export http_proxy=<proxy>
where <proxy> is the hostname or IP address of the proxy. An equivalent alternative is that the ScriptEnvironment directive (page 829 of the Administrator Manual), which is a CMDaemon directive, can be set and activated (page 811 of the Administrator Manual).
2. Off-cluster WWW access: If the cluster does not have access to the WWW port,
but the administrator does have off-cluster web-browser access, then the point at
which the request-license command prompts “Submit certificate request to
http://licensing.brightcomputing.com/licensing/index.cgi?” should be answered
negatively. CSR (Certificate Sign Request) data generated is then conveniently displayed
on the screen as well as saved in the file /cm/local/apps/cmd/etc/cluster.csr.new. The
cluster.csr.new file may be taken off-cluster and processed with an off-cluster browser.
The CSR file should not be confused with the private key file, cluster.key.new, created shortly
beforehand by the request-license command. In order to maintain cluster security, the private
key file must, in principle, never leave the cluster.
At the off-cluster web-browser, the administrator may enter the cluster.csr.new content in a
web form at:
http://licensing.brightcomputing.com/licensing
A signed license text is returned. At Bright Computing the license is noted as having been activated, and the product key is locked.
The signed license text received by the administrator is in the form of a plain text certificate. As
the web form response explains, it can be saved directly from most browsers. Cutting and pasting
the text into an editor and then saving it is possible too, since the response is plain text. The saved
signed license file, <signedlicense>, should then be put on the head node. If there is a copy of
the file on the off-cluster machine, the administrator should consider wiping that copy in order to
reduce information leakage.
The command:
install-license <signedlicense>
installs the signed license on the head node, and is described further on page 53. Installation
means the cluster now runs with the activated certificate.
3. Fax or physical delivery: If no internet access is available at all to the administrator, the CSR data
may be faxed or sent as a physical delivery (postal mail print out, USB flash drive/floppy disk) to
any cluster manager reseller. A certificate will be faxed or sent back in response, the license will
be noted by Bright Computing as having been activated, and the associated product key will be
noted as being locked. The certificate can then be handled further as described in option 2.
Example
Contacting http://licensing.brightcomputing.com/licensing/...
License granted.
License data was saved to /cm/local/apps/cmd/etc/cluster.pem.new
Install license ? [Y/n] n
Use "install-license /cm/local/apps/cmd/etc/cluster.pem.new" to install the license.
• If the old head node is not able to run normally, then the new head node can have the head node
data placed on it from the old head node data backup.
• If the old head node is still running normally, then the new head node can have data placed on it
by a cloning action run from the old head node (section 17.4.8 of the Administrator Manual).
• a user with a subscription license can unlock the product key directly via the customer portal
(section 4.3.9).
• a user with a hardware license almost always has the license under the condition that it expires
when the hardware expires. Therefore, a user with a hardware license who is replacing the hard-
ware is almost always restricted from a license reinstallation. Users without this restriction may
request the account manager at Bright Computing to unlock the product key.
Using the product key with the request-license script then allows a new license to be requested,
which can then be installed by running the install-license script. The install-license script may
not actually be needed, but running it afterwards does no harm.
• The full drive image can be copied onto a blank drive, and the system will work as before.
– then after the installation is done, a license can be requested and installed once more using
the same product key, using the request-license command. Because the product key is
normally locked when the previous license request was done, a request to unlock the product
key usually needs to be sent to the account manager at Bright Computing before the license
request can be executed.
– If the administrator wants to avoid using the request-license command and having to type
in a product key, then some certificate key pairs must be placed on the new drive from the
old drive, in the same locations. The procedure that can be followed is:
1. in the directory /cm/local/apps/cmd/etc/, the following key pair is copied over:
* cluster.key
* cluster.pem
Copying these across means that request-license does not need to be used.
2. The admin.{pem|key} key pair files can then be placed in the directory /root/.cm/cmsh/.
Two options are:
* the following key pair can be copied over:
· admin.key
· admin.pem
or
* a fresh admin.{pem|key} key pair can be generated instead via a cmd -b option:
Example
[root@bright92 ~]# service cmd stop
[root@bright92 ~]# cmd -b
[root@bright92 ~]# [...]
Tue Jan 21 11:47:54 [ CMD ] Info: Created certificate in admin.pem
Tue Jan 21 11:47:54 [ CMD ] Info: Created certificate in admin.key
[root@bright92 ~]# [...]
[root@bright92 ~]# chmod 600 admin.*
[root@bright92 ~]# mv admin.* /root/.cm/cmsh/
[root@bright92 ~]# service cmd start
It is recommended for security reasons that the administrator ensure that unnecessary extra
certificate key pair copies no longer exist after installation on the new drive.
The subsequent times that the same product key, or another product key, is used: If a license has
become invalid, a new license may be requested. On running the command request-license for the
cluster, with the same product key, or another product key, the administrator is prompted on whether
to re-use the existing keys and settings from the existing license:
Example
• If the existing keys are kept, a pdsh -g computenode reboot is not required. This is because these
keys are X509v3 certificates issued from the head node. For these:
– Any node certificates (section 5.4.1 of the Administrator Manual) that were generated using the
old certificate are therefore still valid and so regenerating them for nodes via a reboot is not
required, allowing users to continue working uninterrupted. On reboot new node certificates
are generated and used if needed.
– User certificates (section 6.4 of the Administrator Manual) become invalid during certificate
regeneration when CMDaemon restarts itself. It is therefore advised to install a permanent
license as soon as possible, or alternatively, to not bother creating user certificates until a
permanent license has been set up for the cluster.
• If the existing keys are not re-used, then node communication ceases until the nodes are rebooted.
If there are jobs running on the cluster manager nodes, they cannot then complete.
After the license is installed, verifying the license attribute values is a good idea. This can be done
using the licenseinfo command in cmsh, or by selecting the License info menu option from within
the Partition base window in Bright View’s Cluster resource (section 4.1).
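For example, a quick check from the command line (a sketch, assuming cmsh's main mode provides the
licenseinfo command, as noted above):
[root@bright92 ~]# cmsh -c "main licenseinfo"
The output lists attributes such as the licensee and the licensed number of nodes, which can be
compared against what was ordered.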
The --auto-attach option allows a system to update its subscription automatically, so that the sys-
tem ends up with a valid subscription state.
If the head node has no direct connection to the internet, then an HTTP proxy can be configured as
a command line option. The subscription-manager man pages give details on configuring the proxy
from the command line.
A valid subscription means that, if all is well, the RHEL server RPMs repository (rhel-6-server-
rpms or rhel-7-server-rpms) is enabled, so that RPMs can be picked up from that repository.
For some RHEL7 packages, the RHEL7 extras repository has to be enabled in a similar manner. The
option used is then --enable rhel-7-server-extras-rpms.
A list of the available repositories for a subscription can be retrieved using:
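For example, using the standard Red Hat subscription tooling (the available repository names depend
on the subscription):
[root@bright92 ~]# subscription-manager repos --list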
After registration, the yum subscription-manager plugin is enabled. This means that yum can now
be used to install and update from the Red Hat Network repositories.
After the software image is registered, the optional and extras RPMs repository must be enabled
using, for RHEL7 systems:
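A hedged sketch, assuming the command is run within the software image via chroot, and that the
standard repository names apply:
[root@bright92 ~]# chroot /cm/images/default-image subscription-manager repos \
--enable=rhel-7-server-optional-rpms --enable=rhel-7-server-extras-rpms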
After registration, the yum subscription-manager plugin is enabled within the software image.
This means that yum can now be used to install and update the software image from the Red Hat Net-
work repositories.
The e-mail address used is the address that was used to register the subscription with Novell. When
logged in on the Novell site, the activation code or registration code can be found at the products
overview page after selecting “SUSE Linux Enterprise Server”.
After registering, the SLES and SLE SDK repositories are added to the repository list and enabled.
The defined repositories can be listed with:
[root@bright92 ~]# zypper lr
The e-mail address is the address used to register the subscription with Novell. When logged in on
the Novell site, the activation code or registration code can be found at the products overview page after
selecting “SUSE Linux Enterprise Server”.
When running the registration command, warnings about the /sys or /proc filesystems can be ig-
nored. The command tries to query hardware information via these filesystems, but these are empty
filesystems in a software image, and only fill up on the node itself after the image is provisioned to the
node.
Instead of registering the software image, the SLES repositories can be enabled for the
default-image software image with:
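A minimal sketch, assuming the head node's own repository, service, and credential files are reused
by copying them into the image:
[root@bright92 ~]# cp -a /etc/zypp/repos.d/. /cm/images/default-image/etc/zypp/repos.d/
[root@bright92 ~]# cp -a /etc/zypp/services.d/. /cm/images/default-image/etc/zypp/services.d/
[root@bright92 ~]# cp -a /etc/zypp/credentials.d/. /cm/images/default-image/etc/zypp/credentials.d/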
The copied files should be reviewed. Any unwanted repositories, unwanted service files, and un-
wanted credential files must be removed.
The repository list of the default-image software image can be viewed with the chroot option, -R,
as follows:
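For example, with default-image assumed as the image name:
[root@bright92 ~]# zypper -R /cm/images/default-image lr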
6.2 Method
A cluster consists of a head node, say bright92, and one or more regular nodes. The head node of the
cluster is assumed to face the internal network (the network of regular nodes) on one interface, say eth0.
The external network leading to the internet is then on another interface, say eth1. This is referred to as
a type 1 configuration in this manual (section 3.3.9).
Typically, an administrator gives the head node a static external IP address before actually connect-
ing it up to the external network. This requires logging into the physical head node with the vendor-
supplied root password. The original network parameters of the head node can then be viewed and set.
For example for eth1:
# cmsh -c "device interfaces bright92; get eth1 dhcp"
yes
Other external network parameters can be viewed and set in a similar way, as shown in table 6.1. A
reboot implements the networking changes.
Table 6.1: External Network Parameters And How To Change Them On The Head Node

IP* (IP address of head node on the eth1 interface)
    view: cmsh -c "device interfaces bright92; get eth1 ip"
    set:  cmsh -c "device interfaces bright92; set eth1 ip address; commit"

baseaddress* (base IP address, i.e. network address, of the network)
    view: cmsh -c "network get externalnet baseaddress"
    set:  cmsh -c "network; set externalnet baseaddress address; commit"

broadcastaddress* (broadcast IP address of the network)
    view: cmsh -c "network get externalnet broadcastaddress"
    set:  cmsh -c "network; set externalnet broadcastaddress address; commit"

netmaskbits (netmask in CIDR notation: the number after "/", or prefix length)
    view: cmsh -c "network get externalnet netmaskbits"
    set:  cmsh -c "network; set externalnet netmaskbits bitsize; commit"

gateway* (gateway, i.e. default route, IP address)
    view: cmsh -c "network get externalnet gateway"
    set:  cmsh -c "network; set externalnet gateway address; commit"

nameservers*,** (nameserver IP addresses)
    view: cmsh -c "partition get base nameservers"
    set:  cmsh -c "partition; set base nameservers address; commit"

searchdomains** (name of search domains)
    view: cmsh -c "partition get base searchdomains"
    set:  cmsh -c "partition; set base searchdomains hostname; commit"

timeservers** (name of timeservers)
    view: cmsh -c "partition get base timeservers"
    set:  cmsh -c "partition; set base timeservers address; commit"

* If address is set to 0.0.0.0 then the value offered by the DHCP server on the external network is accepted.
** Space-separated multiple values are also accepted for these parameters when setting the value for address or hostname.

6.3 Terminology
A reminder about the less well-known terminology in the table:
• netmaskbits is the netmask size, or prefix-length, in bits. In IPv4's 32-bit addressing, this can be up
to 31 bits, so it is a number between 1 and 31. For example: networks with 256 (2^8) addresses (i.e.
with host addresses specified with the last 8 bits) have a netmask size of 24 bits. They are written
in CIDR notation with a trailing "/24", and are commonly spoken of as "slash 24" networks.
• baseaddress is the IP address of the network the head node is on, rather than the IP address of
the head node itself. The baseaddress is specified by taking netmaskbits number of bits from the
IP address of the head node. Examples:
– A network with 256 (2^8) host addresses: This implies the first 24 bits of the head node's
IP address are the network address, and the remaining 8 bits are zeroed. This is specified
by using "0" as the last value in the dotted-quad notation (i.e. zeroing the last 8 bits). For
example: 192.168.3.0
– A network with 128 (2^7) host addresses: Here netmaskbits is 25 bits in size, and only the
last 7 bits are zeroed. In dotted-quad notation this implies "128" as the last quad value (i.e.
zeroing the last 7 bits). For example: 192.168.3.128.
When in doubt, or if the preceding terminology is not understood, then the values to use can be calcu-
lated using the head node’s sipcalc utility. To use it, the IP address in CIDR format for the head node
must be known.
When run using a CIDR address value of 192.168.3.130/25, the output is (some output removed for
clarity):
# sipcalc 192.168.3.130/25
# sipcalc -b 192.168.3.130/25
7.2 Shorewall
Package name: shorewall
In NVIDIA Bright Cluster Manager 9.2, Shorewall is managed by CMDaemon, in order to handle
the automation of cloud node access. Restarting Shorewall can thus also be carried out within the
services submode (section 3.11 of the Administrator Manual), on the head node. For example, on a head
node bright92, the cmsh session to carry out a restart of Shorewall might be:
[bright92->device[bright92]->services[shorewall]]% restart
restart Successfully restarted service shorewall on: bright92
System administrators who need a deeper understanding of how Shorewall is implemented should
be aware that Shorewall does not really run as a daemon process. The command to restart the service
therefore does not stop and start a shorewall daemon. Instead it carries out the configuration of netfilter
through implementing the iptables configuration settings, and then exits. It exits without leaving a
shorewall process up and running, even though service shorewall status shows it is running.
• SSH
• HTTP
• HTTPS
Example
A restart of CMDaemon makes the change take effect, and takes care of opening the firewall on port
8082 for CMDaemon, by adding a line to the rules file of Shorewall. The original port 8081 remains
open, but CMDaemon no longer listens on it.
The status of ports used by the cluster manager can be listed with:
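For example, a generic check with standard Linux tooling (a sketch, assuming the ss utility is
available; ports 8081 and 8082 are the CMDaemon ports discussed above):
[root@bright92 ~]# ss -tlnp | grep -E '8081|8082'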
7.2.3 Clear And Stop Behavior In service Options, bash Shell Command, And cmsh Shell
To remove all rules, for example for testing purposes, the clear option should be used from the Unix
shell. This then allows all network traffic through:
shorewall clear
Administrators should be aware that in the Linux distributions supported by the cluster manager,
the service shorewall stop command corresponds to the Unix shell shorewall stop command, and
not to the Unix shell shorewall clear command. The stop option for the service and shell blocks net-
work traffic but allows a pre-defined minimal safe set of connections, and is not the same as completely
removing Shorewall from consideration. The stop options discussed so far should not be confused with
the equivalent stop option in the cmsh shell.
This situation is indicated in the following table:
Correspondence Of Stop And Clear Options In Shorewall Vs cmsh
iptables rules       Service                  Unix Shell         cmsh shell
keep a safe set:     service shorewall stop   shorewall stop     no equivalent
clear all rules:     no equivalent            shorewall clear    stop shorewall
For example, to add a policy to the cluster manager-managed section of the
/etc/shorewall/policy file on the head node, a cmsh session can be run as follows:
Example
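A sketch of such a session (the submode navigation follows the prompts shown further on; the column
headers in the list output are illustrative):
[root@head ~]# cmsh
[head]% device use head
[head->device[head]]% roles
[head->device[head]->roles]% use firewall
[head->device[head]->roles[firewall]]% policies
[head->device[head]->roles[firewall]->policies]% list
Source (key)  Destination  Policy  Log level
------------- ------------ ------- ---------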
The preceding output shows that no additional policies are currently managed by the cluster manager in
that mode. To add cluster manager-managed policies, the administrator can check how to do
it by typing in the add command (section 2.5.3 of the Administrator Manual) without any arguments:
[head->device[head]->roles[firewall]->policies]% add
Name:
Create a new firewallpolicy with specified policy
Usage:
add <policy>
add <source>
add <source> <dest>
add <source> <dest> <policy>
add <source> <dest> <policy> <log>
Examples:
add loc net ACCEPT
add loc net ACCEPT info
Based on the lookup and some familiarity with Shorewall’s policy file, the administrator can now com-
pose suitable arguments for the add command, and commit (section 2.5.3 of the Administrator Manual)
the changes:
Example
[head->device[head]->roles[firewall]->policies]% add net fw ACCEPT info
[head->device*[head*]->roles*[firewall*]->policies[0]]% commit
[head->device[head]->roles[firewall]->policies]% !head -4 /etc/shorewall/policy
# This section of this file was automatically generated by cmd. Do not edit manually!
# BEGIN AUTOGENERATED SECTION -- DO NOT REMOVE
net fw ACCEPT info
# END AUTOGENERATED SECTION -- DO NOT REMOVE
After the commit command is run, the additional policy is placed and becomes active in the policy
file.
A failover procedure is quite a sensible option when Shorewall is stopped from outside of cmsh or
Bright View, because, besides the failover monitoring tests failing, other failures also make the head
node pretty useless. The blocking of ports means that, among other things, workload managers and NFS
shares are also unable to connect. Ideally, therefore, Shorewall should not be stopped outside cmsh or
Bright View in the first place.
7.3 Compilers
Bright Computing provides convenient RPM and .deb packages for several compilers that are popular
in the HPC community. All of those may be installed through yum, zypper, or apt (section 11.2 of the
Administrator Manual) but (with the exception of GCC) require an installed license file to be used.
7.3.1 GCC
Package name: gcc-recent for RHEL and derivatives, and SLES. cm-gcc for Ubuntu
The GCC suite that the distribution provides is also present by default.
Packages In The Intel Compiler Suite Versions For RHEL And Derivatives, SLES, And Ubuntu
2018 2019 2020
intel-compiler-common-2018 intel-compiler-common-2019 intel-compiler-common-2020
intel-cc-2018 intel-cc-2019 intel-cc-2020
intel-daal-2018 intel-daal-2019 intel-daal-2020
intel-daal-2018-32 intel-daal-2019-32 intel-daal-2020-32
intel-fc-2018 intel-fc-2019 intel-fc-2020
intel-gdb-2018 intel-gdb-2019 intel-gdb-2020
intel-icx-2020
intel-ipp-2018 intel-ipp-2019 intel-ipp-2020
intel-ipp-2018-32 intel-ipp-2019-32 intel-ipp-2020-32
intel-ipp-2018-devel intel-ipp-2019-devel intel-ipp-2020-devel
intel-ipp-2018-devel-32 intel-ipp-2019-devel-32 intel-ipp-2020-devel-32
intel-itac-2018 intel-itac-2019 intel-itac-2020
intel-mkl-2018 intel-mkl-2019 intel-mkl-2020
intel-mkl-2018-32 intel-mkl-2019-32 intel-mkl-2020-32
intel-mpi-2018 intel-mpi-2019 intel-mpi-2020
intel-openmp-2018 intel-openmp-2019 intel-openmp-2020
intel-openmp-2018-32 intel-openmp-2019-32 intel-openmp-2020-32
intel-tbb-2018 intel-tbb-2019 intel-tbb-2020
NVIDIA Bright Cluster Manager 9.2 provides x86_64 packages for the 2018, 2019, and 2020 versions
of the Intel compiler suites. These are for RHEL and derivatives, for SLES, and for Ubuntu, except for
the following packages and distributions:
• The 2018 version of the Intel compiler suite is not supported for RHEL8 and derivatives, and is
also not supported for Ubuntu 20.04. Therefore, for the 2018 suite, packages for these distributions
are not available.
• The 2019 version of the Intel compiler suite is not supported for Ubuntu 20.04. Therefore, for the
2019 suite, a package for this distribution is not available.
Typically the compiler suite includes the Intel Fortran (indicated by fc) and Intel C++ compilers
(part of the C compiler package, indicated by cc). 32-bit compilers are included in the intel-cc-<year>
and intel-fc-<year> packages.
For the other packages, a 32-bit version is sometimes available separately. The 32-bit packages have
package names ending in "-32".
Both the 32-bit and 64-bit versions can be invoked through the same set of commands. The modules
environment (section 2.2 of the Administrator Manual) provided when installing the packages can be
loaded accordingly, to select one of the two versions. For the C++ and Fortran compilers the 64-bit
and 32-bit modules are called as modules beginning with intel/compiler/64 and intel/compiler/32
respectively.
The Intel compiler can be accessed by loading the compiler modules under intel/compiler/64 or
intel/compiler/32. The following commands can be used to run the Intel compilers:
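A sketch, assuming the classic Intel compiler driver names for these suite versions:
icc hello.c     # Intel C compiler
icpc hello.cpp  # Intel C++ compiler
ifort hello.f90 # Intel Fortran compiler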
A short summary of a package can be shown using, for example: “yum info intel-fc-<year>”.
The compiler packages require a license, obtainable from Intel, and placed in /cm/shared/licenses/
intel.
Full documentation for the Intel compilers is available at http://software.intel.com/en-us/
intel-compilers/.
In the following example the license file is copied into the appropriate location, the C/C++ compiler
is installed, and a modules environment (section 2.2 of the Administrator Manual) is loaded for use in
this session by the root user. Furthermore, the modules environment is added for regular root user use
with “module initadd”:
Example
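A minimal sketch of such a session (the license file name and the compiler year are illustrative, and
the exact module name under intel/compiler/64 may carry further version components):
[root@bright92 ~]# cp <license file> /cm/shared/licenses/intel/
[root@bright92 ~]# yum install intel-cc-2020
[root@bright92 ~]# module load intel/compiler/64
[root@bright92 ~]# module initadd intel/compiler/64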
How to load modules for use and regular use by non-root users is explained in section 2.2.3 of the
Administrator Manual.
• The NVIDIA HPC SDK C, C++, and Fortran compilers that support GPU acceleration of HPC
modeling and simulation applications with standard C++ and Fortran, OpenACC directives, and
CUDA.
• GPU-accelerated math libraries that maximize performance on common HPC algorithms, and op-
timized communications libraries enable standards-based multi-GPU and scalable systems pro-
gramming.
• Performance profiling and debugging tools that simplify porting and optimization of HPC appli-
cations
• Support for ARM, OpenPOWER, x86-64 CPUs, as well as NVIDIA GPUs, running Linux
Example
The preceding output was what was available at the time of writing (April 2023). The output can be
expected to change.
A browser-based way to check the cm-nvhpc versions and CUDA availability for Bright versions,
distributions, and architectures is to search for cm-nvhpc in the Bright Computing distributed
packages list at https://support.brightcomputing.com/packages-dashboard
Compiler Modules
The cm-nvhpc package makes several environment modules available for compiling:
The nvhpc environment module is the standard HPC SDK, and provides an OpenMPI 3.x library by
default.
The byo tag is an abbreviation for 'bring-your-own', and means that the general compiler environ-
ment for C, C++, and Fortran is not set.
The nompi tag implies that paths to the MPI binaries and MPI libraries that come with cm-nvhpc are
not set, so that no MPI library is used from the package. An external MPI library can then be used with
the nvhpc-nompi compiler.
The nvhpc-hpcx environment module sets up the HPC-X library environment. This is an alternative
to the default OpenMPI 3.x library that the nvhpc module provides.
Viewing Installed Available CUDA Versions, And The Running CUDA Version
The installed available CUDA versions for nvhpc can be viewed with:
Example
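A sketch of one way to view them, assuming the nvhpc module sets NVHPC_ROOT as described below;
the versioned subdirectories of the cuda directory correspond to the available CUDA versions:
[root@bright92 ~]# module load nvhpc
[root@bright92 ~]# ls ${NVHPC_ROOT}/cuda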
• nvhpc cluster-wide: ${NVHPC_ROOT}/compilers/bin/localrc
• nvhpc for a specific user on a specific head or compute node, as specified by hostname -s:
${HOME}/localrc.$(hostname -s)
The second configuration file overwrites any settings set with ${NVHPC_ROOT}/compilers/bin/localrc.
If the ${NVHPC_ROOT}/compilers/bin/localrc.$(hostname -s) configuration file exists, then a
${HOME}/localrc.$(hostname -s) is ignored.
CUDA packages that the cluster administrator manages: At the time of writing, the packages that the
cluster administrator can install or remove for the cluster manager are:
Package(s)                                               Type     Description
cuda10.2-visual-tools*,
cuda11.0-visual-tools† through cuda11.7-visual-tools†    shared   CUDA visual toolkit
cuda10.1-sdk*, cuda10.2-sdk*,
cuda11.0-sdk† through cuda11.7-sdk†                      shared   CUDA software development kit
The packages of type shared in the preceding table should be installed on the head nodes of a cluster
using CUDA-compatible GPUs. The packages of type local should be installed to all nodes that access
the GPUs. In most cases this means that the cuda-driver and cuda-dcgm packages should be installed
in a software image (section 2.1.2 of the Administrator Manual).
If a head node also accesses GPUs, then the cuda-driver and cuda-dcgm packages should be in-
stalled on it, too.
For packages of type shared, the particular CUDA version that is run on the node can be selected
via a modules environment command:
Example
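A sketch; the exact module names depend on the installed packages, and can be checked with module
avail:
[root@node001 ~]# module avail cuda
[root@node001 ~]# module load cuda11.2/toolkit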
CUDA packages that the cluster administrator normally does not manage: As an aside, there are
also the CUDA DCGM packages:
The preceding DCGM packages are installed in the cluster manager, because CMDaemon uses
them to manage NVIDIA Tesla GPUs. Tesla drivers normally work for the latest CUDA version, and
may therefore not (yet) support the latest GeForce GPUs.
CUDA package that the cluster administrator may wish to install for CUDA programming: CUB
is a CUDA programming library that developers may wish to access. It is provided by the package
cm-cub-cuda, from the Machine Learning (cm-ml) repository.
• The NVIDIA GPU hardware should be detectable by the kernel, otherwise the GPUs cannot be
used by the drivers. Running the lspci command on the device with the GPU before the CUDA
package driver installation is a quick check that should make it clear if the NVIDIA hardware is
detected in the first place:
Example
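For example (the exact device string varies with the GPU model):
[root@node001 ~]# lspci | grep -i nvidia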
If the hardware is not detected by the kernel already, then the administrator should reassess the
situation.
• Only after CUDA package installation has taken place, and after rebooting the node with the GPU,
are GPU details visible using the sysinfo command:
Example
running sysinfo on node001, which is where the GPU is, via cmsh on the head node, while cuda-dcgm is
not yet ready:
Example
running sysinfo on node001, which is where the GPU is, via cmsh on the head node, after cuda-dcgm is
ready:
• CUDA compilation should take place on a node that uses NVIDIA GPUs.
• Cross compilation of CUDA software is generally not a best practice due to resource consumption,
which can even lead to crashes.
– If, despite this, cross compilation with a CPU is done, then the cuda-driver package should
be installed on the node on which the compilation is done, and the GPU-related services on
the node, such as:
* cuda-driver.service
* nvidia-persistenced.service
* cuda-dcgm.service
should be disabled.
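A hedged sketch of disabling these services in a software image, using systemd's --root option, with
default-image assumed:
[root@bright92 ~]# systemctl --root=/cm/images/default-image disable \
cuda-driver.service nvidia-persistenced.service cuda-dcgm.service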
An inspection of the package dependencies shows that one of the dependencies of the cuda-driver package is the freeglut-devel package, so
it should be installed on a node that accesses a GPU. If the CUDA SDK source is to be compiled on the
head node (with the head node not accessing a GPU, and with the cuda-driver package not installed)
then the freeglut, freeglut-devel, and libXi-devel packages should be installed on the head node.
The cuda-driver package is used to compile the kernel drivers which manage the GPU. Therefore,
when installing cuda-driver with yum, several other X11-related packages are installed too, due to pack-
age dependencies.
The cuda*-sdk packages can be used to compile libraries and tools that are not part of the CUDA
toolkit, but used by CUDA software developers, such as the deviceQuery binary (section 7.4.3).
The cuda-xorg package is optional, and contains the driver and libraries for an X server.
Example
For example, on a cluster where (some of) the nodes access GPUs, but the head node does not access a
GPU, the following commands can be issued on the head node to install the CUDA 11.2 packages using
YUM:
[root@mycluster ~]# yum install cuda11.2-toolkit cuda11.2-sdk
[root@mycluster ~]# yum --installroot=/cm/images/default-image install cuda-driver cuda-dcgm
The --installroot option installs to the image used by the nodes. Here the image used by the
nodes is assumed to be default-image. To ensure the software is installed from the image to the nodes,
the imageupdate command can be run from within cmsh for the appropriate nodes.
Versions of Red Hat 7 and beyond, and derived versions, as well as versions of SLES version 12
and beyond, use systemd instead of an init-based system. For these the equivalent starting command
is:
Example
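A hedged sketch, assuming the cuda-driver service name used elsewhere in this section:
[root@node001 ~]# systemctl start cuda-driver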
If there is a failure in compiling the CUDA module, it is usually indicated by a message saying
“Could not make module”, “NVRM: API mismatch:”, or “Cannot determine kernel version”. Such
a failure typically occurs because compilation is not possible due to missing the correct kernel develop-
ment package from the distribution. Section 7.4.2 explains how to check for, and install, the appropriate
missing package.
Example
make clean
Executing: /tmp/cuda11.2/bin/x86_64/linux/release/alignedTypes
[/tmp/cuda11.2/bin/x86_64/linux/release/alignedTypes] - Starting...
GPU Device 0: "Volta" with compute capability 7.0
Another method to verify that CUDA is working, is to build and use the deviceQuery command on
a node accessing one or more GPUs. The deviceQuery command lists all CUDA-capable GPUs that a
device can access, along with several of their properties (some output elided):
Example
The CUDA user manual has further information on how to run compute jobs using CUDA.
Further information on CUDA verification: More on verification can be found in the NVIDIA CUDA
INSTALLATION GUIDE FOR LINUX at https://docs.nvidia.com/cuda/pdf/CUDA_Installation_Guide_
Linux.pdf.
Example
make clean
make (may take a while)
Run all tests? (y/N)? y
Executing: /tmp/opencl/OpenCL/bin/linux/release/oclBandwidthTest
[oclBandwidthTest] starting...
/tmp/opencl/OpenCL/bin/linux/release/oclBandwidthTest Starting...
...
The following dynamic module loading line may need to be added to the Module section of the X con-
figuration:
Load "glx"
The following graphics device description lines need to be replaced in the Device section of the X con-
figuration:
Driver "nvidia"
The BusID line may need to be replaced with the ID shown for the GPU by the lspci command.
Example
Section "ServerLayout"
Identifier "Default Layout"
Screen 0 "Screen0" 0 0
EndSection
Section "Files"
ModulePath "/usr/lib64/xorg/modules/extensions/nvidia"
ModulePath "/usr/lib64/xorg/modules/extensions"
ModulePath "/usr/lib64/xorg/modules"
EndSection
Section "Module"
Load "glx"
EndSection
Section "InputDevice"
Identifier "Keyboard0"
Driver "kbd"
Option "XkbModel" "pc105"
Option "XkbLayout" "us"
EndSection
Section "Device"
Identifier "Videocard0"
Driver "nvidia"
BusID "PCI:14:0:0"
EndSection
Section "Screen"
Identifier "Screen0"
Device "Videocard0"
DefaultDepth 24
SubSection "Display"
Viewport 0 0
Depth 24
EndSubSection
EndSection
Example
After it is installed, the node on which the installation is done must be rebooted.
Running the diagnostic after the reboot should display output similar to:
Example
root@bright92:~# cmsh
[bright92]% softwareimage
[bright92->softwareimage]% clone default-image am
[bright92->softwareimage*[am*]]% commit
[bright92->softwareimage[am]]%
[notice] bright92: Started to copy: /cm/images/default-image -> /cm/images/am (4117)
...
[notice] bright92: Initial ramdisk for image am was generated successfully
[bright92->softwareimage[am]]% quit
To install the packages, the instructions from AMD should be followed. These instructions are at
https://docs.amd.com/bundle/ROCm-Installation-Guide-v5.1/page/How_to_Install_ROCm.html
at the time of writing (March 2022).
The installation must be done in the image, which for a RHEL image uses a chroot into the image,
and uses a bind mount to have some special filesystem directories (/proc, /sys, and similar) be available
during the package installation. This is needed for the DKMS installation.
Bind mounting the filesystems and then chrooting is a little tedious, so the cm-chroot-sw-img utility
(page 518 of the Administrator Manual) is used to automate the job.
The following session output illustrates the procedure for Rocky 8, with much text elided. The chroot
into the cloned am image is entered with the cm-chroot-sw-img utility:
[root@bright92 ~]# cm-chroot-sw-img /cm/images/am
mounted /cm/images/am/dev
mounted /cm/images/am/dev/pts
mounted /cm/images/am/proc
mounted /cm/images/am/sys
mounted /cm/images/am/run
...
The amdgpu-install package can then be installed, which installs the ROCm stack with it. After
installation, exiting from the chroot automatically unmounts the bind mounts:
[root@bright92:/]# urldomain=https://repo.radeon.com
[root@bright92:/]# urlpath=/amdgpu-install/22.10/rhel/8.5/amdgpu-install-22.10.50100-1.el8.noarch.rpm
[root@bright92:/]# yum install $urldomain$urlpath
...
[root@bright92:/]# amdgpu-install --usecase=rocm
...
[root@bright92:/]# exit
umounted /cm/images/am/dev/pts
umounted /cm/images/am/dev
umounted /cm/images/am/proc
umounted /cm/images/am/sys
umounted /cm/images/am/run
The nodes that are to use the driver should then be set to use the new image, and should be rebooted:
Example
root@bright92 ~# cmsh
[bright92]% device use node001
[bright92->device[node001]]% set softwareimage am
[bright92->device*[node001*]]% commit
[bright92->device[node001]]% reboot node001
Normal nodes without the AMD GPU also boot up without crashing if they are set to use this image,
but will not be able to run OpenCL programs.
root@bright92:~# cmsh
[bright92]% softwareimage
[bright92->softwareimage]% clone default-image am
[bright92->softwareimage*[am*]]% commit
[bright92->softwareimage[am]]%
[notice] bright92: Started to copy: /cm/images/default-image -> /cm/images/am (117)
...
[notice] bright92: Initial ramdisk for image am was generated successfully
[bright92->softwareimage[am]]% quit
To install the packages, the instructions from AMD should be followed. These instructions describe
configuring access to the AMD driver repository, before picking up the driver. The instructions are at
https://docs.amd.com/en/latest/deploy/linux/quick_start.html
at the time of writing (October 2023).
The configuration must be done in the image. For an Ubuntu image a chroot can be done into the
image with the help of the cm-chroot-sw-img utility (page 518 of the Administrator Manual). This uses
a bind mount to have the /proc, /sys, and other special directories be available during the package
installation (section 11.4 of the Administrator Manual).
The following session output illustrates the driver installation procedure, with much text elided,
The am image directory is entered with the cm-chroot-sw-img chroot utility:
Example
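A minimal sketch of entering the image:
root@bright92:~# cm-chroot-sw-img /cm/images/am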
and the instructions on configuring access to the AMD driver repository are followed.
The AMD GPU installer package can be picked up from under https://repo.radeon.com/
amdgpu-install/. There are several installer versions available. Using the most recent one is usually
best.
The first part of a URL to the package can be defined as:
Example
root@bright92:~# URLubuntu=https://repo.radeon.com/amdgpu-install/22.10/ubuntu
The second part of the URL to the package can be defined according to the Ubuntu version used,
and according to what is available. The package can then be retrieved, for example:
root@bright92:~# URLbionic=/bionic/amdgpu-install_22.10.50100-1_all.deb
root@bright92:~# wget $URLubuntu$URLbionic
or
root@bright92:~# URLfocal=/focal/amdgpu-install_22.10.50100-1_all.deb
root@bright92:~# wget $URLubuntu$URLfocal
The nodes that are to use the driver should then be set to use the new image, and should be rebooted:
Normal nodes without an AMD GPU also boot up without crashing if they are set to use this image,
but they will not be able to run OpenCL programs.
1. The procedure begins with cloning the default-image to an image that is to be the AMD GPU
image, such as, for example, am.
root@bright92:~# cmsh
[bright92]% softwareimage
[bright92->softwareimage]% clone default-image am
[bright92->softwareimage*[am*]]% commit
[bright92->softwareimage[am]]%
[notice] bright92: Started to copy: /cm/images/default-image -> /cm/images/am (117)
...
[notice] bright92: Initial ramdisk for image am was generated successfully
[bright92->softwareimage[am]]% quit
2. The driver and its dependencies are then installed within the am image, entered for example via
the cm-chroot-sw-img utility. The steps are to:
• install DKMS
bright92:/ # zypper install dkms
bright92:/ # zypper clean --all
• add the Perl dependency repository
bright92:/ # domainURL=https://download.opensuse.org
bright92:/ # perlSLESpath=/repositories/devel:languages:perl/SLE_15/devel:languages:perl.repo
bright92:/ # zypper addrepo $domainURL$perlSLESpath
• install the AMD GPU install tool
bright92:/ # URLradeon=https://repo.radeon.com
bright92:/ # slepath=/amdgpu-install/22.10/sle/15/amdgpu-install-22.10.50100-1.noarch.rpm
bright92:/ # zypper install $URLradeon$slepath
• and to install the ROCm driver and software:
bright92:/ # amdgpu-install --usecase=rocm
bright92:/ # exit
umounted /cm/images/am/dev/pts
umounted /cm/images/am/dev
umounted /cm/images/am/proc
umounted /cm/images/am/sys
umounted /cm/images/am/run
3. The nodes that are to use the driver should then be set to use the new image, and should be
rebooted:
Example
root@bright92 ~# cmsh
[bright92]% device use node001
[bright92->device[node001]]% set softwareimage am
[bright92->device*[node001*]]% commit
[bright92->device[node001]]% reboot node001
Normal nodes without the AMD GPU also boot up without crashing if they are set to use this
image, but they will not be able to run OpenCL programs.
This is due to an AMD GPU driver installation bug, where the library, which is placed in a directory
of the form /opt/rocm-*/lib, is not linked up during installation.
A workaround is to set up the link manually. This is done in the chroot environment, in the relevant
image, by creating a .conf file under /etc/ld.so.conf.d with the path to the library.
In the following example, the path is configured for an Ubuntu 20.04 image:
Example
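A hedged sketch; the ROCm version directory and the .conf file name are illustrative:
root@bright92:~# cm-chroot-sw-img /cm/images/am
root@bright92:/# echo "/opt/rocm-5.1.3/lib" > /etc/ld.so.conf.d/rocm.conf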
After the configuration file has been placed, the ldconfig command is run, still within chroot, to link
the library in the image(s).
7.6.1 Choosing A Distribution Version, Or A Vendor Version, Ensuring The Kernel Matches,
And Logging The Installation
By default, the Linux distribution OFED packages are matched to the distribution kernel version and
installed on the cluster. This is the safest option in most cases, and also allows NFS over RDMA.
The cluster manager also packages NVIDIA’s Mellanox OFED software. Such vendor OFED pack-
ages can be more recent than the distribution packages, which means that they can provide support for
more recent hardware and firmware, as well as more features.
For the vendor OFED packages to work, the OFED firmware as provided by the manufacturer should
in general be recent, to ensure software driver compatibility.
The cluster manager vendor OFED packages can be selected and installed during the initial cluster
installation (figure 3.23), replacing the default distribution OFED stack. The stack can also be installed
later on, after the cluster is set up.
If there is no prebuilt OFED kernel modules package available for the kernel in use, then using a
supported kernel is recommended.
When updating kernels on the head node or the regular nodes, the cluster manager OFED software
stack must be reinstalled for the updated kernel.
If the cluster manager OFED software stack is installed during the cluster installation procedure
itself (section 3.3.17), then some basic information is logged to /var/log/cmfirstboot.log, which is
the general first boot log.
If the cluster manager OFED software stack is not installed during the cluster installation procedure
itself, then it can be installed later when the cluster is up and running.
A successful installation of the cluster manager OFED software stack (section 7.6.2) onto a running
cluster consists of the cluster manager OFED package installation, as well as then running an instal-
lation script. The vendor and version number installed can then be found in /etc/cm-ofed. Further
installation details can be found in /var/log/cm-ofed.log.
7.6.2 Mellanox OFED Stack Installation Using The Bright Computing Repository
Package names: mlnx-ofed46, mlnx-ofed47, mlnx-ofed49, mlnx-ofed50, mlnx-ofed51, mlnx-ofed52,
mlnx-ofed53, mlnx-ofed54, mlnx-ofed55, mlnx-ofed56, mlnx-ofed57
The Mellanox stacks are installed and configured by the cluster manager in an identical way as far
as the administrator is concerned. In this section (section 7.6.2):
<vendor-ofedVersion>
is used to indicate where the administrator must carry out a substitution. For Mellanox, the substitution
is one of the package names listed at the start of this section, mlnx-ofed46 through mlnx-ofed57.
These stacks are supported on the distributions that NVIDIA Bright Cluster Manager 9.2 supports
(RHEL and derivatives, SLES, and Ubuntu), as determined by the compatibility matrices in the
downloads pages accessible from https://network.nvidia.com/support/mlnx-ofed-matrix/.
The mlnx-ofed49 stack is an LTS release, aimed mainly at supporting older hardware. The stack may
be useful for one of the following:
• ConnectX-3 cards
• Connect-IB cards
For other use cases it usually makes sense to get the most recent supported stack.
Each stack version needs to be matched to a firmware version associated with the OFED device
used. The OFED devices used (ConnectX, BlueField, and others) must also be matched, along with
their firmware versions, via the downloads pages accessible from the URL https://network.nvidia.com/
support/mlnx-ofed-matrix/. Deviating from compatible versions is not supported.
Details on the compatibility of older stack versions can also be found via that URL.
Returning to the subject of OFED package installation via the package manager: for example,
a yum install command indicated by:
yum install <vendor-ofedVersion>
means that the installation of the Bright Computing OFED package is executed with one of these corre-
sponding yum install commands, for example:
yum install mlnx-ofed49
Installing The OFED Stack Provided By The Bright Computing Repository Vendor Package
Running the package manager command associated with the distribution (yum install, zypper up,
or apt install) unpacks and installs or updates several packages and scripts. For example:
yum install <vendor-ofedVersion>
However, it does not carry out the installation and configuration of the driver itself due to the funda-
mental nature of the changes it would carry out. The script:
<vendor-ofedVersion>-install.sh
can be used after the package manager installation to carry out the installation and configuration of the
driver itself. Running it without options displays a usage help text.
Head and software image installation with prebuilt drivers: The script can be run on the nodes as
follows:
• On the head node, the default distribution OFED software stack can be replaced with the vendor
OFED software stack made available from the Bright Computing repository, by using the script’s
head option, -h:
[root@bright92~]# /cm/local/apps/<vendor-ofedVersion>/current/bin/<vendor-ofedVersion>-install.sh -h
A reboot is recommended after the script completes the install, to help ensure the new image is
cleanly used by the head node.
• For a software image, for example default-image, used by the regular nodes, the default distri-
bution OFED software stack can be replaced with the vendor OFED software stack made available
from the Bright Computing repository, by using the script’s software image option, -s:
[root@bright92~]# /cm/local/apps/<vendor-ofedVersion>/current/bin/<vendor-ofedVersion>-install.sh -s default-image
Upgrading Kernels When The OFED Stack Has Been Provided By The Bright Computing Repository
Vendor Package—Reinstallation Of The OFED Stack
For all distributions, as explained in the preceding text, a vendor OFED stack is installed and config-
ured via the script <vendor-ofedVersion>-install.sh. OFED reinstallation may be needed if the kernel is
upgraded.
In Ubuntu: if the OFED stack is installed from the distribution or vendor OFED .deb packages, then
the DKMS (Dynamic Kernel Module System) framework makes upgraded vendor OFED kernel mod-
ules available at a higher preference than the distribution OFED kernel modules for a standard distri-
bution kernel. If there is a kernel upgrade that causes unexpected behavior from the vendor OFED
package, then the cluster administrator can still configure the distribution OFED for use by setting the
distribution OFED kernel module as the preferred kernel module. So no kernel-related packages need
to be excluded from vendor OFED upgrades or kernel upgrades. Typically, Ubuntu clusters can have
a package update (apt upgrade) carried out, with no explicit changes needed to take care of the OFED
stack.
For RHEL and derivatives, and SLES: if the OFED stack is installed from the vendor OFED RPM
packages, then the script customizes the vendor OFED stack for the existing kernel, and replaces the
distribution stack. However, updating the kernel afterwards, without updating the stack along with it,
could lead to unexpected behavior due to the customization. Kernel and kernel development updates
are therefore prevented from taking place by a package management system block. Updating the kernel,
kernel development, and OFED stack in such a configuration therefore requires that the administrator
manually overrides the block so that the OFED stack can be handled with consideration.
The following procedure can thus be followed to update and install the kernel packages and OFED
stack:
• In Red Hat-based systems, the /etc/yum.conf file must be edited. In that file, in the line that
starts with exclude, the kernel and kernel-devel packages need to be removed, so that they
are no longer excluded from updates.
• In SUSE, the kernel-default and kernel-default-devel packages must be unlocked, with the
command:
zypper removelock kernel-default kernel-default-devel
• yum update—or for SUSE zypper up—updates the packages on the head node.
• To update the packages on the regular nodes the procedure outlined in section 11.3.3 of the
Administrator Manual is followed:
– The packages on the regular node image (for example, default-image) are updated ac-
cording to distribution:
* in Red Hat-based systems as follows:
yum --installroot=/cm/images/default-image update
* or in SLES as follows:
zypper --root=/cm/images/default-image up
* or in Ubuntu as follows, using the cm-chroot-sw-img tool (page 518 of the Adminis-
trator Manual):
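A hedged sketch, with default-image assumed, and with the apt commands run inside the
chroot that the tool sets up:
cm-chroot-sw-img /cm/images/default-image
apt update
apt upgrade
exit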
– The kernelversion setting for the regular node image, which in this example is the de-
fault default-image, can be updated as follows:
Example
[root@bright92 ~]# cmsh
[bright92]% softwareimage
[bright92->softwareimage]% use default-image
[bright92->softwareimage[default-image]]% set kernelversion 3.10.0-327.3.1.el7.x86_64
[bright92->softwareimage[default-image*]]% commit
This ensures that the updated kernel is used after reboot. Tab-completion in the set
kernelversion line prompts for the right kernel from available options.
3. A reboot of the head and regular nodes installs the new kernel.
4. Configuring and installing the vendor OFED stack driver for the new kernel is done by running
the script <vendor-ofedVersion>-install.sh as before, as follows:
• For a stack that is on the head node, the compilation should be done together with the -h
option:
[root@bright92~]# /cm/local/apps/<vendor-ofedVersion>/current/bin/<vendor-ofedVersion>-install.sh -h
• For a software image used by the regular nodes, for example default-image, the compilation
should be done together with the -s option:
[root@bright92~]# /cm/local/apps/<vendor-ofedVersion>/current/bin/<vendor-ofedVersion>-install.sh -s default-image
These configuration and installation steps for the vendor OFED driver are typically not needed for
Ubuntu.
7.7.1 Installation
After the head node has been installed, the Intel OPA Software Stack can be installed by executing the
following commands on the head node:
[root@bright92 ~]# yum install intel-opa
The yum command installs the package containing the OPA stack itself, as well as the installation
scripts required for installing and configuring the kernel drivers. These are automatically placed under
a subdirectory named after the OPA stack version.
For the software images, the OPA stack can be configured and deployed for each software image as
follows:
/cm/local/apps/intel-opa/<version>/bin/intel-opa-install.sh -s <name of software image>
The OPA MTU size is not changed by the cluster manager during installation. An Intel recommen-
dation at the time of writing (October 2017) in https://www.intel.com/content/dam/support/us/
en/documents/network-and-i-o/fabric-products/Intel_OP_Performance_Tuning_UG_H93143_v3_
0.pdf, in section 6.2 is:
OPA on the other hand can support MTU sizes from 2048B (2K) up to 8192B (8KB) for verbs or PSM 2 traffic.
Intel recommends you use the 8KB MTU default for RDMA requests of 8KB or more.
8 Burning Nodes
The burn framework is a component of NVIDIA Bright Cluster Manager 9.2 that can automatically run
test scripts on specified nodes within a cluster. The framework is designed to stress test newly built
machines and to detect components that may fail under load. Nodes undergoing a burn session with
the default burn configuration lose their filesystem and partition data for all attached drives, and revert
to their software image on provisioning after a reboot.
Example
<?xml version="1.0"?>
<burnconfig>
<mail>
<address>root@master</address>
<address>some@other.address</address>
</mail>
<pre-install>
<phase name="01-hwinfo">
<test name="hwinfo"/>
<test name="hwdiff"/>
</phase>
<phase name="02-disks">
<test name="disktest" args="30"/>
<test name="mce_check" endless="1"/>
</phase>
</pre-install>
<post-install>
<phase name="03-hpl">
<test name="hpl"/>
<test name="mce_check" endless="1"/>
</phase>
<phase name="04-compile">
<test name="compile" args="6"/>
<test name="mce_check" endless="1"/>
</phase>
</post-install>
</burnconfig>
8.2.4 Phases
The phases sections must exist. If there is no content for the phases, the phases tags must still be in place
(“must exist”). Each phase must have a unique name and must be written in the burn configuration file
in alphanumerical order. By default, numbers are used as prefixes. The phases are executed in sequence.
8.2.5 Tests
Each phase consists of one or more test tags. The tests can optionally be passed arguments using the
args property of the burn configuration file (section 8.2). If multiple arguments are required, they should
be a space-separated list, with the (single) list being the args property.
Tests in the same phase are run simultaneously.
Most tests test something and then end. For example, the disk test tests the performance of all drives
and then quits.
Tests which are designed to end automatically are known as non-endless tests.
Tests designed to monitor continuously are known as endless tests. Endless tests are not really endless.
They end once all the non-endless tests in the same phase are ended, thus bringing an end to the phase.
Endless tests typically test for errors caused by the load induced by the non-endless tests. For example
the mce_check test continuously keeps an eye out for Machine Check Exceptions while the non-endless
tests in the same phase are run.
A special test is the final test, memtest86, which is part of the default burn run, as configured in the
XML configuration default-destructive. It does run endlessly if left to run. To end it, the adminis-
trator can deal with its output at the node console or can power reset the node. It is usually convenient
to remove memtest86 from the default XML configuration in larger clusters, and to rely on the HPL and
memtester tests instead, for uncovering memory hardware errors.
Example
The values of a particular burn configuration (default-destructive in the following example) can
be viewed as follows:
Example
The set command can be used to modify existing values of the burn configuration, that is:
Description, Name, and XML. XML is the burn configuration file itself. The get xml command can be
used to view the file, while using set xml opens up the default text editor, thus allowing the burn
configuration to be modified.
A new burn configuration can also be added with the add command. The new burn configuration
can be created from scratch with the set command. However, an XML file can also be imported to the
new burn configuration by specifying the full path of the XML file to be imported:
Example
The burn configuration can also be edited when carrying out burn execution with the burn com-
mand.
Executing A Burn
A burn as specified by the burn configuration file can be executed in cmsh using the burn command of
device mode.
Burn commands: The burn commands can modify these properties, as well as execute other burn-
related operations.
The burn commands are executed within device mode, and are:
• burn start
• burn stop
• burn status
• burn log
The burn help text that follows lists the detailed options. Next, operations with the burn commands
illustrate how the options may be used along with some features.
[head1->device[node005]]% burn
Name: burn - Node burn control
Include all nodes that have the given image, e.g default-image or
default-image,gpu-image
-i, --intersection
Calculate the intersection of the above selections
-u, --union
Calculate the union of the above selections
--config <name>
Burn with the specified burn configuration. See in partition burn configurations
for a list of valid names
--file <path>
Burn with the specified file instead of burn configuration
--later
Do not reboot nodes now, wait until manual reboot
--edit
Open editor for last minute changes
--no-drain
Do not drain the node from WLM before starting to burn
--no-undrain
Do not undrain the node from WLM after burn is complete
-p, --path
Show path to the burn log files. Of the form: /var/spool/burn/<mac>.
-v, --verbose
Show verbose output (only for burn status)
--sort <field1>[,<field2>,...]
Override default sort order (only for burn status)
Examples:
burn --config default-destructive start -n node001
Burn command operations: Burn commands allow the following operations, and have the following
features:
• start, stop, status, log: The basic burn operations allow a burn to be started or stopped, and the
status of a burn to be viewed and logged.
– The “burn start” command always needs a configuration file name. In the following it is
boxburn. The command also always needs to be given the nodes it operates on:
[bright92->device]% burn --config boxburn -n node007 start
Power reset nodes
[bright92->device]%
ipmi0 .................... [ RESET ] node007
Fri Nov 3 ... [notice] bright92: node007 [ DOWN ]
[bright92->device]%
Fri Nov 3 ... [notice] bright92: node007 [ INSTALLING ] (node installer started)
[bright92->device]%
Fri Nov 3 ... [notice] bright92: node007 [ INSTALLING ] (running burn in tests)
...
– The “burn stop” command only needs to be given the nodes it operates on, for example:
[bright92->device]% burn -n node007 stop
each line of output is quite long, so each line has been rendered truncated and ellipsized.
The ellipsis marks in the 5 preceding output lines align with the lines that follow.
That is, the lines that follow are the endings of the preceding 5 lines:
...Warnings Tests
...--------- --------------------------------------------------------------
...0
...0
...0 /var/spool/burn/c8-1f-66-f2-61-c0/02-disks/disktest (S,171),\
/var/spool/burn/c8-1f-66-f2-61-c0/02-disks/kmon (S),\
/var/spool/bu+
– The “burn log” command displays the burn log for specified node groupings. Each node
with a boot MAC address of <mac> has an associated burn log file.
Burn command output examples: The burn status command has a compact one-line output per
node:
Example
For example, in the status output, an entry such as "kmon (SP)" for the burn test kernel log monitor
means Started and Passed. The status letters are:
Letter  Meaning
S       started
W       warning
F       failed
P       passed
The “burn log” command output looks like the following (some output elided):
The output of the burn log command is actually the messages file in the burn directory, for the node
associated with a MAC-address directory <mac>. The burn directory is at /var/spool/burn/ and the
messages file is thus located at:
/var/spool/burn/<mac>/messages
The tests have their log files in their own directories under the MAC-address directory, using their
phase name. For example, the pre-install section has a phase named 01-hwinfo. The output logs of this
test are then stored under:
/var/spool/burn/<mac>/01-hwinfo/
Non-endless Tests
The following example test script is not a working test script, but can be used as a template for a non-
endless test:
Example
#!/bin/bash
# We need to know our own test name, amongst other things for logging.
me=`basename $0`
# Inside the spool directory a sub-directory with the same name as the
# test is also created. This directory ($spooldir/$me) should be used
# for any output files etc. Note that the script should possibly remove
# any previous output files before starting.
spooldir=$1
# Any arguments after the spool directory come from the args property of
# the test in the burn configuration file.
option1=$2
option2=$3
# In case a test detects trouble but does not want the entire burn to be
# halted, $warningfile _and_ $passedfile should be created. Any warnings
# should be written to this file.
warningfile=$spooldir/$me.warning
# The passed and failed marker files (names assumed here, by analogy with
# the warning file) report the test result to the burn framework.
passedfile=$spooldir/$me.passed
failedfile=$spooldir/$me.failed
# Some short status info can be written to this file. For instance, the
# stresscpu test outputs something like 13/60 to this file to indicate
# time remaining.
# Keep the content on one line and as short as possible!
statusfile=$spooldir/$me.status
# Some scripts may require some cleanup. For instance a test might fail
# and be restarted after hardware fixes.
rm -f $spooldir/$me/*.out &>/dev/null
# Send a message to the burn log file, syslog and the screen.
# Always prefix with $me!
blog "$me: starting, option1 = $option1 option2 = $option2"
# ... the actual test work goes here; on success the passed file is
# created, on failure the failed file:
# touch $passedfile
# echo "Failure message." > $failedfile
Endless Tests
The following example test script is not a working test, but can be used as a template for an endless test.
Example
#!/bin/bash
# We need to know our own test name, amongst other things for logging.
me=`basename $0`

# The first argument is the spool directory, as for a non-endless test
# (assignment assumed here; the paths below depend on it).
spooldir=$1

# On a clean stop the test should create $passedfile; on failure,
# $failedfile (names assumed by analogy with the warning/status files).
passedfile=$spooldir/$me.passed
failedfile=$spooldir/$me.failed

# In case a test detects trouble but does not want the entire burn to be
# halted $warningfile _and_ $passedfile should be created. Any warnings
# should be written to this file.
warningfile=$spooldir/$me.warning

# Some short status info can be written to this file. For instance, the
# stresscpu test outputs something like 13/60 to this file to indicate
# time remaining.
# Keep the content on one line and as short as possible!
statusfile=$spooldir/$me.status

# An endless test keeps running for as long as the marker file
# $spooldir/$me/running exists. The framework's exact stop mechanism is
# assumed here to be a second "-terminate" argument.
if [ "$2" == "-terminate" ]; then
    blog "$me: terminating"
    rm -f $spooldir/$me/running
    touch $passedfile
else
    blog "$me: starting test, checking every minute"
    # Some scripts may require some cleanup. For instance a test might fail
    # and be restarted after hardware fixes.
    rm -f $spooldir/$me/*.out &>/dev/null
    # Create the marker file that keeps the main loop going.
    mkdir -p $spooldir/$me
    touch $spooldir/$me/running
    while [ -e "$spooldir/$me/running" ]
    do
        run-some-check    # placeholder for the actual check
        if [ was_a_problem ]; then    # placeholder condition
            blog "$me: WARNING, something unexpected happened."
            echo "some warning" >> $warningfile # note the append!
        elif [ failure ]; then    # placeholder condition
            blog "$me: Aiii, we're all gonna die! my-test FAILED!"
            echo "Failure message." > $failedfile
        fi
        sleep 60
    done
fi
Example
Here, burn-control, which is the parent of the disk testing process, keeps track of the tests that pass
and fail. On failure of a test, burn-control terminates all tests.
The node that has failed then requires intervention from the administrator in order to change state.
The node does not restart by default. The administrator should be aware that the state reported by the
node to CMDaemon remains burning at this point, even though it is not actually doing anything.
To change the state, the burn must be stopped with the burn stop command in cmsh. If the node is
restarted without explicitly stopping the burn, then it simply retries the phase at which it failed.
Under the burn log directory, the log of the particular test that failed for a particular node can some-
times suggest a reason for the failure. For retries, old logs are not overwritten, but moved to a directory
with the same name, with a number appended to indicate the try number. Thus:
Example
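(Directory names below are illustrative, using the 02-disks phase from the earlier output; <mac> is the
boot MAC address directory of the node:)
/var/spool/burn/<mac>/02-disks      current try
/var/spool/burn/<mac>/02-disks.1    first try
/var/spool/burn/<mac>/02-disks.2    second try
8.4 Relocating The Burn Logs
The burn logs are kept by default under /var/spool/burn/ on the head node. Relocating them involves
the following parts of the procedure: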
1. The BurnSpoolDir setting can be set in the CMDaemon configuration file on the head node, at
/cm/local/apps/cmd/etc/cmd.conf. It tells CMDaemon where to look for burn data when the
burn status is requested through cmsh.
• BurnSpoolDir="/var/spool/burn"
CMDaemon should be restarted after the configuration has been set. This can be done, for
example, with:
service cmd restart
2. The burnSpoolHost setting, which matches the host, and burnSpoolPath setting,
which matches the location, can be changed in the node-installer configuration file
on the head node, at /cm/node-installer/scripts/node-installer.conf (for multiarch/
multidistro configurations the path takes the form: /cm/node-installer-<distribution>-
<architecture>/scripts/node-installer.conf). These have the following values by default:
• burnSpoolHost = master
• burnSpoolPath = /var/spool/burn
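If the burn spool is being relocated, these can be pointed at the new location, for example
(hostname and path hypothetical):
• burnSpoolHost = fileserver01
• burnSpoolPath = /data/burn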
3. A new location from which to export the burn log is added. This is only relevant if the spool
directory is being relocated within the head node. If the spool is on an external fileserver,
the existing burn log export may as well be removed.
The new location can be added to the head node as a path value, from a writable filesystem export
name. The writable filesystem export name can most easily be added using Bright View, via the
clickpath:
Devices→Head Nodes→Edit→Settings→Filesystem exports→Add
Adding a new name like this, instead of just modifying the path value in an existing
Filesystem exports name, is recommended, because it is then easy to change things back if the
configuration is done incorrectly. By default, the existing Filesystem exports entry for the
burn directory has the name:
• /var/spool/burn@internalnet
with the path:
• /var/spool/burn
When the new name is set in Filesystem exports, the associated path value can be set in agree-
ment with the values set earlier in parts 1 and 2.
If using cmsh instead of Bright View, then the change can be carried out from within the fsexports
submode. Section 3.10.1 of the Administrator Manual gives more detail on similar examples of how
to add such filesystem exports.
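A minimal cmsh sketch of adding a new writable export for a relocated burn spool follows (the
export name, path, and the write parameter shown are assumptions, following the usual add/set/commit
pattern of cmsh submodes):
Example
[bright92]% device use master
[bright92->device[bright92]]% fsexports
[bright92->device[bright92]->fsexports]% add /local/burnlogs@internalnet
[bright92->device[bright92]->fsexports*[/local/burnlogs@internalnet*]]% set path /local/burnlogs
[bright92->device[bright92]->fsexports*[/local/burnlogs@internalnet*]]% set write yes
[bright92->device[bright92]->fsexports*[/local/burnlogs@internalnet*]]% commit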
Example
<burnconfig>
<pre-install>
<phase name="01-hwinfo">
<test name="hwinfo"/>
<test name="sleep" args="10"/>
</phase>
</pre-install>
<post-install>
<phase name="02-mprime">
<test name="mprime" args="2"/>
<test name="mce_check" endless="1"/>
<test name="kmon" endless="1"/>
</phase>
</post-install>
</burnconfig>
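In this short burn configuration, the pre-install phase 01-hwinfo runs the hwinfo and sleep tests,
while the post-install phase 02-mprime runs the mprime test together with the endless mce_check and
kmon monitoring tests.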
To burn a single node with this configuration, the following could be run from the device mode of
cmsh:
Example
[bright92->device]% burn --config default --edit start -n node001
This makes an editor pop up containing the default burn configuration. The content can be replaced
with the short burn configuration. Saving and quitting the editor causes the node to power cycle and
start its burn.
The example burn configuration typically completes in less than 10 minutes or so, depending mostly
on how fast the node can be provisioned. It runs the mprime test for about two minutes.
9
Installing And Configuring
SELinux
9.1 Introduction
Security-Enhanced Linux (SELinux) can be enabled on selected nodes. If SELinux is enabled on a stan-
dard Linux operating system, then it is typically initialized in the kernel when booting from a hard
drive. However, in the case of nodes provisioned by NVIDIA Bright Cluster Manager, via PXE boot, the
SELinux initialization occurs at the very end of the node installer phase.
SELinux is disabled by default because its security policies are typically customized to the needs
of the organization using it. The administrator must therefore decide on appropriate access control
security policies. When creating such custom policies, special care should be taken to ensure that the
cmd process is executed in, ideally, an unconfined context.
Before enabling SELinux on a cluster, the administrator is advised to first check that the Linux distri-
bution used offers enterprise support for SELinux-enabled systems. This is because support for SELinux
should be provided by the distribution in case of issues.
Enabling SELinux is only advised for the cluster manager if the internal security policies of the or-
ganization absolutely require it. This is because it requires custom changes from the administrator. If
something is not working right, then the effect of these custom changes on the installation must also be
taken into consideration, which can sometimes be difficult.
SELinux is partially managed by the cluster manager and can run on the head and regular nodes.
The SELinux settings managed by CMDaemon (via cmsh or Bright View) should not be changed by
dealing directly with the node outside of CMDaemon, as that can leave CMDaemon with an inconsistent
view of the SELinux settings.
When first configuring SELinux to run with the cluster manager on regular nodes, the nodes should
be configured with permissive mode to ensure that the nodes work with applications. Troubleshooting
permissive mode so that enforcing mode can be enabled is outside the scope of Bright support, unless
the issue is demonstrably a cluster manager-related issue.
Example
Parameter                        Value
-------------------------------- ------------------------------------------------
Initialize                       yes
Revision
Reboot after context restore     no
Allow NFS home directories       yes
Context action auto install      always
Context action full install      always
Context action nosync install    always
Mode                             permissive
Policy                           targeted
Key value settings               <submode>
The Mode can be set to permissive, enforcing, or disabled. When starting the use of SELinux and
establishing policies, it should be set to permissive to begin with, so that issues that would arise from
running applications under enforcing mode can be examined and resolved.
The default SELinux configuration parameters are in /cm/node-installer/scripts/
node-installer.conf, and that file remains unchanged by cmsh settings changes. The values of
SELinux configuration parameters used from that file are however overridden by the corresponding
cmsh settings.
For multiarch/multidistro configurations the node-installer path in the preceding paragraph takes
the form: /cm/node-installer-<distribution>-<architecture>/scripts/node-installer.conf. The val-
ues for <distribution> and <architecture> can take the values outlined on page 535 of the Administrator
Manual.
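If the SELinux nodes are to be grouped in their own category, one can first be cloned from an existing
category. A minimal sketch, assuming the default category as the starting point:
Example
[bright92]% category
[bright92->category]% clone default secategory; commit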
The SELinux settings can then be configured for the newly-cloned category:
Example
[bright92->category]% use secategory; selinuxsettings
[bright92->category[secategory]->selinuxsettings]% keyvaluesettings
[bright92->category*[secategory*]->selinuxsettings*->keyvaluesettings*]% set domain_can_mmap_files 1
[bright92->category*[secategory*]->selinuxsettings*->keyvaluesettings*]% exit
[bright92->category*[secategory*]->selinuxsettings*]% set mode<tab><tab>
disabled enforcing permissive
[bright92->category*[secategory*]->selinuxsettings*]% set mode permissive #for now, to debug apps
[bright92->category*[secategory*]->selinuxsettings*]% commit
The domain_can_mmap_files boolean setting is needed to allow SELinux policies to revalidate some
kinds of file access in memory.
Creating a new image and using setfiles to set up SELinux file contexts on the new image: One
good way to have a node come up with SELinux file contexts is to set up the image that is provisioned
so that it already has the contexts.
This can be configured by first cloning the image, with:
Example
[bright92->category[secategory]->selinuxsettings]% softwareimage
[bright92->softwareimage]% list
Name (key) Path Kernel version Nodes
-------------------- ----------------------------- ----------------------------- --------
default-image /cm/images/default-image 5.14.0-284.11.1.el9_2.x86_64 5
[bright92->softwareimage]% clone default-image selinux-image; commit
...
...[notice] bright92: Initial ramdisk for image selinux-image was generated successfully
Then, after selinux-image has been generated, the contexts can be set up in the new image with the
SELinux setfiles command, using the -r option to set the root path:
Example
[bright92->softwareimage]% quit
[root@bright92 ~]# setfiles -r /cm/images/selinux-image \
/etc/selinux/targeted/contexts/files/file_contexts /cm/images/selinux-image/
[root@bright92 ~]# setfiles -r /cm/images/selinux-image \
/etc/selinux/targeted/contexts/files/file_contexts.local /cm/images/selinux-image/
If the image is updated in the future with new packages, or new files, then the setfiles commands
in the preceding example must be run again to set the file contexts.
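Since the contexts must be refreshed after every such image change, the two setfiles commands can
conveniently be kept together in a small helper script on the head node. A minimal sketch (script name
and location are arbitrary):
#!/bin/bash
# Re-apply SELinux file contexts to the selinux-image software image,
# using the same paths as in the setfiles examples above.
image=/cm/images/selinux-image
contexts=/etc/selinux/targeted/contexts/files
setfiles -r $image $contexts/file_contexts $image/
setfiles -r $image $contexts/file_contexts.local $image/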
Organizing the nodes and setting them up with the newly-created SELinux image: Nodes in the
category can be listed with:
[bright92->category[secategory]]% listnodes
...lists the nodes in that category...
Nodes can be placed in the category from device mode. For example, node001, node002, and
node003 can be configured with:
[bright92->category[secategory]]% device
[bright92->device]% foreach -n node001..node003 (set category secategory)
If the nodes in the category secategory are to run file systems with SELinux file contexts, then the
image generated for this earlier on, selinux-image, can be committed to that category with:
Example
[bright92->category[secategory]]% set softwareimage selinux-image; commit
After the nodes have been rebooted and provisioned with selinux-image, the file contexts on a node
can be checked, for example by listing security contexts with ls -Z (output excerpt):
Example
system_u:object_r:admin_home_t:s0 original-ks.cfg
system_u:object_r:admin_home_t:s0 rpmbuild
A
Other Licenses, Subscriptions,
Or Support Vendors
NVIDIA Bright Cluster Manager comes with enough software to allow it to work with no additional
commercial requirements other than its own. However, the cluster manager integrates with some other
products that have their own separate commercial requirements. The following table lists commer-
cial software that requires a separate license, subscription, or support vendor, and an associated URL
where more information can be found.
Software URL
Workload managers
PBS Professional http://www.altair.com
MOAB http://www.adaptivecomputing.com
LSF http://www.ibm.com/systems/platformcomputing/products/lsf/
GE http://www.altair.com
Distributions
SUSE http://www.suse.com
Red Hat http://www.redhat.com
Compilers
Intel https://software.intel.com/en-us/intel-sdp-home
Miscellaneous
Amazon AWS http://aws.amazon.com
B
Hardware Recommendations
The hardware suggestions in section 3.1 are for a minimal cluster, and are inadequate for larger clusters.
For larger clusters, hardware suggestions and examples are given in this section.
The memory used depends significantly on CMDaemon, which is the main NVIDIA Bright Cluster
Manager service component, and on the number of processes running on the head node or regular node.
The number of processes mostly depends on the number of metrics and health checks that are run.
Hard drive storage mostly depends on the number of metrics and health checks that are managed
by CMDaemon.
A device means any item seen as a device by CMDaemon. A list of devices can be seen in cmsh under
its device mode. Examples of devices are: regular nodes, cloud nodes, switches, head nodes, GPU units,
and PDUs.
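For example, a sketch of listing them from the top level of cmsh:
[bright92]% device list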
This assumes that fewer than 100 metrics and health checks are being measured, which is the default
for systems consisting of just head nodes and regular nodes. Beyond the first 100 metrics and health
checks, each further 100 take about 1MB extra per device.
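As an illustrative calculation: with 300 metrics and health checks per device, there are 200 beyond
the first 100, so each device takes about 2MB extra; across a 1000-device cluster that amounts to
roughly 2GB of additional memory.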
B.2.2 Suggested Head Node Specification For Clusters Beyond 1000 Nodes
For clusters with more than 1000 nodes, a head node is recommended with at least the following speci-
fications:
• 24 cores
• 128 GB RAM
• 512 GB SSD
The extra RAM is useful for caching the filesystem, so scrimping on it makes little sense.
Placing the monitoring data files, which are by default located under /var/spool/cmd/monitoring/,
on an SSD is handy for speedy retrievals.
A dedicated /var or /var/lib/mysql partition is also a good idea for clusters with more than 2500
nodes.