blegat/LINMA2710


LINMA2710 Scientific Computing


This repository contains different resources for the LINMA2710 course given at UCLouvain.

Schedule

| Week | Wednesday | Topic | Thursday | Topic | Lecturer |
|------|-----------|-------|----------|-------|----------|
| S1 | 04/02/2026 | | 05/02/2026 | C++ | Absil |
| S2 | 11/02/2026 | | 12/02/2026 | C++ | Absil |
| S3 | 18/02/2026 | Parallel | 19/02/2026 | Parallel | Legat |
| S4 | 25/02/2026 | | 26/02/2026 | | Legat |
| S5 | 04/03/2026 | Distributed | 05/03/2026 | Distributed | Legat |
| S6 | 11/03/2026 | | 12/03/2026 | | Legat |
| S7 | 18/03/2026 | GPU | 19/03/2026 | GPU | Legat |
| S8 | 25/03/2026 | | 26/03/2026 | | Legat |
| S9 | 01/04/2026 | | 02/04/2026 | PDE | Absil |
| S10 | 08/04/2026 | | 09/04/2026 | PDE | Absil |
| S11 | 15/04/2026 | Q&A project | 16/04/2026 | PDE | Absil |
| 🥚 | 22/04/2026 | 🐣 | 23/04/2026 | 🐇 | 🐰 |
| 🥚 | 29/04/2026 | 🐣 | 30/04/2026 | 🐇 | 🐰 |
| S12 | 06/05/2026 | Oral project | 07/05/2026 | Power Consumption | Legat |
| S13 | 13/05/2026 | Oral project | 14/05/2026 | ✝️⬆️☁️ | |

CECI cluster

To use the CECI clusters, you need a CECI account. If you don't already have one (if you don't know whether you have an account, chances are you don't), first create one. You will receive an email: follow the link in it and, in the field labelled "Email of Supervising Professor", enter benoit.legat@uclouvain.be. Then follow the steps detailed here to download your private key, create the corresponding public key and create the file .ssh/config.

You should now be able to connect to the manneback cluster with

(your computer) $ ssh manneback

Tip

To choose a cluster, check this list and this one with manneback to see which one has the hardware you need. The Lyra cluster was recently added with GPU support, so it can also be used for P3 if manneback is overloaded.

In addition to the information below and the CECI documentation, here is a little FAQ.

Syncing your files

We mention here 4 ways to sync your files:

  1. Copy file by file with scp
  2. Using git
  3. Mount a whole folder with sshfs
  4. Use an extension of your IDE

1. scp

Follow this guide to copy files from your computer to the cluster. For instance, with scp you can copy a file submit.sh from your computer with:

(your computer) $ scp submit.sh manneback:.

2. git

Keeping the files in sync with scp can however be a bit tedious. I recommend pushing your project to a private git repository (for instance on https://forge.uclouvain.be/) and pulling it from the CECI cluster. Don't use a public repository, as your code shouldn't be accessible to other students! You can then easily update the code on the CECI cluster with git pull.

Warning

Do not sync binaries with the CECI cluster, as it might have a different architecture than your computer. Exclude them from git by adding them to the .gitignore file and simply recompile them on the cluster.
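
As a minimal sketch, assuming your build produces an executable named a.out, object files and a build/ directory (all three names are assumptions, adapt them to your project), the following appends matching patterns to .gitignore:

```sh
# Run from the root of your git repository.
# The patterns below are assumptions about what your build produces.
cat >> .gitignore <<'EOF'
# compiled binaries and build artifacts, rebuilt on each machine
a.out
*.o
build/
EOF
```

Files that were already committed stay tracked until you remove them from the index with git rm --cached <file>.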

3. sshfs

You can also modify the files in a folder locally using sshfs. For instance, I have a LINMA2710 folder in my home directory on the manneback cluster. To access these files locally in a new folder manneback-sshfs, I can do

(local computer)$ mkdir manneback-sshfs
(local computer)$ sshfs manneback:/home/ucl/inma/blegat/LINMA2710 ./manneback-sshfs

You can then open the manneback-sshfs folder with your favorite IDE on your local computer and you will be modifying files directly on the cluster! If you open a terminal in your IDE, it will still be running on the CPU and GPU of your local computer, even though it sees the files on the cluster. Therefore, to compile and run your program on the cluster, you still need to ssh to the cluster in that terminal.

4. IDE extension

A popular approach to remote development over ssh is using the Remote - SSH extension of VSCode as detailed here. This will open a VSCode window where you will see the files in the folder on the cluster, like with the sshfs approach, but the terminal you open in VSCode will also be running on the cluster.

Warning

In the community open-source releases of VSCode such as VSCodium, Open VSX is used instead of the VS Marketplace. As the Remote - SSH extension is available in the marketplace but not in Open VSX, you can use Open Remote - SSH instead. Note that the well-known workaround to use the VS Marketplace in VSCodium violates the terms of use of the marketplace, which only allow it to be used with the binaries provided by Microsoft.

Submit a job

The commands that you run directly after connecting with ssh run on the login node, which has limited resources: it is only meant for you to connect and send jobs via Slurm that are then executed on compute nodes. You will also not have any GPU on the login node. So don't just run your program with [blegat@mbackf1 ~] ./a.out (note mbackf1, which means you are on a login node). To run your code, submit a job with Slurm.

Using sbatch

Use this tool to generate a submission script.

Warning

The --partition option depends on the cluster. As manneback is not an option in the tool, use another cluster and then remove the line with --partition or update it with one of the partitions listed by sinfo.
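
For reference, a generated script generally looks like the sketch below. The job name, the resource values and the executable ./a.out are assumptions chosen for illustration, not the output of the tool. The heredoc simply writes the script to submit.sh so that the snippet can be pasted into a terminal.

```sh
# Write a minimal Slurm submission script to submit.sh.
# All values below are illustrative assumptions; adapt them to your job.
cat > submit.sh <<'EOF'
#!/bin/bash
# Name shown in the squeue table
#SBATCH --job-name=linma2710
# Number of processes (e.g. MPI ranks)
#SBATCH --ntasks=4
# Wall-time limit (hh:mm:ss)
#SBATCH --time=00:10:00
# Memory per allocated CPU, in MB
#SBATCH --mem-per-cpu=1000
# No --partition line: Slurm then picks the default partition of the cluster.

module load OpenMPI   # make MPI available on the compute node
srun ./a.out          # start one copy of a.out per task
EOF
```

The #SBATCH lines must come before the first command of the script, as Slurm stops reading directives there.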

Save this script as a file, say submit.sh. You can then use it with

[blegat@mbackf1 ~] sbatch submit.sh

The output produced by the job is written to the file slurm-<JOBID>.out, where <JOBID> is the job id listed in the JOBID column of the table printed by

[blegat@mbackf1 ~] squeue --me

Using salloc

You can also use salloc to execute commands interactively on the allocated compute nodes.

[blegat@mbackf1 ~]$ salloc --ntasks=4
salloc: Pending job allocation 56630153
salloc: job 56630153 queued and waiting for resources
salloc: job 56630153 has been allocated resources
salloc: Granted job allocation 56630153
salloc: Waiting for resource configuration
salloc: Nodes mb-sky002 are ready for job
[blegat@mb-sky002 examples]$ ml OpenMPI
[blegat@mb-sky002 examples]$ srun ./a.out
Process 3/4 is running on node <<mb-sky002.cism.ucl.ac.be>>
Process 0/4 is running on node <<mb-sky002.cism.ucl.ac.be>>
Process 1/4 is running on node <<mb-sky002.cism.ucl.ac.be>>
Process 2/4 is running on node <<mb-sky002.cism.ucl.ac.be>>

Note that the output is displayed directly in the terminal and not written to a slurm-<JOBID>.out file. This means that, if you lose the ssh connection (which can easily happen, e.g., if your laptop is suspended), you will lose the terminal output as well as the ability to interact with the allocated session on the compute nodes (although you could use sattach to reattach to it). One useful trick is to use screen: if your ssh connection is lost, simply reconnect and run screen -r to get your session back. More details here.

Using srun

The command lines that are either executed in the shell opened by salloc or that are inside the submit.sh script executed by sbatch each use only one process. To run one command on several processes, use srun. The srun command inherits the options passed to salloc and sbatch, so there is no need to repeat options such as --ntasks for srun.
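
The difference can be sketched with a small batch script that runs the same command with and without srun. The file name steps.sh and the choice of hostname as the command are assumptions made for illustration.

```sh
# Write a hypothetical batch script contrasting a plain command with srun.
cat > steps.sh <<'EOF'
#!/bin/bash
#SBATCH --ntasks=4

# Plain command: executed once, by the single process running this script.
hostname

# Same command through srun: executed once per task, so 4 times here.
# srun inherits --ntasks=4 from the #SBATCH line above.
srun hostname
EOF
```

After sbatch steps.sh, the slurm-<JOBID>.out file should contain one line from the plain hostname and four lines from srun hostname.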

Don't mix it with mpiexec

When using MPI, you would like to run your executable with several processes. For this, you typically use mpiexec when running it on your laptop. Inside a salloc shell or inside a sbatch submit.sh script, use either srun (recommended by Slurm), mpirun (recommended by OpenMPI), or mpiexec, which is mostly equivalent to mpirun. See also the CECI doc. Don't combine srun with either of the other two (e.g., srun mpirun ./a.out): srun would then run mpirun ntasks times, and each mpirun would itself start ntasks processes, which is not what you want.
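
To make the cost of mixing the two launchers concrete, here is the arithmetic for an assumed allocation of 4 tasks (the value 4 is only an example):

```sh
# Count the processes started in each case for a given --ntasks value.
ntasks=4

# srun ./a.out: one copy of a.out per task.
total_srun=$ntasks

# srun mpirun ./a.out: srun starts ntasks copies of mpirun,
# and each mpirun starts ntasks processes of its own.
total_mixed=$((ntasks * ntasks))

echo "srun ./a.out        -> $total_srun processes"
echo "srun mpirun ./a.out -> $total_mixed processes"
```

With 4 tasks, the mixed command would therefore start 16 processes instead of the 4 that were requested.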

Julia

Do not use module load CUDA. This command uses Lmod to set LD_LIBRARY_PATH (as detailed in the output of module show CUDA) which is discouraged.

Running Julia interactively on a compute node is as simple as running $ srun --pty julia. If CUDA was precompiled on a node with no GPU (such as the login node), you will see the error

julia> using CUDA
┌ Error: CUDA.jl could not find an appropriate CUDA runtime to use.
│
│ CUDA.jl's JLLs were precompiled without an NVIDIA driver present.
│ This can happen when installing CUDA.jl on an HPC log-in node,
│ or in a container. In that case, you need to specify which CUDA
│ version to use at run time by calling `CUDA.set_runtime_version!`
│ or provisioning the preference it sets at compile time.
│
│ If you are not running in a container or on an HPC log-in node,
│ try re-compiling the CUDA runtime JLL and re-loading CUDA.jl:
│      pkg = Base.PkgId(Base.UUID("76a88914-d11a-5bdc-97e0-2f5a05c973a2"),
│                       "CUDA_Runtime_jll")
│      Base.compilecache(pkg)
│      # re-start Julia and re-load CUDA.jl
│
│ For more details, refer to the CUDA.jl documentation at
│ https://cuda.juliagpu.org/stable/installation/overview/
└ @ CUDA ~/.julia/packages/CUDA/1kIOw/src/initialization.jl:118

Just copy-paste the lines suggested in the error message (from pkg = ... to Base.compilecache(pkg)) into the REPL to re-compile, then exit and start a new Julia session with srun again. If you still get the error, leave the REPL, then run the following (replacing v1.11 with your Julia version, of course):

(manneback cluster) $ rm -r ~/.julia/compiled/v1.11/CUDA*

Now, start a new Julia session with srun; using CUDA should not error anymore.

See here for additional information.
