I was trying to set up a small test project to use batchtools on Slurm. The problem is that the parent job exits from Slurm before all the child jobs have completed. How can I solve this?
The main R script that submits the jobs and the associated configuration files are as follows:
run_batchtools_job.R
library(batchtools)
reg <- makeRegistry(file.dir = "slurm_registry", seed = 5081, conf.file = "Scripts/batch_tools_test/.batchtools.conf.R")
my_fun <- function(x) {
  Sys.sleep(x)
  return(x^2)
}
ids <- batchMap(fun = my_fun, x = 100:150, reg = reg)
done <- submitJobs(ids = ids, reg = reg, resources = list(partition = "small", walltime = 86400, memory = 1024, ntasks = 1))
waitForJobs(ids = ids, reg = reg)
getStatus(ids = ids, reg = reg)
final_res <- reduceResultsList(ids = ids, reg = reg)
print(class(final_res))
.batchtools.conf.R
cluster.functions <- makeClusterFunctionsSlurm(
  template = "Scripts/batch_tools_test/slurm_config.tmpl",
  array.jobs = TRUE,
  scheduler.latency = 60,
  fs.latency = 30
)
max.concurrent.jobs <- 5
slurm_config.tmpl
#!/bin/bash
#SBATCH --job-name=<%= job.name %>
#SBATCH --output=<%= log.file %>
#SBATCH --ntasks=<%= resources$ntasks %>
#SBATCH --mem=<%= resources$memory %>MB
#SBATCH --partition=<%= resources$partition %>
module load r/4.3.3
Rscript -e 'batchtools::doJobCollection("<%= uri %>")'
I submit the run_batchtools_job.R script to Slurm using the following sbatch script.
run_batchtools.sh
#!/bin/bash
#SBATCH --job-name=batchtools_test
#SBATCH --output=batchtools_test.log
#SBATCH --ntasks=1
#SBATCH --time=01:00:00
#SBATCH --mem=2G
#SBATCH --partition=small
# Load R
module load r/4.3.3
# Run your R script
Rscript Scripts/batch_tools_test/run_batchtools_job.R
I observed that the batchtools_test job exits before all the child jobs spawned by submitJobs have finished. As a result, there is nothing in final_res.
When I checked getErrorMessages, several jobs were listed as 'not terminated'. However, when I manually checked the logs and the results in the registry directories, everything had completed as expected.
How can I overcome this issue?
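For anyone trying to reproduce the symptom, here is a minimal sketch of how the outcome of waitForJobs and the per-job states could be checked before reducing results. It is not the code from the run above: it assumes a temporary registry with no configuration file, so batchtools falls back to its interactive cluster functions instead of Slurm, and the sleep values are shortened, hypothetical stand-ins for 100:150.

```r
library(batchtools)

# Assumption: temporary registry and no config file, so the default
# interactive cluster functions are used instead of Slurm.
reg <- makeRegistry(file.dir = NA, conf.file = NA, seed = 5081)

my_fun <- function(x) {
  Sys.sleep(x)
  x^2
}

# Short, hypothetical sleep values so the sketch finishes quickly.
ids <- batchMap(fun = my_fun, x = c(0.1, 0.2, 0.3), reg = reg)
submitJobs(ids = ids, reg = reg)

# waitForJobs() returns TRUE only if every job terminated successfully,
# and FALSE on timeout, error, or expiry.
all_done <- waitForJobs(ids = ids, reg = reg)

# Inspect the registry's view of the jobs.
print(getStatus(ids = ids, reg = reg))
not_done <- findNotDone(ids = ids, reg = reg)  # jobs without a result yet
expired <- findExpired(ids = ids, reg = reg)   # submitted, but gone from the scheduler

# Only reduce when the wait reported success.
if (all_done) {
  final_res <- reduceResultsList(ids = ids, reg = reg)
} else {
  final_res <- list()
}
```

On Slurm, the same checks on the original registry would show whether waitForJobs returned FALSE and whether the unfinished jobs are reported as expired rather than failed.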