Skip to content

Conversation

muniheart
Copy link
Contributor

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • If necessary, also make a PR on the nf-core/taxprofiler branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core pipelines lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

Update the module krakenuniq/preloadedkrakenuniq. This will allow the specification of ram_chunk_size per database. 8397, 8483

@jfy133
Copy link
Member

jfy133 commented May 16, 2025

Pushed a change to hopefully fix the error...

@jfy133
Copy link
Member

jfy133 commented May 17, 2025

@nf-core-bot fix linting

@muniheart
Copy link
Contributor Author

The value, --krakenuniq_ram_chunk_size 16G, is too large for requested memory, 15G. I have reduced to 8G.

@muniheart
Copy link
Contributor Author

I don't understand why task Run pipeline with test data (latest-everything | test_krakenuniq | docker) is reported as ✅ while log indicates error and non-zero exit status.

ERROR ~ Validation of pipeline parameters failed!

 -- Check '.nextflow.log' file for details
The following invalid input values have been detected:

* --databases (https://raw.githubusercontent.com/nf-core/test-datasets/taxprofiler/database_krakenuniq.csv): Validation of file failed:
	-> Entry 1: Error for field 'db_path' (https://github.com/nf-core/test-datasets/raw/taxprofiler/data/database/krakenuniq/testdb-krakenuniq.tar.gz): the file or directory 'https://github.com/nf-core/test-datasets/raw/taxprofiler/data/database/krakenuniq/testdb-krakenuniq.tar.gz' does not exist (db_path should be either a file path or a directory.)

 -- Check script 'subworkflows/nf-core/utils_nfschema_plugin/main.nf' at line: 39 or see '.nextflow.log' file for more details
Error: Process completed with exit code 1.

I have run the workflow locally to successful completion with,

nextflow run muniheart/taxprofiler-dev -r dev -profile test_krakenuniq,singularity --krakenuniq_ram_chunk_size 5GB --outdir results/2025-05-18_12h48

When run without option --krakenuniq_ram_chunk_size 5GB, however, the script, .command.sh, contained option --preload-size 16G, and each slurm job was OOM killed. Nonetheless, each task reported exitcode 0. The configuration,

process {
withName: KRAKENUNIQ_PRELOADEDKRAKENUNIQ {
ext.args = '--preload-size 2G'
}

is not having its intended effect.

@jfy133
Copy link
Member

jfy133 commented May 19, 2025

I don't understand why task Run pipeline with test data (latest-everything | test_krakenuniq | docker) is reported as ✅ while log indicates error and non-zero exit status.

ERROR ~ Validation of pipeline parameters failed!

 -- Check '.nextflow.log' file for details
The following invalid input values have been detected:

* --databases (https://raw.githubusercontent.com/nf-core/test-datasets/taxprofiler/database_krakenuniq.csv): Validation of file failed:
	-> Entry 1: Error for field 'db_path' (https://github.com/nf-core/test-datasets/raw/taxprofiler/data/database/krakenuniq/testdb-krakenuniq.tar.gz): the file or directory 'https://github.com/nf-core/test-datasets/raw/taxprofiler/data/database/krakenuniq/testdb-krakenuniq.tar.gz' does not exist (db_path should be either a file path or a directory.)

 -- Check script 'subworkflows/nf-core/utils_nfschema_plugin/main.nf' at line: 39 or see '.nextflow.log' file for more details
Error: Process completed with exit code 1.

I have run the workflow locally to successful completion with,

nextflow run muniheart/taxprofiler-dev -r dev -profile test_krakenuniq,singularity --krakenuniq_ram_chunk_size 5GB --outdir results/2025-05-18_12h48

When run without option --krakenuniq_ram_chunk_size 5GB, however, the script, .command.sh, contained option --preload-size 16G, and each slurm job was OOM killed. Nonetheless, each task reported exitcode 0. The configuration,

process {
withName: KRAKENUNIQ_PRELOADEDKRAKENUNIQ {
ext.args = '--preload-size 2G'
}

is not having its intended effect.

Hrm that's odd. I would need to experiment on Thursday

@muniheart muniheart added bug Something isn't working enhancement Improvement for existing functionality labels May 20, 2025
Copy link
Member

@jfy133 jfy133 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just made a slight tweak of the default behaviour to no chunking, and I confirmed that after moving the 2G specification in test_krakenuniq to a parameter call not ext.args, then I got the correct order of behaviour:

  1. database ram size
  2. --<param> ram size
  3. Config ram size
  4. No ram size, just preload full

If You're OK with my minor changes in the last couple of commits, I think we are good to go for a merge. Please confirm if you're happy and I can hit the button if you don't have merge permissions.

@jfy133
Copy link
Member

jfy133 commented May 22, 2025

@nf-core-bot fix linting

@muniheart
Copy link
Contributor Author

I just made a slight tweak of the default behaviour to no chunking, and I confirmed that after moving the 2G specification in test_krakenuniq to a parameter call not ext.args, then I got the correct order of behaviour:

1. database ram size

2. `--<param>` ram size

3. Config ram size

4. No ram size, just preload full

If You're OK with my minor changes in the last couple of commits, I think we are good to go for a merge. Please confirm if you're happy and I can hit the button if you don't have merge permissions.

I'm okay with your changes, @jfy133, though they don't address the underlying problem, which is that template update included conf/modules.conf into nextflow.config a 2nd time, preventing the overriding of --preload-size value through a custom configuration file. Commit fixed the tests, but I'm concerned it will be undone by a future update of the template.

@muniheart
Copy link
Contributor Author

muniheart commented May 22, 2025

The memory requested in test_krakenuniq.config, 15.GB is insufficient for a chunk size of 16GB and may not be sufficient for preload step. That will need to be adjusted to accommodate configured database. Alternatively, the test databases file can be modified to include --preload-size with a value compatible with memory request.

The test database is 30MB compressed so preload should work.

@muniheart
Copy link
Contributor Author

muniheart commented May 22, 2025

Sorry, I had missed this

    krakenuniq_ram_chunk_size              = '2G'

It makes sense now. I am still concerned by the duplicate includeConfig mentioned above.

@jfy133
Copy link
Member

jfy133 commented May 23, 2025

I just made a slight tweak of the default behaviour to no chunking, and I confirmed that after moving the 2G specification in test_krakenuniq to a parameter call not ext.args, then I got the correct order of behaviour:

1. database ram size

2. `--<param>` ram size

3. Config ram size

4. No ram size, just preload full

If You're OK with my minor changes in the last couple of commits, I think we are good to go for a merge. Please confirm if you're happy and I can hit the button if you don't have merge permissions.

I'm okay with your changes, @jfy133, though they don't address the underlying problem, which is that template update included conf/modules.conf into nextflow.config a 2nd time, preventing the overriding of --preload-size value through a custom configuration file. Commit fixed the tests, but I'm concerned it will be undone by a future update of the template.

Ah I see, sorry I misunderstood.

The duplicated includeConfig was indeed a mistake during a template merge. The module loading should be at the end of nextflow.config: https://github.com/nf-core/tools/blob/e6497bcdf8660b7029bd739f890c2d73e87dd867/nf_core/pipeline-template/nextflow.config#L336-L338

That said, to my knowledge the --<params>, should take the highest priority over everything. Was that not what you observed in testing?

Sorry, I had missed this

    krakenuniq_ram_chunk_size              = '2G'

It makes sense now. I am still concerned by the duplicate includeConfig mentioned above.

Yes exactly, sorry I wasn't clear in my description. I moved it to ensure it does follow the right precedence so we can override it.

That said I think with the exception of test configs, I think whatever is supplied via a custom config should take highest priority as presumably that would be what the user wants to use if our normal recommendations of passing the value do not work for them.

I unfortunately can't look at this today, if you have time could you move the includeConfig modules.config back to the correct place end of the template (again, sorry for that duplication error, indeed our mistake!) , and test again?

@muniheart
Copy link
Contributor Author

I've moved,

includeConfig 'conf/modules.config'

to the end of nextflow.config. Tests corroborate your ordering of assignment, @jfy133.

@jfy133
Copy link
Member

jfy133 commented May 26, 2025

Thank you @muniheart ! I hopefully have just fixed the linting issue.

I want to do one more batch of tests myself to reassure myself (config ordering still makes me nervous), but otherwise I think this is OK.

I'm currently back home in the UK for family reasons, and working irregular hours. I will try to get to these tests when I can.

@muniheart
Copy link
Contributor Author

I would be suspicious of any definition of ext.args in file conf/test_krakenuniq.config that is redefined in conf/modules.config with a different value. That's the problem I ran into here, placing,

    withName: KRAKENUNIQ_PRELOADEDKRAKENUNIQ {
        ext.args = '--preload-size 2G'
    }

in conf/test_krakenuniq.config, then redefining it in the included file. You may need to use your trick with params a few more times 😄 Alternatively, and more interesting to me, use the ( possibly ) existing value of ext.args in its redefinition through judicious use of closures.

Thanks for your help, @jfy133.

@jfy133
Copy link
Member

jfy133 commented May 27, 2025

Testing (will update over time:

  • ✅ Expect load whole database (--preload) nextflow run ../main.nf -profile test_krakenuniq,docker --outdir ./results_nochunk --krakenuniq_ram_chunk_size false
  • ✅ Expect ram chunk size of 4GB via parameter (--preload-ram-size) nextflow run ../main.nf -profile test_krakenuniq,docker --outdir ./results_nochunk --krakenuniq_ram_chunk_size '4G'
  • ✅ Expect ram chunk size of 3Gb via database sheet specified (--preload-ram-size) nextflow run ../main.nf -profile test_krakenuniq,docker --outdir ./results_nochunk --krakenuniq_ram_ chunk_size false --databases database_krakenuniq.csv 1
  • ✅ Expect ram chunk size of 3Gb via database sheet specified (--preload-ram-size) even though RAM chunk size is 4G nextflow run ../main.nf -profile test_krakenuniq,docker --outdir ./results_nochunk --krakenuniq_ram_chunk_size false --databases database_krakenuniq.csv --krakenuniq_ram_chunk_size '4G'
    • Command results in --preload-size 4G --preload-size 3G \

So I think this works!

Random extra test:

  • ❌ Specify with all three methods (parameter, database, custom config for parameter) nextflow run ../main.nf -profile test_krakenuniq,docker --outdir ./results_nochunk --krakenuniq_ram_chunk_size false --databases database_krakenuniq.csv --krakenuniq_ram_chunk_size '4G' -c custom_config.conf 2
    • This shows that indeed custom configs will override any oter setting, this is something we should document (I will push directly to your PR, but otherwise this is GTG) - also this does not influence the amount nextflow ACTUALLY requests (this comes from the memory = setting), just how much it uses

}```

Footnotes

  1. preparation: wget https://github.com/nf-core/test-datasets/raw/refs/heads/taxprofiler/database_krakenuniq.csv and sed -i 's/,,/,--preload-size 3G,/g' database_krakenuniq.csv

  2. withName: KRAKENUNIQ_PRELOADEDKRAKENUNIQ {
        ext.args = '--preload-size 2G'
    }
    

@jfy133
Copy link
Member

jfy133 commented May 27, 2025

Made a couple of additional documentation additions/warnings, but otherwise you can merge once all tests pass @muniheart !

@jfy133
Copy link
Member

jfy133 commented May 28, 2025

I'll merge on your behalf now I got tests passing :)

@jfy133 jfy133 merged commit 42c5317 into nf-core:dev May 28, 2025
25 checks passed
@jfy133
Copy link
Member

jfy133 commented May 28, 2025

Thank you again for your patience!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working enhancement Improvement for existing functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants