Document use of SquashFS image as gtdbtk database. #793
I used nf-core tools in a Docker container: `$ nf-core modules update -f gtdbtk/classifywf`. I was user 'root' in the container, so I had to change the ownership and permissions of the modified files afterward, outside the container.
jfy133
left a comment
Hi @muniheart ! Thanks for persisting with your experimentation on this one!
I've done a quick read through, but I will do a pass to update the text tomorrow (I think it's a bit too unnecessarily technical at points)
But so if I understand correctly in very simple terms:
- No code changes are necessary
- You generate your squashfs image
- You need to make an 'empty' directory (so to say) somewhere on your filesystem (and you give this to `--gtdb_db`)
- You then pass to the container options the squashfs image to then mount at the location of `--gtdb_db` (and the name of the top level directory)
Is that correct?
Hi @jfy133. That's it! The top-level-directory bit can even be dropped if you like. It would be resolved by Thanks for your patience in considering this 'enhancement'.
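For readers following the thread, the preparation steps summarized above can be sketched in shell. This is only a rough sketch under stated assumptions: the function names and all paths are hypothetical, `mksquashfs` and `unsquashfs` come from squashfs-tools and must be installed separately, and rejecting a non-empty mount point is a precaution taken here rather than a requirement stated in this PR.

```shell
#!/usr/bin/env bash
# Sketch only: function names and paths are hypothetical placeholders.

# Pack an unpacked GTDB-Tk database directory into a single SquashFS image.
# Requires mksquashfs from squashfs-tools.
build_image() {
    local db_dir="$1" image="$2"
    mksquashfs "$db_dir" "$image" -noappend
}

# Create the directory that is later passed to --gtdb_db.
# The image is mounted over it inside the container, so a non-empty
# directory is treated as an error here (a precaution, not a hard rule).
make_mount_point() {
    local dir="$1"
    mkdir -p "$dir"
    if [ -n "$(ls -A "$dir")" ]; then
        echo "mount point $dir is not empty" >&2
        return 1
    fi
}

# Show the first entries of the image to see its top-level layout.
show_image_layout() {
    local image="$1"
    unsquashfs -ls "$image" | head -n 20
}
```

A possible call sequence would be `build_image /data/gtdbtk_r220 /data/gtdbtk_r220.squashfs`, then `make_mount_point /data/gtdbtk_mount`, with `/data/gtdbtk_mount` later given to `--gtdb_db`.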

@muniheart please have a look and check this looks OK - we can always roll back or take bits from your previous commit if necessary

@nf-core-bot fix linting

```nextflow
process {
    withName: GTDBTK_CLASSIFYWF {
        containerOptions = "-B /<path>/<to>/<empty_dir>/gtdbtk_r220.squashfs:${params.gtdb_db}:image-src=/<output_from_unsquashfs_ls>"
    }
}
```
It looks good, @jfy133 . I have 2 comments:
- The `/<path>/<to>/<empty_dir>/` should be passed in `params.gtdb_db`. That is mentioned above.
- It may look cleaner as

```nextflow
process {
    withName: GTDBTK_CLASSIFYWF {
        containerOptions = "-B <path_to_image>:${params.gtdb_db}:image-src=/"
    }
}
```

Using `<path_to_image>` may be closer to the convention used elsewhere in the doc.
I prefer to use `/<path>/<to>` where possible as it's a more explicit example (and tells the user what to expect). In the case of the `<output_from_unsquas>` bit, I can't predict what this will look like, so I prefer to make it more generic, if that makes sense?
There is still a problem with arguments to -B, @jfy133.
It should be,

```nextflow
process {
    withName: GTDBTK_CLASSIFYWF {
        containerOptions = "-B /<path>/<to>/gtdbtk_r220.squashfs:${params.gtdb_db}:image-src=/"
    }
}
```

The value `/<path>/<to>/<empty_dir>` is passed in `params.gtdb_db`.
Co-authored-by: muniheart <52059779+muniheart@users.noreply.github.com>

How about now @muniheart ?

Hi @jfy133. I suggested a change above.

```nextflow
process {
    withName: GTDBTK_CLASSIFYWF {
        containerOptions = "-B /<path>/<to>/<empty_dir>/gtdbtk_r220.squashfs:${params.gtdb_db}:image-src=/<output_from_unsquashfs_ls>"
    }
}
```

Hopefully that's everything now @muniheart ?

@nf-core-bot fix linting

Ignore the failing tests for now @muniheart we can merge in if you're happy with the instructions :)

LGTM, @jfy133. Thank you.

Hi @muniheart, no comments but just wanted to say thank you for persevering with this, despite the difficulties finding the best way to incorporate it! I think this makes a really nice addition to the documentation, and will definitely be helpful to other users of the pipeline.

I +1 that too, thank you @muniheart !

### `Added`

- [#784](https://github.com/nf-core/mag/pull/784) - Added `--bin_min_size` and `--bin_max_size` parameters to filter out bins based on size (requested by @maxibor, @alexhbnr, added by @jfy133, @prototaxites).
- [#793](https://github.com/nf-core/mag/pull/793) - Document use of a SquashFS image with `--gtdb_db` (by @muniheart).
praise:
- [#793](https://github.com/nf-core/mag/pull/793) - Document use of a SquashFS image with `--gtdb_db` (by @muniheart).
- [#793](https://github.com/nf-core/mag/pull/793) - Document use of a SquashFS image with `--gtdb_db`, useful for limited inode infrastructure (by @muniheart).
Document use of SquashFS image as gtdbtk database.
PR checklist
- Make sure your code lints (`nf-core pipelines lint`).
- Ensure the test suite passes (`nextflow run . -profile test,docker --outdir <OUTDIR>`).
- Check for unexpected warnings in debug mode (`nextflow run . -profile debug,test,docker --outdir <OUTDIR>`).
- Usage Documentation in `docs/usage.md` is updated.
- Output Documentation in `docs/output.md` is updated.
- `CHANGELOG.md` is updated.
- `README.md` is updated (including new tool citations and authors/contributors).

This PR documents a feature of subworkflow `GTDBTK`. Users with limited storage resources can economize on inodes (the uncompressed database requires >200K inodes!) by providing the database for GTDB-Tk as a SquashFS image. This feature is only available when using container engines singularity and apptainer.
This is a replacement of #785.
** Why is a PR necessary to use a SquashFS image? Why can't I just mount the image and pass its path?

Mounting a file image requires permission to mount a loop device. If you have sufficient privilege on your system to mount a loop device, then you don't need this PR. Singularity and Apptainer allow an unprivileged user to bind-mount a file system image to a container's file system. This PR documents the configuration of the bind-mount options.
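To make the bind-mount option concrete, here is a small shell sketch that assembles the `-B` argument in the `image:destination:image-src=path` form used in this PR. The helper name `bind_spec` and the paths are hypothetical, and defaulting `image-src` to `/` follows the suggestion made in the review thread rather than anything this description mandates.

```shell
#!/usr/bin/env bash
# Sketch only: bind_spec is a hypothetical helper, not part of any tool.

# Build the value for -B: <image>:<destination>:image-src=<path inside image>.
# The third argument defaults to "/", i.e. the root of the image.
bind_spec() {
    local image="$1" dest="$2" src="${3:-/}"
    printf '%s:%s:image-src=%s\n' "$image" "$dest" "$src"
}

# Example use (needs singularity or apptainer; container.sif is a placeholder):
#   singularity exec \
#       -B "$(bind_spec /data/gtdbtk_r220.squashfs /data/gtdbtk_mount)" \
#       container.sif ls /data/gtdbtk_mount
```

Because the mount happens inside the container, no loop-device privilege is needed on the host, which is the point made in the answer above.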
** Why is a PR necessary to bind-mount an image? Why not just configure the parameters in `containerOptions`?

Setting `containerOptions` is sufficient. This PR provides usage details.

** Why not just pass the database path in input `path(db)`?

Prior to 788, the database was input to process `GTDBTK_CLASSIFYWF` as,

This allowed for the configuration

This worked because shell function `nxf_stage` in the `.command.run` script would create a directory named `database` and the image would be mounted there. With 788, `nxf_stage` creates a symbolic link to the database. The solution is to mount the image file system somewhere outside the `workDir` of the process and pass its location in `path(db)`.

** Why not specify a distinguished absolute path at which to mount the image when the container runs and pass that as `path(db)`?

and call the process
Yes! That does work. See notes on using SquashFS as GTDB-Tk database in `docs/usage.md`.
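As a rough end-to-end illustration, the sketch below writes the custom config in the form suggested during review (image bound at `${params.gtdb_db}` with `image-src=/`). The helper name `write_squashfs_config`, the file name `squashfs.config`, and all paths are hypothetical; `docs/usage.md` remains the reference for the final wording.

```shell
#!/usr/bin/env bash
# Sketch only: helper name, file names, and paths are hypothetical.

# Print a Nextflow config that bind-mounts the given SquashFS image at
# the location passed to --gtdb_db. The single-quoted format string keeps
# ${params.gtdb_db} literal so that Nextflow, not the shell, resolves it.
write_squashfs_config() {
    local image="$1"
    printf 'process {\n    withName: GTDBTK_CLASSIFYWF {\n        containerOptions = "-B %s:${params.gtdb_db}:image-src=/"\n    }\n}\n' "$image"
}

# Example use (assumed paths; other pipeline parameters omitted):
#   write_squashfs_config /data/gtdbtk_r220.squashfs > squashfs.config
#   nextflow run nf-core/mag -profile singularity -c squashfs.config \
#       --gtdb_db /data/gtdbtk_mount --input samplesheet.csv --outdir results
```

The directory given to `--gtdb_db` is the mount point, not the image itself, and should sit outside the process `workDir` as described above.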