Add PyPOLCA polishing and use polished assemblies for long-read binning by Harshita-sriv · Pull Request #1048 · nf-core/mag

Harshita-sriv · 2026-06-02T07:23:33Z

Summary

This PR adds a PyPOLCA polishing module and integrates polished assemblies into the long-read MAG workflow.

Changes

Added a new PyPOLCA module and conda environment.
Added publication of PyPOLCA outputs.
Added the run_pypolca parameter.
Modified the workflow so that long-read binning uses ch_polished_assemblies instead of raw ch_longread_assemblies.

Validation

Tested on one Nanopore metagenomic sample.

Verified that:

PyPOLCA successfully generated sample_polished.fasta.
LONGREAD_BINNING_PREPARATION:MINIMAP2_ASSEMBLY_INDEX used sample_polished.fasta.
Coverage estimation and downstream binning (MetaBAT2, MaxBin2, CONCOCT, COMEBin, MetaBinner, DAS Tool) executed after PyPOLCA completion.

This ensures MAG generation is performed using polished assemblies rather than raw Flye assemblies.

dialvarezs · 2026-06-02T07:36:22Z

Hi @Harshita-sriv, thanks for this!
In the meantime, can you join the nf-core org? That way you will be able to run the CI without explicit approval.
https://nf-co.re/join

dialvarezs

In general I think it looks right, but it needs to migrate to the nf-core module and, of course, pass the CI.

Harshita-sriv · 2026-06-02T08:55:53Z

Thanks for the review. I investigated the CI failures and found that the current implementation does not define ch_polished_assemblies when run_pypolca is not enabled, which causes the test workflows to fail. I'll address this together with the requested migration to the official nf-core PyPOLCA module, removal of debug statements, and formatting fixes.

Merge remote-tracking branch 'upstream/dev' into add-polishing-module

dialvarezs · 2026-06-04T11:33:30Z

Hi @Harshita-sriv! I think this looks fine, but it would be ideal to wire it up in some test. I'll think about what the best option would be here and come back to you.
In the meantime, can you update the changelog?
Thanks!

d4straub

Thanks for the addition, I think that's absolutely desired!
Looks good, but I have some concerns about multiple samples.

Harshita-sriv · 2026-06-10T09:13:21Z

I noticed that the remaining lint failure is due to modifications in modules/nf-core/pypolca/run, which no longer matches the upstream module.

The changes were introduced because PyPOLCA was failing during testing, and with the modifications the pipeline completes successfully on the test dataset.

Would it be acceptable to keep these modifications to the PyPOLCA module, or would you prefer that the implementation uses the upstream module unchanged and handles the required logic elsewhere?

d4straub · 2026-06-11T07:40:07Z

> Would it be acceptable to keep these modifications to the PyPOLCA module, or would you prefer that the implementation uses the upstream module unchanged and handles the required logic elsewhere?

Some of the changes are not acceptable: the output block and the versioning must be kept as they were originally. Additionally, the line ch_versions = ch_versions.mix(PYPOLCA_RUN.out.versions) in workflows/mag.nf has to be removed.
The incoming data (contigs) should be compressed as in the original code. Even if it is necessary to allow non-compressed input (which may be OK in some cases), copying files that are already uncompressed seems like an unnecessary burden on IO and file sizes to me (assemblies in nf-core/mag can be huge!).

Generally, small changes to nf-core modules can be made and accepted by the linting via https://nf-co.re/docs/nf-core-tools/cli/modules/patch

Harshita-sriv · 2026-06-12T06:27:20Z

@d4straub I restored the PyPOLCA module to the upstream nf-core implementation and moved the coassembly-specific logic into the workflow layer (workflows/mag.nf). The workflow now groups short reads appropriately when --coassemble_group is enabled before passing them to PyPOLCA.

The remaining CI issue appears to be the check_local_copy lint check for pypolca/run. Locally, the module matches the installed nf-core module definition (nf-core modules info pypolca/run), and the workflow compiles and reaches PYPOLCA_RUN in stub runs. All other CI checks pass.

Could you advise whether there is an additional module sync/patch step expected for this module?

d4straub · 2026-06-12T06:45:52Z

Looks good.
I think the easiest way of satisfying the linting is to re-install the module.

dialvarezs

Hi again @Harshita-sriv! This is looking good!

A few last things:

Regarding the testing setup:
I think the best option is to use the longreadonly_alternatives profile but change the samplesheet so it uses the hybrid one, which has LR and SR data. Then disable at least SPAdes, SPAdes hybrid and MEGAHIT. That way we can get this tested in that profile.

Also, I left a comment regarding the pypolca module requiring gzipped input (fixing that will require updating the nf-core module).

dialvarezs · 2026-06-15T05:01:30Z

+    def read_files    = reads instanceof List ? reads : [reads]
+    def read_file_arg = read_files.size() > 1 ? "-1 ${read_files[0]} -2 ${read_files[1]}" : "-1 ${read_files[0]}"
+    """
+    gzip -cdf $contigs > contigs_uncompressed


Can you update the module so it only decompresses when the input has a .gz extension? I was testing this locally, and pypolca received the uncompressed assembly, throwing an error.
I also think it's better design not to assume that the input is compressed.
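
A minimal shell sketch of the extension-based handling requested here, not the module's actual code. The function name prepare_contigs and the output file names are hypothetical. Uncompressed input is symlinked rather than copied, which is an assumption made to respect the earlier concern about IO and file sizes for large assemblies.

```shell
# Hypothetical helper: make an uncompressed assembly available at "$2".
# Decompress only when the input name ends in .gz; otherwise create a
# symlink so that a potentially huge file is not copied.
prepare_contigs() {
    local contigs="$1" out="$2"
    case "$contigs" in
        *.gz) gzip -cd -- "$contigs" > "$out" ;;
        *)    ln -sf -- "$(realpath -- "$contigs")" "$out" ;;
    esac
}
```

Called as `prepare_contigs assembly.fasta.gz contigs_uncompressed.fasta` or `prepare_contigs assembly.fasta contigs_uncompressed.fasta`, either form leaves a readable uncompressed FASTA at the output path. Note that the check is on the file name only, so a gzipped file without a .gz suffix would be passed through unchanged.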

Harshita-sriv · 2026-06-15

Yes, I also faced this problem, but I didn't change the module because I assumed it wouldn't be accepted. For the testing setup, could you confirm that for test_longreadonly_alternatives I should replace samplesheet.long_read.v4.csv with samplesheet.hybrid.v4.csv, enable run_pypolca, and keep only Flye active by disabling SPAdes, SPAdes hybrid, MEGAHIT, and MetaMDBG?

dialvarezs · 2026-06-15

It's a totally reasonable change if you execute gzip based on the extension, so it works with compressed and uncompressed input. So don't worry about opening a PR on the module. I can review it if you ping me.

And yes, that's the setup I'm proposing. MetaMDBG is already disabled in that profile, so you'll need to disable both SPAdes assemblers and MEGAHIT.

Harshita-sriv · 2026-06-15

I've implemented the requested changes:

Updated test_longreadonly_alternatives.config to use the hybrid samplesheet and enable run_pypolca while disabling SPAdes, SPAdes hybrid and MEGAHIT.
Updated the PyPOLCA module to support both gzipped and uncompressed assembly inputs.

The remaining issue is CI validation. The PR currently shows failures related to:

The PyPOLCA module modification.
The updated test profile and snapshot expectations.

Locally I'm unable to fully validate the nf-test run because the test environment fails before completion (tools such as fastp, NanoPlot and porechop are missing), so any snapshot updates generated locally appear incomplete.

Could you please take a look and let me know how you'd like the test and snapshot updates handled, or whether the current implementation looks correct and can be reviewed from your side?

dialvarezs · 2026-06-15

Oh, sorry for the lack of clarification.
As Daniel said, updating the nf-core module directly in the mag repo is not the way.
To update a module you need to open a PR on the modules repo: https://github.com/nf-core/modules. When your PR is accepted and merged, you can run nf-core modules update pypolca/run to update the module here.
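
The update path described above can be sketched as a short script. It assumes nf-core/tools is installed, that it is run from the root of the nf-core/mag checkout, and that the upstream PR on nf-core/modules has already been merged. The run wrapper, the APPLY variable and the update_pypolca function are hypothetical helpers, added so that the commands are only printed unless execution is explicitly enabled.

```shell
# Hypothetical helper: print each command, and execute it only when APPLY=1.
run() {
    echo "+ $*"
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    fi
}

update_pypolca() {
    # Pull the merged upstream version of the module into the pipeline.
    run nf-core modules update pypolca/run
    # Re-run the module lint that was failing in CI.
    run nf-core modules lint pypolca/run
}

update_pypolca
```

Running it as is prints the two commands as a dry run; setting `APPLY=1` in the environment executes them.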

Harshita-sriv · 2026-06-15

Okay, will do that. Can you also advise on the errors caused by the test file modifications?

dialvarezs · 2026-06-15

You can ignore the errors for now. When you do the update via the official nf-core modules, the errors will go away.

Harshita-sriv · 2026-06-15

Okay, thank you for the help!

Add PyPOLCA polishing and use polished assemblies for long-read binning

2f7b365

Harshita-sriv requested review from d4straub, dialvarezs, jfy133, muabnezor and prototaxites as code owners June 2, 2026 07:23

dialvarezs requested changes Jun 2, 2026

Comment thread modules/local/pypolca/main.nf Outdated

Comment thread workflows/mag.nf Outdated

Comment thread workflows/mag.nf

harshita.s and others added 6 commits June 2, 2026 17:44

Use official nf-core PyPOLCA module

d5daf4b

t push origin add-polishing-module

05fc21f

Merge remote-tracking branch 'upstream/dev' into add-polishing-module

Fix unresolved merge conflict in modules.json

1c14fa0

Remove obsolete local PyPOLCA module and update snapshots

2e28a41

Fix snapshot ordering for Flye assembly_info output

a40cce8

Apply pre-commit formatting

93e58c0

Harshita-sriv requested a review from dialvarezs June 3, 2026 09:58

d4straub reviewed Jun 5, 2026

Comment thread nextflow_schema.json Outdated

Comment thread workflows/mag.nf Outdated

Comment thread conf/modules.config Outdated

harshita.s and others added 3 commits June 10, 2026 12:39

Update PyPOLCA integration

e022822

Aply pre-commit formatting

c3a6998

Remove unused paramsHelp import

6579f8d

Fix PyPOLCA integration for coassembly support

ec34498

Sync pypolca module with nf-core/modules

df42101

dialvarezs reviewed Jun 15, 2026

Harshita-sriv added 2 commits June 15, 2026 10:27

Add PyPolca support to longread alternatives test

d62c043

chore: update pypolca module

6a5b25c
