Repeat-group tool parameters can serialize wrong and silently produce incorrect results (Datamash) #285

@dannon

When running a Galaxy tool with repeat-group parameters, the inputs can serialize in a form Galaxy accepts but mis-applies, so the job finishes ok with a plausible-but-wrong result.

Repro (Datamash, group-by count + mean on a tiny TSV):

Input:

sample	group	value
S1	A	1.0
S2	A	3.0
S3	B	10.0
S4	B	14.0

Expected (count, mean per group):

A	2	2
B	2	12
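
The expected values can be checked independently of Galaxy and Datamash. The sketch below recomputes count and mean per group from the input above in plain Python; the function name `count_and_mean` is made up for illustration, and the `:g` formatting is only an assumption chosen to match how the numbers are shown in this report.

```python
import csv
import io
from collections import defaultdict

# The repro input from this issue, tab-separated.
TSV = (
    "sample\tgroup\tvalue\n"
    "S1\tA\t1.0\n"
    "S2\tA\t3.0\n"
    "S3\tB\t10.0\n"
    "S4\tB\t14.0\n"
)


def count_and_mean(tsv_text):
    """Return (group, count, mean) tuples, sorted by group name."""
    values_by_group = defaultdict(list)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        values_by_group[row["group"]].append(float(row["value"]))
    return [
        (group, len(values), sum(values) / len(values))
        for group, values in sorted(values_by_group.items())
    ]


for group, count, mean in count_and_mean(TSV):
    print(f"{group}\t{count}\t{mean:g}")
```

Group B has values 10.0 and 14.0, so a mean of 2 cannot be right for that row.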

The submitted job used underscore-style repeat keys (e.g. operations_1_op_name). Galaxy accepted the inputs but produced:

A	2	2
B	2	2     <-- mean wrong

Submitting the same tool with flat pipe-style repeat keys (operations_1|op_name, operations_1|op_column) returned the correct output.
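
As a rough illustration of the pipe-style form, here is a minimal sketch that flattens a nested inputs structure into flat keys. It is not the Orbit or Galaxy implementation. The helper `flatten_inputs` and the `grouping` parameter are hypothetical; `op_name` and `op_column` are taken from the keys above. It assumes repeat instances are numbered from 0 in list order, which should be checked against the target Galaxy server.

```python
def flatten_inputs(value, prefix=""):
    """Flatten nested tool inputs into pipe-style keys.

    Dicts contribute "parent|child" segments; lists of dicts are treated as
    repeat groups and contribute "name_<index>" (0-based, an assumption).
    Anything else is stored as a leaf value under the accumulated key.
    """
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            child_prefix = f"{prefix}|{key}" if prefix else key
            flat.update(flatten_inputs(child, child_prefix))
    elif isinstance(value, list) and all(isinstance(item, dict) for item in value):
        # Repeat group: one numbered block per list entry (empty list adds nothing).
        for index, item in enumerate(value):
            flat.update(flatten_inputs(item, f"{prefix}_{index}"))
    else:
        flat[prefix] = value
    return flat


# Hypothetical nested form of the repro's two operations.
nested = {
    "grouping": "2",  # hypothetical parameter name, for illustration only
    "operations": [
        {"op_name": "count", "op_column": "3"},
        {"op_name": "mean", "op_column": "3"},
    ],
}

flat = flatten_inputs(nested)
for key, val in flat.items():
    print(f"{key} = {val}")
```

With this numbering the second operation comes out as operations_1|op_name and operations_1|op_column, matching the key shape that returned the correct output.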

Why it matters: this is a silent data-correctness failure. The job completes successfully and the wrong number looks reasonable, so nothing prompts the user to recheck it. Repeat-group parameters need to be serialized with Galaxy's flat pipe (|) convention, or validated, before submission. Needs triage on whether the fix belongs in how Orbit/brain builds run_tool inputs or in the input-template/serialization layer.
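
One possible shape for the validation option is a pre-submission check that flags keys that look like underscore-serialized repeat parameters. This is a sketch only: `find_underscore_repeat_keys` is a hypothetical helper, and it assumes the caller already knows the tool's repeat-group names (here `operations`), for example from the tool's input definition.

```python
import re


def find_underscore_repeat_keys(inputs, repeat_names):
    """Return input keys shaped like '<repeat>_<index>_<param>'.

    Pipe-style keys such as 'operations_1|op_name' are not flagged, because
    the character after the index is '|' rather than '_'.
    """
    suspects = []
    for name in repeat_names:
        pattern = re.compile(rf"^{re.escape(name)}_\d+_.+")
        suspects.extend(key for key in inputs if pattern.match(key))
    return sorted(suspects)


bad_inputs = {"operations_0_op_name": "count", "operations_1_op_name": "mean"}
good_inputs = {"operations_0|op_name": "count", "operations_1|op_name": "mean"}

print(find_underscore_repeat_keys(bad_inputs, ["operations"]))
print(find_underscore_repeat_keys(good_inputs, ["operations"]))
```

A caller could refuse to submit, or re-serialize, whenever the returned list is non-empty. Restricting the check to known repeat names avoids flagging ordinary parameters that happen to contain digits and underscores.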

Reported via Orbit 0.4.1.

Labels: bug
