feature request: subtree aggregate, graphframe subtree #431

@bd4

I have imported custom CSV-format data into hatchet, and I would like a way to aggregate over subtrees that have different top-level names. In particular, I have one subtree per Runge-Kutta stage, eRK_stage_[1-4], and I want to produce a new tree that holds the sum of those subtrees. The subtrees have identical timer/node names but different timer/metric values. So if each of eRK_stage_[1-4] has a sub-node add_nonlinear, then the new aggregate subtree, let's call it eRK_all, would also have a sub-node add_nonlinear, whose value is the sum of the values for stages 1-4.
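
To make the request concrete, here is a rough sketch of the operation I have in mind. It uses plain nested dicts rather than hatchet's GraphFrame, so none of the names below (merge_subtrees, aggregate_children, the "time" metric key) are hatchet APIs, and the timer values are made up for illustration.

```python
import re


def leaf(time):
    """Build a node with a single metric and no children (illustrative layout)."""
    return {"time": time, "children": {}}


def merge_subtrees(trees):
    """Return a new subtree whose metric is the sum over `trees`.

    Children are matched by name and merged recursively; a child that is
    missing from some of the subtrees is summed over the ones that have it.
    """
    merged = {"time": sum(t["time"] for t in trees), "children": {}}
    names = []
    for tree in trees:
        for name in tree["children"]:
            if name not in names:
                names.append(name)  # keep first-seen order
    for name in names:
        merged["children"][name] = merge_subtrees(
            [t["children"][name] for t in trees if name in t["children"]]
        )
    return merged


def aggregate_children(parent, pattern, new_name):
    """Add a child `new_name` to `parent` that sums all children matching `pattern`."""
    matching = [
        child
        for name, child in parent["children"].items()
        if re.fullmatch(pattern, name)
    ]
    if not matching:
        raise ValueError(f"no child of the given node matches {pattern!r}")
    parent["children"][new_name] = merge_subtrees(matching)
    return parent["children"][new_name]


# Hypothetical timings for the four stages.
t_loop = {
    "time": 10.0,
    "children": {
        "eRK_stage_1": {"time": 2.0, "children": {"add_nonlinear": leaf(1.0)}},
        "eRK_stage_2": {"time": 2.5, "children": {"add_nonlinear": leaf(1.5)}},
        "eRK_stage_3": {"time": 2.0, "children": {"add_nonlinear": leaf(1.0)}},
        "eRK_stage_4": {"time": 2.5, "children": {"add_nonlinear": leaf(1.5)}},
    },
}

eRK_all = aggregate_children(t_loop, r"eRK_stage_[1-4]", "eRK_all")
```

Here the original stage subtrees are left in place next to eRK_all; whether the aggregate should replace them or sit alongside them is an open design question.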

I can do this as a pre-processing step in my custom reader implementation, but this sort of tree operation seems like it could be useful for other people as well. It's not at all obvious to me how to do this with existing APIs, although perhaps a clever groupby_aggregate or filter followed by squash can do something like this?

I would also love a convenience function for creating a sub-graphframe. For example, I have initialization and timeloop regions, and most of the time I am only interested in the timeloop. I'd like to be able to say something like gf.tree(root='GENE.gsub.timeloop.t_loop'), and have it print only the tree rooted at node t_loop, whose ancestors are timeloop, gsub, and GENE. Or gf.subtree('GENE.gsub.timeloop.t_loop').tree() would work nicely too.
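
As a sketch of the path lookup I mean, again on plain nested dicts and not on hatchet's actual data structures: subtree and render below are hypothetical helpers, the dotted-path separator is my assumption, and the timings are invented.

```python
def leaf(time):
    """Build a node with a single metric and no children (illustrative layout)."""
    return {"time": time, "children": {}}


def subtree(root_name, root, path, sep="."):
    """Walk a dotted path such as 'GENE.gsub.timeloop.t_loop' from the root.

    Returns (name, node) for the last path component. Raises KeyError if
    the path does not start at the root or a component is missing.
    """
    names = path.split(sep)
    if names[0] != root_name:
        raise KeyError(f"path must start at root {root_name!r}")
    node = root
    for name in names[1:]:
        node = node["children"][name]
    return names[-1], node


def render(name, node, indent=0):
    """Return the subtree as indented text lines, one node per line."""
    lines = [f"{'  ' * indent}{node['time']:.3f} {name}"]
    for child_name, child in node["children"].items():
        lines.extend(render(child_name, child, indent + 1))
    return lines


# Hypothetical tree with made-up timings.
GENE = {
    "time": 12.0,
    "children": {
        "gsub": {
            "time": 11.0,
            "children": {
                "initialization": leaf(1.0),
                "timeloop": {
                    "time": 10.0,
                    "children": {
                        "t_loop": {
                            "time": 9.5,
                            "children": {"eRK_stage_1": leaf(2.0)},
                        }
                    },
                },
            },
        }
    },
}

name, node = subtree("GENE", GENE, "GENE.gsub.timeloop.t_loop")
lines = render(name, node)
print("\n".join(lines))
```

A dotted path only identifies a node uniquely when the data is a tree and sibling names are unique, which is part of why this may not carry over cleanly to general graphs.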

If this is not possible with existing APIs, and there is interest, I could work on a PR once I get it working. I guess one complication is that these operations may not work well with non-tree graphs.
