Description
I have custom CSV-format data that I have imported into hatchet, and I would like a way to aggregate over subtrees with different top-level names. In particular, I have subtrees for different Runge-Kutta stages, eRK_stage_[1-4], and I want to produce a new tree that holds the sum of those subtrees. The subtrees have identical timer/node names but different timer/metric values. So if each of eRK_stage_[1-4] has a sub-node add_nonlinear, then the new aggregate subtree, let's call it eRK_all, would also have a sub-node add_nonlinear, whose value is the sum of the values for stages 1-4.
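To make the requested operation concrete, here is a minimal sketch in plain Python. It does not use hatchet's API: the nested-dict tree layout, the `time` metric name, the sample values, and the `sum_subtrees` helper are all assumptions made up for illustration.

```python
def _accumulate(target, source):
    """Add source's metric to target, then recurse into same-named children."""
    target["time"] += source["time"]
    for name, child in source.get("children", {}).items():
        # Create the matching child in the target the first time we see it.
        slot = target["children"].setdefault(name, {"time": 0.0, "children": {}})
        _accumulate(slot, child)


def sum_subtrees(subtrees):
    """Merge subtrees node by node, summing metrics of nodes with equal names."""
    merged = {"time": 0.0, "children": {}}
    for tree in subtrees:
        _accumulate(merged, tree)
    return merged


# Hypothetical data: four stages with identical structure, different values.
stages = {
    f"eRK_stage_{i}": {
        "time": 10.0 * i,
        "children": {"add_nonlinear": {"time": 1.0 * i, "children": {}}},
    }
    for i in range(1, 5)
}

eRK_all = sum_subtrees(stages.values())
print(eRK_all["time"])                               # 100.0
print(eRK_all["children"]["add_nonlinear"]["time"])  # 10.0
```

Matching children by name is what lets the stages differ in their top-level names while still being summed; a node that appears in only some of the stages simply contributes its own value.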
I can do this as a pre-processing step in my custom reader implementation, but this sort of tree operation seems like it could be useful for other people as well. It's not at all obvious to me how to do this with existing APIs, although perhaps a clever groupby_aggregate or filter followed by squash can do something like this?
I would also love a convenience function for creating a sub-graphframe. For example, I have initialization and timeloop regions, and most of the time I am only interested in the timeloop. I'd like to be able to say something like gf.tree(root='GENE.gsub.timeloop.t_loop'), and it would print only the subtree rooted at node t_loop, whose ancestors are timeloop, gsub, and GENE. Or gf.subtree('GENE.gsub.timeloop.t_loop').tree() would work nicely too.
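The lookup I have in mind could behave roughly like the sketch below. Again this is plain Python rather than hatchet's API: `find_subtree`, `render`, the nested-dict layout, and the timing values are hypothetical and only illustrate selecting a subtree by a dotted path.

```python
def find_subtree(roots, path, sep="."):
    """Walk a dotted path such as 'GENE.gsub.timeloop.t_loop' and return that node."""
    level = roots
    node = None
    for name in path.split(sep):
        if name not in level:
            raise KeyError(f"no node named {name!r} on path {path!r}")
        node = level[name]
        level = node.get("children", {})
    return node


def render(name, node, indent=0):
    """Return indented text lines for a node and all of its descendants."""
    lines = [f"{'  ' * indent}{node['time']:.3f} {name}"]
    for child_name, child in node.get("children", {}).items():
        lines.extend(render(child_name, child, indent + 1))
    return lines


# Hypothetical profile with an initialization region and a timeloop region.
roots = {
    "GENE": {"time": 100.0, "children": {
        "gsub": {"time": 99.0, "children": {
            "initialization": {"time": 15.0, "children": {}},
            "timeloop": {"time": 84.0, "children": {
                "t_loop": {"time": 80.0, "children": {
                    "eRK_stage_1": {"time": 20.0, "children": {}},
                }},
            }},
        }},
    }},
}

t_loop = find_subtree(roots, "GENE.gsub.timeloop.t_loop")
print("\n".join(render("t_loop", t_loop)))
```

Using the full path rather than just the node name avoids ambiguity when the same timer name appears in more than one place in the tree.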
If this is not possible with existing APIs, and there is interest, I could work on a PR once I get it working. I guess one complication is that these operations may not work well with non-tree graphs.