-
-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add create_using
parameter for random graphs.
#5672
Conversation
For random graph generation, there is usually a distribution that is expected. For example, the degree distribution should follow a specified distribution. That is, for example, specific to the case of an undirected graph. Other generators work with a directed graph. And you have worked with the code that treats directed and undirected differently, so you know the differences there. MultiGraphs bring on their own set of issues and it is not always clear how the multiedges would affect the distributions. Can you say something about the motivation of this PR? I am guessing that you want to use We have been designing a framework in NXEP3 where the graph generators would return either edges or graphs. Then you would have the basic building blocks to combine generator algorithms (which usually just construct edges -- though sometimes with potentially isolated nodes as well) very flexibly without even having a graph data structure. When you are done manipulating edges, you send those into your Could you look at these options with a perspective that we previously decided not to include the Since you are motivated and interested in this issue, we'd like to hear your thoughts on NXEP3 and generator API. |
Thanks for the detailed information, @dschult. The motivation underlying this PR is that I have a cython implementation of a graph structure that is compatible with |
9c51cfb
to
120a198
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I too am +1 on adding create_using
to support external graph objects. In order to do so I agree with @dschult 's take that we make sure to validate that the built-in directed/multigraphs are explicitly disallowed for all functions that don't already support them (see a couple instances in inline comments).
Another minor suggestion - let's go ahead and make the newly-added create_using
kwargs keyword-only (for signatures that don't already have keyword-only args).
4eb443e
to
6cf8a5a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good fix to get this working. I think we should move the utility function to networkx/utils/misc.py
. I've got some suggested language for the create_using
doc_string description. I would prefer to change the name assert_directedness_multi
to something like: check_create_using
. thoughts?
I think in the long run, we should make a decorator to handle the create_using keyword processing, and I think that could be a good place to add stuff to handle some Backends machinery, and maybe include some machinery from not_implemented_from
. But let's keep this PR more simple so we get some experience and feedback as we make these changes.
Thanks very much!!
As for which functions in random_graph.py
to implement this for, I think the ones you have changed for this PR can be updated for this approach. If any don't fit or lead to further questions, let us know. If something else doesn't make sense, let us know that too. But I think implementing this on the functions you have changed here already is fine.
networkx/generators/random_graphs.py
Outdated
raise TypeError( | ||
"random_shell_graph does not support graph instances for `create_using`. " | ||
"Please use a graph type." | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this!!
This random_shell_graph
uses create_using in two ways -- one to make the new graph and another to make helper graphs which then are combined with the first using "union". I'm hoping we can make this work by only using create_using
to create the first graph. Then we avoid this special case where instances are not allowed in create_using
.
Is it true that all the gnm_random_graph
s could be nx.Graph
type without affecting the result? I believe the union
operation does what we want here, but if not, we could replace that with G.add_nodes_from(g.nodes)
and G.add_edges_from(g.edges)
. I believe that the code adding edges between shells should stay the same.
If we can use nx.Graph
in gnm_random_graph
then let's take out the create_using
argument in line 1214 and remove this check that create_using
is a type or is None.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another option might be to use create_using=G.__class__
for the construction of the helper graphs so they have the same type as but are distinct instances from G
. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds like a good approach. If there are any second-hand effects from add_edge/node in the subclass, handling it this way should take care of that. Most of the time it won't matter but explicitly using G.__class__
seems like it should get most of the troublesome cases. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me! :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes look good, and thanks for adding tests for the intended use-case, i.e. supporting user-derived subclasses.
I noticed there were no tests of the validation verifying that check_create_using
was actually hit on the code paths as intended. I took the liberty of pushing those tests up. For the most part, everything went fine, though there are issues with barabasi_albert
, dual_barabasi_albert
and random_powerlaw_tree
where the validation is not hit. For example, in the barabasi_albert
implementation, star_graph(n, create_using)
gets called before check_create_using
, and star_graph
itself doesn't support directed graphs, so a different exception is raised.
We should fix up the issues identified by the new tests. Also, related to the above - I wonder if we shouldn't change the exception type from ValueError
to NetworkXError
. I understand the desire to use a Python built-in exception, but the majority of exceptions that deal with the "wrong" graph type raise NetworkXError. WDYT?
Great, thank you, @rossbar. I've updated the exceptions to be |
d9b143f
to
1ec1f5f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks very helpful! Thanks @tillahoffmann !
And the tests @rossbar pushed up cover more cases.
My comments/suggestions here are coming from a perspective of "how might this work repo-wide?" I think this PR should stay focused on the random_graphs.py
module. But I'd like it to be a POC for wider use.
- It'd be nice to use
check_create_using
even before we construct the graph instance. Ideally in the first line of the function, we could process thecreate_using
argument. One trouble is that we don't know ifcreate_using
is a class or an instance. Sois_directed
might require an unused argument or not.
We should be able to get around that issue with some hackery. And I have suggestd some hackery here. :)
- The other processing that is commonly done here is to set the default value of the graph constructor. Sometimes we want to set a default type when
create_using
is None (usually based on an input nameddirected
). Perhaps one way to designcheck_create_using
would be to take an additional inputdefault=nx.Graph
and return the class/instance.
I've suggested a revamp of the check_create_using function to:
- return the create_using argument updated with a default value if needed.
- accept a
default
kwarg that gets returned if the input isNone
. - use hackery to check whether the
create_using
input is directed/multigraph for either a class or an instance.
The result is that we could call the check in the first line of the function, before creating any graphs. The return value can then be used in e.g. nx.empty_graph
. I've made suggestions for this usage in one function that doesn't have a default and in another where it needs a default dependent on directed
.
More ideas for making this work repo-wide
I can make the recently suggested changes above if you don't have time atm. Didn't mean to dump a bunch of stuff on you. What do you think? |
Thanks @dschult I'm personally +1 on the idea of modifying |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have pushed a commit to change the _check_create_using
function to include a default
parameter and output the create_using
object. And I moved the checks to the top of the functions.
I'll fix any failing tests, but hopefully there won't be any and this is ready for merging AFAICT.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @tillahoffmann and @dschult !
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like I didn't make the changes to the duplication.py
functions... oops...
I'll get that added soon.
Thanks @tillahoffmann (and @rossbar reviews)! |
Thank you for finishing this one, @dschult! |
This PR adds the
create_using
parameter to random graph and duplication divergence models, allowing users to create graphs of the desired type.