Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add create_using parameter for random graphs. #5672

Merged
merged 21 commits into from
Oct 7, 2024

Conversation

tillahoffmann
Copy link
Contributor

This PR adds the create_using parameter to random graph and duplication divergence models, allowing users to create graphs of the desired type.

@dschult
Copy link
Member

dschult commented May 28, 2022

For random graph generation, there is usually a distribution that is expected. For example, the degree distribution should follow a specified distribution. That is, for example, specific to the case of an undirected graph. Other generators work with a directed graph. And you have worked with the code that treats directed and undirected differently, so you know the differences there. MultiGraphs bring on their own set of issues and it is not always clear how the multiedges would affect the distributions.

Can you say something about the motivation of this PR? I am guessing that you want to use create_using to construct your graph from a subclass of the base classes. We believe it should be easy to use the existing code with the new graph subclass as something like: G = new_graph(nx.random_graph_generator(...))

We have been designing a framework in NXEP3 where the graph generators would return either edges or graphs. Then you would have the basic building blocks to combine generator algorithms (which usually just construct edges -- though sometimes with potentially isolated nodes as well) very flexibly without even having a graph data structure. When you are done manipulating edges, you send those into your new_graph class to store it.

Could you look at these options with a perspective that we previously decided not to include the create_using argument? How much does this help you, and is it worth the extra complexity of the API. I don't actually recall the duplication_graph code, so will have to look at that. But the random_graphs.py module definitely had reasons to not include create_using and we'd need to revisit those reasons to consider this PR.

Since you are motivated and interested in this issue, we'd like to hear your thoughts on NXEP3 and generator API.
Thanks!

@tillahoffmann
Copy link
Contributor Author

tillahoffmann commented May 28, 2022

Thanks for the detailed information, @dschult. The motivation underlying this PR is that I have a cython implementation of a graph structure that is compatible with networkx by duck-typing. I use said implementation for generating a large number of graphs for likelihood-free inference ($10^6$ realizations or more). The networkx random graph implementations are useful for prototyping, but I would like to avoid the additional copy due to MyCythonGraph(nx.some_generator(...)). Adding the create_using keyword would instead allow me to use nx.some_generator(..., create_using=MyCythonGraph). E.g., for the Watts-Strogatz graph, that gives me a 3x speed up without any further changes.

Copy link
Contributor

@rossbar rossbar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I too am +1 on adding create_using to support external graph objects. In order to do so I agree with @dschult 's take that we make sure to validate that the built-in directed/multigraphs are explicitly disallowed for all functions that don't already support them (see a couple instances in inline comments).

Another minor suggestion - let's go ahead and make the newly-added create_using kwargs keyword-only (for signatures that don't already have keyword-only args).

networkx/generators/duplication.py Outdated Show resolved Hide resolved
networkx/generators/duplication.py Show resolved Hide resolved
networkx/generators/duplication.py Outdated Show resolved Hide resolved
networkx/generators/duplication.py Outdated Show resolved Hide resolved
networkx/generators/tests/test_create_using.py Outdated Show resolved Hide resolved
@tillahoffmann tillahoffmann force-pushed the create_using branch 2 times, most recently from 4eb443e to 6cf8a5a Compare September 7, 2024 15:40
Copy link
Member

@dschult dschult left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good fix to get this working. I think we should move the utility function to networkx/utils/misc.py. I've got some suggested language for the create_using doc_string description. I would prefer to change the name assert_directedness_multi to something like: check_create_using. thoughts?

I think in the long run, we should make a decorator to handle the create_using keyword processing, and I think that could be a good place to add stuff to handle some Backends machinery, and maybe include some machinery from not_implemented_from. But let's keep this PR more simple so we get some experience and feedback as we make these changes.

Thanks very much!!

As for which functions in random_graph.py to implement this for, I think the ones you have changed for this PR can be updated for this approach. If any don't fit or lead to further questions, let us know. If something else doesn't make sense, let us know that too. But I think implementing this on the functions you have changed here already is fine.

networkx/generators/random_graphs.py Outdated Show resolved Hide resolved
networkx/generators/random_graphs.py Outdated Show resolved Hide resolved
Comment on lines 1201 to 1204
raise TypeError(
"random_shell_graph does not support graph instances for `create_using`. "
"Please use a graph type."
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this!!

This random_shell_graph uses create_using in two ways -- one to make the new graph and another to make helper graphs which then are combined with the first using "union". I'm hoping we can make this work by only using create_using to create the first graph. Then we avoid this special case where instances are not allowed in create_using.

Is it true that all the gnm_random_graphs could be nx.Graph type without affecting the result? I believe the union operation does what we want here, but if not, we could replace that with G.add_nodes_from(g.nodes) and G.add_edges_from(g.edges). I believe that the code adding edges between shells should stay the same.

If we can use nx.Graph in gnm_random_graph then let's take out the create_using argument in line 1214 and remove this check that create_using is a type or is None.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another option might be to use create_using=G.__class__ for the construction of the helper graphs so they have the same type as but are distinct instances from G. What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds like a good approach. If there are any second-hand effects from add_edge/node in the subclass, handling it this way should take care of that. Most of the time it won't matter but explicitly using G.__class__ seems like it should get most of the troublesome cases. :)

dschult
dschult previously approved these changes Sep 11, 2024
Copy link
Member

@dschult dschult left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me! :)

Copy link
Contributor

@rossbar rossbar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes look good, and thanks for adding tests for the intended use-case, i.e. supporting user-derived subclasses.

I noticed there were no tests of the validation verifying that check_create_using was actually hit on the code paths as intended. I took the liberty of pushing those tests up. For the most part, everything went fine, though there are issues with barabasi_albert, dual_barabasi_albert and random_powerlaw_tree where the validation is not hit. For example, in the barabasi_albert implementation, star_graph(n, create_using) gets called before check_create_using, and star_graph itself doesn't support directed graphs, so a different exception is raised.

We should fix up the issues identified by the new tests. Also, related to the above - I wonder if we shouldn't change the exception type from ValueError to NetworkXError. I understand the desire to use a Python built-in exception, but the majority of exceptions that deal with the "wrong" graph type raise NetworkXError. WDYT?

@tillahoffmann
Copy link
Contributor Author

Great, thank you, @rossbar. I've updated the exceptions to be nx.NetworkXErrors. That also fixes the tests because it no longer matters whether one of the "child" generators validates the directedness of the graph before the "parent" generator can.

Copy link
Member

@dschult dschult left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks very helpful! Thanks @tillahoffmann !
And the tests @rossbar pushed up cover more cases.

My comments/suggestions here are coming from a perspective of "how might this work repo-wide?" I think this PR should stay focused on the random_graphs.py module. But I'd like it to be a POC for wider use.

  1. It'd be nice to use check_create_using even before we construct the graph instance. Ideally in the first line of the function, we could process the create_using argument. One trouble is that we don't know if create_using is a class or an instance. So is_directed might require an unused argument or not.

We should be able to get around that issue with some hackery. And I have suggestd some hackery here. :)

  1. The other processing that is commonly done here is to set the default value of the graph constructor. Sometimes we want to set a default type when create_using is None (usually based on an input named directed). Perhaps one way to design check_create_using would be to take an additional input default=nx.Graph and return the class/instance.

I've suggested a revamp of the check_create_using function to:

  • return the create_using argument updated with a default value if needed.
  • accept a default kwarg that gets returned if the input is None.
  • use hackery to check whether the create_using input is directed/multigraph for either a class or an instance.

The result is that we could call the check in the first line of the function, before creating any graphs. The return value can then be used in e.g. nx.empty_graph. I've made suggestions for this usage in one function that doesn't have a default and in another where it needs a default dependent on directed.

networkx/generators/random_graphs.py Outdated Show resolved Hide resolved
networkx/generators/random_graphs.py Outdated Show resolved Hide resolved
networkx/utils/misc.py Outdated Show resolved Hide resolved
networkx/utils/misc.py Outdated Show resolved Hide resolved
networkx/utils/misc.py Outdated Show resolved Hide resolved
@dschult dschult dismissed their stale review September 29, 2024 17:10

More ideas for making this work repo-wide

@dschult
Copy link
Member

dschult commented Sep 29, 2024

I can make the recently suggested changes above if you don't have time atm. Didn't mean to dump a bunch of stuff on you. What do you think?

@rossbar
Copy link
Contributor

rossbar commented Sep 30, 2024

Thanks @dschult I'm personally +1 on the idea of modifying check_create_using in the proposed way. I very much like the pattern of being able to do the property check (i.e. directed, multigraph) before creating any graphs as opposed to the create-then-validate workflow. This prevents some of the more subtle code paths that were caught in the last round of testing.

Copy link
Member

@dschult dschult left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have pushed a commit to change the _check_create_using function to include a default parameter and output the create_using object. And I moved the checks to the top of the functions.

I'll fix any failing tests, but hopefully there won't be any and this is ready for merging AFAICT.

@dschult dschult requested a review from rossbar October 5, 2024 12:35
Copy link
Contributor

@rossbar rossbar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @tillahoffmann and @dschult !

Copy link
Member

@dschult dschult left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like I didn't make the changes to the duplication.py functions... oops...
I'll get that added soon.

@dschult dschult merged commit 75f2af5 into networkx:main Oct 7, 2024
50 checks passed
@jarrodmillman jarrodmillman added this to the 3.4 milestone Oct 7, 2024
@dschult
Copy link
Member

dschult commented Oct 7, 2024

Thanks @tillahoffmann (and @rossbar reviews)!
create_using in random_graphs.py and duplication.py is merged!

@tillahoffmann tillahoffmann deleted the create_using branch October 7, 2024 19:15
@tillahoffmann
Copy link
Contributor Author

Thank you for finishing this one, @dschult!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

4 participants