Add `create_using` parameter for random graphs. #5672

tillahoffmann · 2022-05-28T17:59:44Z

This PR adds the create_using parameter to random graph and duplication divergence models, allowing users to create graphs of the desired type.

dschult · 2022-05-28T22:10:41Z

For random graph generation, there is usually a distribution that is expected. For example, the degree distribution should follow a specified distribution. That is, for example, specific to the case of an undirected graph. Other generators work with a directed graph. And you have worked with the code that treats directed and undirected differently, so you know the differences there. MultiGraphs bring on their own set of issues and it is not always clear how the multiedges would affect the distributions.

Can you say something about the motivation of this PR? I am guessing that you want to use create_using to construct your graph from a subclass of the base classes. We believe it should be easy to use the existing code with the new graph subclass as something like: G = new_graph(nx.random_graph_generator(...))

We have been designing a framework in NXEP3 where the graph generators would return either edges or graphs. Then you would have the basic building blocks to combine generator algorithms (which usually just construct edges -- though sometimes with potentially isolated nodes as well) very flexibly without even having a graph data structure. When you are done manipulating edges, you send those into your new_graph class to store it.

Could you look at these options with a perspective that we previously decided not to include the create_using argument? How much does this help you, and is it worth the extra complexity of the API. I don't actually recall the duplication_graph code, so will have to look at that. But the random_graphs.py module definitely had reasons to not include create_using and we'd need to revisit those reasons to consider this PR.

Since you are motivated and interested in this issue, we'd like to hear your thoughts on NXEP3 and generator API.
Thanks!

tillahoffmann · 2022-05-28T23:18:55Z

Thanks for the detailed information, @dschult. The motivation underlying this PR is that I have a cython implementation of a graph structure that is compatible with networkx by duck-typing. I use said implementation for generating a large number of graphs for likelihood-free inference ($10^6$ realizations or more). The networkx random graph implementations are useful for prototyping, but I would like to avoid the additional copy due to MyCythonGraph(nx.some_generator(...)). Adding the create_using keyword would instead allow me to use nx.some_generator(..., create_using=MyCythonGraph). E.g., for the Watts-Strogatz graph, that gives me a 3x speed up without any further changes.

rossbar

I too am +1 on adding create_using to support external graph objects. In order to do so I agree with @dschult 's take that we make sure to validate that the built-in directed/multigraphs are explicitly disallowed for all functions that don't already support them (see a couple instances in inline comments).

Another minor suggestion - let's go ahead and make the newly-added create_using kwargs keyword-only (for signatures that don't already have keyword-only args).

networkx/generators/duplication.py

networkx/generators/tests/test_create_using.py

dschult

This is a good fix to get this working. I think we should move the utility function to networkx/utils/misc.py. I've got some suggested language for the create_using doc_string description. I would prefer to change the name assert_directedness_multi to something like: check_create_using. thoughts?

I think in the long run, we should make a decorator to handle the create_using keyword processing, and I think that could be a good place to add stuff to handle some Backends machinery, and maybe include some machinery from not_implemented_from. But let's keep this PR more simple so we get some experience and feedback as we make these changes.

Thanks very much!!

As for which functions in random_graph.py to implement this for, I think the ones you have changed for this PR can be updated for this approach. If any don't fit or lead to further questions, let us know. If something else doesn't make sense, let us know that too. But I think implementing this on the functions you have changed here already is fine.

networkx/generators/random_graphs.py

dschult · 2024-09-09T06:42:46Z

networkx/generators/random_graphs.py

+        raise TypeError(
+            "random_shell_graph does not support graph instances for `create_using`. "
+            "Please use a graph type."
+        )


Thanks for this!!

This random_shell_graph uses create_using in two ways -- one to make the new graph and another to make helper graphs which then are combined with the first using "union". I'm hoping we can make this work by only using create_using to create the first graph. Then we avoid this special case where instances are not allowed in create_using.

Is it true that all the gnm_random_graphs could be nx.Graph type without affecting the result? I believe the union operation does what we want here, but if not, we could replace that with G.add_nodes_from(g.nodes) and G.add_edges_from(g.edges). I believe that the code adding edges between shells should stay the same.

If we can use nx.Graph in gnm_random_graph then let's take out the create_using argument in line 1214 and remove this check that create_using is a type or is None.

Another option might be to use create_using=G.__class__ for the construction of the helper graphs so they have the same type as but are distinct instances from G. What do you think?

That sounds like a good approach. If there are any second-hand effects from add_edge/node in the subclass, handling it this way should take care of that. Most of the time it won't matter but explicitly using G.__class__ seems like it should get most of the troublesome cases. :)

dschult

This looks good to me! :)

rossbar

The changes look good, and thanks for adding tests for the intended use-case, i.e. supporting user-derived subclasses.

I noticed there were no tests of the validation verifying that check_create_using was actually hit on the code paths as intended. I took the liberty of pushing those tests up. For the most part, everything went fine, though there are issues with barabasi_albert, dual_barabasi_albert and random_powerlaw_tree where the validation is not hit. For example, in the barabasi_albert implementation, star_graph(n, create_using) gets called before check_create_using, and star_graph itself doesn't support directed graphs, so a different exception is raised.

We should fix up the issues identified by the new tests. Also, related to the above - I wonder if we shouldn't change the exception type from ValueError to NetworkXError. I understand the desire to use a Python built-in exception, but the majority of exceptions that deal with the "wrong" graph type raise NetworkXError. WDYT?

tillahoffmann · 2024-09-21T01:32:45Z

Great, thank you, @rossbar. I've updated the exceptions to be nx.NetworkXErrors. That also fixes the tests because it no longer matters whether one of the "child" generators validates the directedness of the graph before the "parent" generator can.

…s/misc.py`.

dschult

This looks very helpful! Thanks @tillahoffmann !
And the tests @rossbar pushed up cover more cases.

My comments/suggestions here are coming from a perspective of "how might this work repo-wide?" I think this PR should stay focused on the random_graphs.py module. But I'd like it to be a POC for wider use.

It'd be nice to use check_create_using even before we construct the graph instance. Ideally in the first line of the function, we could process the create_using argument. One trouble is that we don't know if create_using is a class or an instance. So is_directed might require an unused argument or not.

We should be able to get around that issue with some hackery. And I have suggestd some hackery here. :)

The other processing that is commonly done here is to set the default value of the graph constructor. Sometimes we want to set a default type when create_using is None (usually based on an input named directed). Perhaps one way to design check_create_using would be to take an additional input default=nx.Graph and return the class/instance.

I've suggested a revamp of the check_create_using function to:

return the create_using argument updated with a default value if needed.
accept a default kwarg that gets returned if the input is None.
use hackery to check whether the create_using input is directed/multigraph for either a class or an instance.

The result is that we could call the check in the first line of the function, before creating any graphs. The return value can then be used in e.g. nx.empty_graph. I've made suggestions for this usage in one function that doesn't have a default and in another where it needs a default dependent on directed.

networkx/generators/random_graphs.py

networkx/utils/misc.py

More ideas for making this work repo-wide

dschult · 2024-09-29T17:14:25Z

I can make the recently suggested changes above if you don't have time atm. Didn't mean to dump a bunch of stuff on you. What do you think?

rossbar · 2024-09-30T11:58:50Z

Thanks @dschult I'm personally +1 on the idea of modifying check_create_using in the proposed way. I very much like the pattern of being able to do the property check (i.e. directed, multigraph) before creating any graphs as opposed to the create-then-validate workflow. This prevents some of the more subtle code paths that were caught in the last round of testing.

dschult

I have pushed a commit to change the _check_create_using function to include a default parameter and output the create_using object. And I moved the checks to the top of the functions.

I'll fix any failing tests, but hopefully there won't be any and this is ready for merging AFAICT.

rossbar

Thanks @tillahoffmann and @dschult !

dschult

Looks like I didn't make the changes to the duplication.py functions... oops...
I'll get that added soon.

dschult · 2024-10-07T19:08:27Z

Thanks @tillahoffmann (and @rossbar reviews)!
create_using in random_graphs.py and duplication.py is merged!

tillahoffmann · 2024-10-07T19:15:22Z

Thank you for finishing this one, @dschult!

tillahoffmann force-pushed the create_using branch from 9c51cfb to 120a198 Compare July 23, 2024 15:26

dschult added the type: Enhancements label Jul 23, 2024

rossbar reviewed Sep 6, 2024

View reviewed changes

tillahoffmann force-pushed the create_using branch 2 times, most recently from 4eb443e to 6cf8a5a Compare September 7, 2024 15:40

dschult reviewed Sep 7, 2024

View reviewed changes

networkx/generators/random_graphs.py Outdated Show resolved Hide resolved

networkx/generators/random_graphs.py Outdated Show resolved Hide resolved

dschult reviewed Sep 9, 2024

View reviewed changes

dschult previously approved these changes Sep 11, 2024

View reviewed changes

rossbar reviewed Sep 17, 2024

View reviewed changes

tillahoffmann and others added 15 commits September 20, 2024 21:34

Add create_using for random graphs.

48594e3

Move create_using tests to dedicated file.

4e02fa9

Add create_using for duplication divergence generators.

46a64f0

Add kernel_root to test to pass without scipy.

cf7bff9

Apply pre-commit lint changes.

ae1d85a

Fix random_kernel_graph test when scipy is not installed.

c5577ac

Remove redundant tests in test_create_using.py.

7a3d7e9

Change create_using to keyword-only argument.

5b61de2

Add directedness and multigraph checks.

79b8c06

Update docstrings for create_using.

1d60eb9

Refactor assert_directedness_multi to check_create_using in `util…

ce8445c

…s/misc.py`.

Support instances for create_using.

017bf8d

Use class of graph to construct helper graphs in random_shell_graph.

f64c869

Add tests for validation of create_using.

f826cf5

Use nx.NetworkXError instead of ValueError for check_create_using.

1ec1f5f

tillahoffmann force-pushed the create_using branch from d9b143f to 1ec1f5f Compare September 21, 2024 01:34

Pass explicit kernel_root to pass tests without scipy.

541df8f

dschult reviewed Sep 29, 2024

View reviewed changes

move check to top of function by adding default and returning graph type

31c1fe6

dschult approved these changes Oct 3, 2024

View reviewed changes

dschult requested a review from rossbar October 5, 2024 12:35

rossbar approved these changes Oct 7, 2024

View reviewed changes

rossbar added 2 commits October 7, 2024 09:55

ValueError -> NetworkXError in docstrings.

113ef21

Add Raises section to check_create_using docstring.

a14f272

dschult reviewed Oct 7, 2024

View reviewed changes

dschult added 2 commits October 7, 2024 13:49

add check_create_using to duplication.py. tweak error msg

dd2a39e

check error msg

b6aaf33

dschult merged commit 75f2af5 into networkx:main Oct 7, 2024
50 checks passed

jarrodmillman added this to the 3.4 milestone Oct 7, 2024

tillahoffmann deleted the create_using branch October 7, 2024 19:15

rossbar mentioned this pull request Oct 25, 2024

Update _raise_on_directed to work with create_using pos arg #7695

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add `create_using` parameter for random graphs. #5672

Add `create_using` parameter for random graphs. #5672

tillahoffmann commented May 28, 2022

dschult commented May 28, 2022 •

edited by rossbar

Loading

tillahoffmann commented May 28, 2022 •

edited

Loading

rossbar left a comment

dschult left a comment

dschult Sep 9, 2024

tillahoffmann Sep 9, 2024

dschult Sep 10, 2024

dschult left a comment

rossbar left a comment

tillahoffmann commented Sep 21, 2024

dschult left a comment

dschult commented Sep 29, 2024

rossbar commented Sep 30, 2024

dschult left a comment

rossbar left a comment

dschult left a comment

dschult commented Oct 7, 2024

tillahoffmann commented Oct 7, 2024

Add create_using parameter for random graphs. #5672

Add create_using parameter for random graphs. #5672

Conversation

tillahoffmann commented May 28, 2022

dschult commented May 28, 2022 • edited by rossbar Loading

tillahoffmann commented May 28, 2022 • edited Loading

rossbar left a comment

Choose a reason for hiding this comment

dschult left a comment

Choose a reason for hiding this comment

dschult Sep 9, 2024

Choose a reason for hiding this comment

tillahoffmann Sep 9, 2024

Choose a reason for hiding this comment

dschult Sep 10, 2024

Choose a reason for hiding this comment

dschult left a comment

Choose a reason for hiding this comment

rossbar left a comment

Choose a reason for hiding this comment

tillahoffmann commented Sep 21, 2024

dschult left a comment

Choose a reason for hiding this comment

dschult commented Sep 29, 2024

rossbar commented Sep 30, 2024

dschult left a comment

Choose a reason for hiding this comment

rossbar left a comment

Choose a reason for hiding this comment

dschult left a comment

Choose a reason for hiding this comment

dschult commented Oct 7, 2024

tillahoffmann commented Oct 7, 2024

Add `create_using` parameter for random graphs. #5672

Add `create_using` parameter for random graphs. #5672

dschult commented May 28, 2022 •

edited by rossbar

Loading

tillahoffmann commented May 28, 2022 •

edited

Loading