-
-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix from_pandas_edgelist
for MultiGraph given edge_key
#7466
Conversation
Previously, `edge_key=` was not used if `edge_attr` was None!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this @eriknw !
Not sure why this hasn't been reported before.
Maybe it is unusual for people to convert to/from pandas without edge_attr.
Or maybe people don't use multiedges in pandas much. And apparently they don't use multiedges without any edge_attr.
I have one suggestion below.
for u, v, k in zip(df[source], df[target], df[edge_key]): | ||
g.add_edge(u, v, k) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this use add_edges_from
like the else clause?
g.add_edes_from(zip(df[source], df[target], df[edge_key])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reasonable suggestion. I tried that way. I think using add_edges_from
here would be significantly slower:
In [1]: import pandas as pd
In [2]: import networkx as nx
In [3]: L0 = list(range(10000))
In [4]: L1 = list(range(1, 10001))
In [5]: K = list(range(10000, 20000))
In [6]: df = pd.DataFrame({'a': L0, 'b': L1, 'c': K})
In [7]: %%timeit
...: G = nx.MultiGraph()
...: for u, v, k in zip(df['a'], df['b'], df['c']):
...: G.add_edge(u, v, k)
...:
13.1 ms ± 19 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [8]: %%timeit
...: G = nx.MultiGraph()
...: G.add_edges_from(zip(df['a'], df['b'], df['c']))
...:
...:
25.9 ms ± 136 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [9]: %%timeit
...: G = nx.MultiDiGraph()
...: for u, v, k in zip(df['a'], df['b'], df['c']):
...: G.add_edge(u, v, k)
...:
14.4 ms ± 173 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [10]: %%timeit
...: G = nx.MultiDiGraph()
...: G.add_edges_from(zip(df['a'], df['b'], df['c']))
...:
...:
27.9 ms ± 164 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh... Thanks for that -- I knew there was technical debt in the multi(di)graph add_edges_from
. If I recall correctly, there is a tradeoff between rewriting add_edges_from
for each base class and speed up updates. And there was some discounting of time issues for multiedges since they were a niche feature at the time. Let's get those up to speed!
But that's a different PR. This one is fine as is. Thanks!!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
…4550) This fixes this: networkx/networkx#7466 We would catch this (and other changes) if we tested networkx nightlies. Someday! I kept the buggy behavior for older versions of networkx in order to match networkx, which feels kinda silly, but also okay-ish. Authors: - Erik Welch (https://github.com/eriknw) - Rick Ratzel (https://github.com/rlratzel) - Ralph Liu (https://github.com/nv-rliu) Approvers: - Rick Ratzel (https://github.com/rlratzel) URL: #4550
In
from_pandas_edgelist
, multigraphs are not correctly supported.Previously,
edge_key=
was not used ifedge_attr
was None!This adds code to handle the
edge_key
parameter even when noedge_attr
are requested.