Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix from_pandas_edgelist for MultiGraph given edge_key #7466

Merged
merged 1 commit into from
May 31, 2024

Conversation

eriknw
Copy link
Contributor

@eriknw eriknw commented May 28, 2024

In from_pandas_edgelist, multigraphs are not correctly supported.
Previously, edge_key= was not used if edge_attr was None!
This adds code to handle the edge_key parameter even when no edge_attr are requested.

Previously, `edge_key=` was not used if `edge_attr` was None!
Copy link
Member

@dschult dschult left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this @eriknw !
Not sure why this hasn't been reported before.
Maybe it is unusual for people to convert to/from pandas without edge_attr.
Or maybe people don't use multiedges in pandas much. And apparently they don't use multiedges without any edge_attr.

I have one suggestion below.

Comment on lines +436 to +437
for u, v, k in zip(df[source], df[target], df[edge_key]):
g.add_edge(u, v, k)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this use add_edges_from like the else clause?

            g.add_edes_from(zip(df[source], df[target], df[edge_key])

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reasonable suggestion. I tried that way. I think using add_edges_from here would be significantly slower:

In [1]: import pandas as pd

In [2]: import networkx as nx

In [3]: L0 = list(range(10000))

In [4]: L1 = list(range(1, 10001))

In [5]: K = list(range(10000, 20000))

In [6]: df = pd.DataFrame({'a': L0, 'b': L1, 'c': K})

In [7]: %%timeit
   ...: G = nx.MultiGraph()
   ...: for u, v, k in zip(df['a'], df['b'], df['c']):
   ...:     G.add_edge(u, v, k)
   ...:
13.1 ms ± 19 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [8]: %%timeit
   ...: G = nx.MultiGraph()
   ...: G.add_edges_from(zip(df['a'], df['b'], df['c']))
   ...:
   ...:
25.9 ms ± 136 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [9]: %%timeit
   ...: G = nx.MultiDiGraph()
   ...: for u, v, k in zip(df['a'], df['b'], df['c']):
   ...:     G.add_edge(u, v, k)
   ...:

14.4 ms ± 173 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [10]: %%timeit
    ...: G = nx.MultiDiGraph()
    ...: G.add_edges_from(zip(df['a'], df['b'], df['c']))
    ...:
    ...:
27.9 ms ± 164 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh... Thanks for that -- I knew there was technical debt in the multi(di)graph add_edges_from. If I recall correctly, there is a tradeoff between rewriting add_edges_from for each base class and speed up updates. And there was some discounting of time issues for multiedges since they were a niche feature at the time. Let's get those up to speed!

But that's a different PR. This one is fine as is. Thanks!!

Copy link
Contributor

@rossbar rossbar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @eriknw and @dschult !

@rossbar rossbar merged commit 6eabce4 into networkx:main May 31, 2024
42 of 43 checks passed
@jarrodmillman jarrodmillman added this to the 3.4 milestone May 31, 2024
rapids-bot bot pushed a commit to rapidsai/cugraph that referenced this pull request Jul 30, 2024
…4550)

This fixes this: networkx/networkx#7466

We would catch this (and other changes) if we tested networkx nightlies. Someday!

I kept the buggy behavior for older versions of networkx in order to match networkx, which feels kinda silly, but also okay-ish.

Authors:
  - Erik Welch (https://github.com/eriknw)
  - Rick Ratzel (https://github.com/rlratzel)
  - Ralph Liu (https://github.com/nv-rliu)

Approvers:
  - Rick Ratzel (https://github.com/rlratzel)

URL: #4550
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

4 participants