FEAT: Model comparison plot #947
Conversation
Signed-off-by: Rishabh Sambare <rishabhsamb@gmail.com>
…factored-input
An initial attempt at the body of plot_model_comparison (probably still missing sanity checks w.r.t. the x_axis_metric and y_axis_metric... can I just always pass sensitive_features, with nothing else required?). To make it work, I added the import in __init__.py. I also created an example in one of the existing notebooks, as I believe it is a fitting context!
…numpydoc - also support extra kwargs which are passed to ax.scatter - only if no ax is provided, automatically infer axis labels
* Example notebook section
* User guide section (with literalinclude)
I'll get back to your excellent questions later today! Thanks for all this work!
UPDATE: Done :-)
There's currently one plotting function that we're testing in …
romanlutz left a comment:
* What stylistic features should the plots have by default? I already thought it would be nice to infer axis labels, but that's it.
Can you post a screenshot from the doc build on what the plots look like? It's a bit easier to comment for everyone. I'll run it locally myself, but I imagine not everyone does this.
docs/user_guide/assessment.rst (Outdated)

    .. _dashboard:
    Plotting model comparisons
* [ ] More demonstration???
According to the requirements of the project such a change should have a proper docstring and an entry in the user guide. As you've already figured out it's sometimes easiest to add a file (or section to an existing file) and literalinclude it in the user guide with some more description and context.
According to the requirements of the project such a change should have a proper docstring and an entry in the user guide.
Sorry, I'm not quite sure which requirements. Is this from the contributor guide? And do you mean that I should add an example in the docstring of the function definition?
Not sure how I missed this comment before! Sorry for the massive delay 🙁
The contrib guide says
Documentation should be provided with pull requests that add or change functionality. This includes comments in the code itself, docstrings, and user guides.
So with docstring and user guide you're good. However, since this is a plotting API the output won't show up in the user guide with a normal code block. Instead, you'd need to create an "example" in a Python file and then use literalinclude to reference that code and its outputs. We've done that in https://github.com/fairlearn/fairlearn/blob/main/docs/user_guide/assessment.rst under "Plotting grouped metrics" based on plot_quickstart.py from the examples, and then included the figure which is the plot.
I don't actually know if this can be done in the API reference as well. I guess it can, although I wouldn't expect you to experiment with that in this PR. An example in the user guide explaining what this kind of plot is useful for is plenty. Does that make sense?
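For concreteness, here is a minimal sketch of what such a literalinclude-able example file could look like. It assumes the signature shown in this PR's diff (x_axis_metric, y_axis_metric, y_true, y_preds, sensitive_features) and uses randomly generated data; the show_plot flag and the file name are assumptions, not the merged API.

    # plot_model_comparison_example.py -- hypothetical example file, following
    # the plot_quickstart.py pattern described above. The call mirrors the
    # signature shown in this PR's diff; treat it as a sketch, not the final API.
    import numpy as np
    from sklearn.metrics import accuracy_score

    from fairlearn.metrics import demographic_parity_difference, plot_model_comparison

    rng = np.random.default_rng(42)

    # Synthetic ground truth, a binary sensitive feature, and the predictions
    # of three (here: random) models, keyed by model id.
    y_true = rng.integers(0, 2, size=1000)
    sensitive_features = rng.integers(0, 2, size=1000)
    y_preds = {f"model_{i}": rng.integers(0, 2, size=1000) for i in range(3)}

    plot_model_comparison(
        x_axis_metric=accuracy_score,
        y_axis_metric=demographic_parity_difference,
        y_true=y_true,
        y_preds=y_preds,
        sensitive_features=sensitive_features,
        show_plot=True,  # assumed convenience flag; plt.show() works as well
    )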
    # %%
    # Plot model comparison
    plot_model_comparison(
* I think the example I made is a bit boring, any tips?
Hmm, for one I think it's nice that you're building on the existing case rather than copy-pasting without real changes. I don't think it's boring as such, but eventually we're hopefully not using the adult census dataset anymore. #907 will introduce a synthetic dataset for these purposes. As we integrate this model comparison into real world use case examples it'll get much more interesting, of course.
    x_axis_metric: Callable[..., Union[float, int]],
    y_axis_metric: Callable[..., Union[float, int]],
    y_true,
    y_preds,
* as `y_pred` we pass a dictionary with `{model_id: predictions}`, but nothing has been discussed with what to do with these `model_id`'s. Any suggestions? Should we add another parameter `label: bool` and if true print `model_id`-labels alongside datapoints?
Would be awesome if we could somehow make it clear which point corresponds to which model, otherwise it's hard to compare them. I can think of two ways: labels next to the plotted points, or color coding the points. There are problems with both methods depending on the use case, of course, but the labels are definitely preferable IMO.
I usually prefer legends (with color code or symbols), as adding labels next to points can result in quite a crowded plot that's difficult to read - especially if there is only one default setting for distances, font, etc. I'm happy either way though.
I think there's something to be said for either option, but I'd leave it up to the user to choose. We could provide a very crowded example with legend, and a less crowded one with labels if we want to go all out here. I will say color-coding also has its limits after about 8-12 colors since they start looking the same even to me (so I can only imagine how hard it is for colorblind folks to distinguish them). So ideally I'd let the user configure it and provide examples for how to configure it. @SeanMcCarren let me know if you don't know how to get these things set up. Happy to help!
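To make the trade-off concrete, here is an illustration-only sketch of the two options being discussed, in plain matplotlib rather than the fairlearn API; the model ids and coordinates are made up.

    import matplotlib.pyplot as plt

    # Hypothetical (performance, disparity) coordinates per model.
    points = {"model_a": (0.81, 0.12), "model_b": (0.78, 0.05), "model_c": (0.85, 0.20)}

    fig, (ax_labels, ax_legend) = plt.subplots(1, 2, figsize=(10, 4))

    # Option 1: a text label next to each plotted point.
    for name, (x, y) in points.items():
        ax_labels.scatter(x, y)
        ax_labels.annotate(name, (x, y), xytext=(4, 4), textcoords="offset points")
    ax_labels.set_title("Point labels")

    # Option 2: one scatter call per model, distinguished via the legend.
    for name, (x, y) in points.items():
        ax_legend.scatter(x, y, label=name)
    ax_legend.legend()
    ax_legend.set_title("Legend")

    plt.show()

With many models the left panel gets crowded and the right panel runs out of distinguishable colors, which is exactly the trade-off described above.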
Co-authored-by: Roman Lutz <romanlutz13@gmail.com>
…en/fairlearn into model_comparison_plot
Thanks for all the great feedback! Sorry, I took a while :)
Yeah good idea, especially because it takes a while to build. Here it is, no custom styles except for the axis label names (even those names could use some work maybe?)
Yes I'm going to experiment a little! Ideas remain welcome. I'll keep you posted!
Ok, so checking if the right error message is thrown when matplotlib is not installed is easy. But now, to make sure I understand correctly... Are the 6 functions in class …
To my knowledge they are, provided …
See what they're doing with auto-adjusted point labels here? https://www.python-graph-gallery.com/web-scatterplot-text-annotation-and-regression-matplotlib

@romanlutz hmm, I'm not sure if I'd like to add an additional dependency just for auto-adjusting labels. IMO the plotting functionality in Fairlearn should be as simple as possible - we won't be able to account for all situations, so if people really want something specific they can make adjustments or plot from scratch.

I'm fine with that, provided what we outlined earlier works out.
With regards to the auto labelling, I like it, but I also agree that I don't like an extra dependency. I think sort of mimicking their techniques would be feasible, but from a maintainability standpoint that might seem like an even worse idea. Simply drawing labels on top without changing their coordinates looks OK, and when there are very many points we could draw only every X-th label, or cap the number of labels?

I am not so fond of the figures I get when using a legend or different colors. However, color may work - not in the sense that every node should get its own color, but perhaps as a color gradient in scenarios where models are labelled similarly (as in "Model 1", "Model 2", ...).

Another option is to return a dict with model names and coordinates, such that users can draw their own labels (see the sketch below). If this is not done, there will be no option for the user to place labels on the image at all (except if they manually compute the metrics, but then they would not need this function anyway).

I agree with @hildeweerts in the sense that it should be as simple as possible, but on the other hand, if we are providing functionality, it should work the way one would want to use it; otherwise it is functionality without a use case. So, that was my rant - what should we do? :)
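A minimal sketch of that last idea: a helper hands back the computed coordinates keyed by model name, so callers can place labels however they like. The helper name and the metric choices are hypothetical, not part of this PR.

    import matplotlib.pyplot as plt
    from sklearn.metrics import accuracy_score, recall_score

    # Hypothetical helper: compute the scatter coordinates per model so the
    # caller can draw (a subset of) labels themselves.
    def compute_comparison_points(x_axis_metric, y_axis_metric, y_true, y_preds):
        """Return {model_id: (x_metric_value, y_metric_value)}."""
        return {
            model_id: (x_axis_metric(y_true, y_pred), y_axis_metric(y_true, y_pred))
            for model_id, y_pred in y_preds.items()
        }

    y_true = [0, 1, 1, 0, 1, 0]
    y_preds = {"model_a": [0, 1, 0, 0, 1, 1], "model_b": [0, 1, 1, 0, 0, 0]}
    points = compute_comparison_points(accuracy_score, recall_score, y_true, y_preds)

    # Draw every point, but label only every other one to reduce clutter.
    fig, ax = plt.subplots()
    for i, (name, (x, y)) in enumerate(points.items()):
        ax.scatter(x, y)
        if i % 2 == 0:
            ax.annotate(name, (x, y), xytext=(4, 4), textcoords="offset points")
    plt.show()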
MiroDudik left a comment:
Just small changes outlined in this review.
    If True, add a legend. Must set
    :code:`model_kwargs[i]['label']` for every prediction :code:`i`.

    legend_kwargs : dict
Resurfacing this comment. Can we remove legend_kwargs for simplicity?
Co-authored-by: MiroDudik <mdudik@gmail.com>
…en/fairlearn into model_comparison_plot
MiroDudik left a comment:
I'm fine with the current API. I just noted a small issue re. renaming the source file.
In addition, please clear the flake8 issue and check with @adrinjalali and @riedgar-ms about the coverage tests.
fairlearn/metrics/__init__.py (Outdated)

    from ._generated_metrics import _generated_metric_dict  # noqa: F401
    from ._make_derived_metric import make_derived_metric  # noqa: F401
    from ._metric_frame import MetricFrame  # noqa: F401
    from ._multi_curve_plot import plot_model_comparison  # noqa: F401
We could rename the file _multi_curve_plot to _plot_model_comparison for clarity?
@hildeweerts, @riedgar-ms -- do you have an opinion on this? The _multi_curve_plot name is probably just a historic leftover...
Hi, @SeanMcCarren! Do you have any idea when you'd be able to make some of the final changes and resolve the flake8 issue? Thanks!
Co-authored-by: Richard Edgar <riedgar@microsoft.com>
This PR should be done :) Maybe someone can rerun the tests? Locally, every test succeeded now, but for some reason test_othermlpackages is triggered.
@riedgar-ms, @hildeweerts -- with the expgrad fix, this should be good to merge. If you agree, please approve and merge.
I'm OK to merge this. Missing the codecov patch target isn't great, but it was a tiny miss. @hildeweerts?
@riedgar-ms In my defense, L113-114 is actually tested in …
Hi! According to some specifications in issue #666, I continued from the work of @rishabhsamb and @romanlutz.
Done/Todo
Questions
* Why is there no metric type? I guess because there is no type inference and we want to be able to use sklearn functions without having to initialize a metric type for them?
* As y_pred we pass a dictionary with {model_id: predictions}, but nothing has been discussed about what to do with these model_id's. Any suggestions? Should we add another parameter label: bool and, if true, print model_id-labels alongside data points? (See the sketch below.)
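As a point of reference for that last question, here is a hypothetical sketch of what such a switch could look like in a call. The point_labels name (standing in for the proposed label: bool) is illustrative, not the merged API; the dict keys are the model ids in question.

    import numpy as np
    from sklearn.metrics import accuracy_score

    from fairlearn.metrics import demographic_parity_difference, plot_model_comparison

    rng = np.random.default_rng(0)
    y_true = rng.integers(0, 2, size=500)
    sensitive_features = rng.integers(0, 2, size=500)
    # Keys of y_preds are the model_id's the question refers to.
    y_preds = {"lr": rng.integers(0, 2, size=500), "gbm": rng.integers(0, 2, size=500)}

    plot_model_comparison(
        x_axis_metric=accuracy_score,
        y_axis_metric=demographic_parity_difference,
        y_true=y_true,
        y_preds=y_preds,
        sensitive_features=sensitive_features,
        point_labels=True,  # assumed switch: draw each model id next to its point
    )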