
Conversation

@SeanMcCarren
Contributor

@SeanMcCarren SeanMcCarren commented Sep 10, 2021

Hi! Following the specification in issue #666, I continued from the work of @rishabhsamb and @romanlutz.

Done/Todo

  • plot_model_comparison function implemented
  • plot_model_comparison numpydoc
  • plot_model_comparison demonstration
  • plot_model_comparison test case (not necessary, right?)
  • API entry from numpydoc???
  • More demonstration???

Questions

  • I added **kwargs to plot_model_comparison and forward them to Axes.scatter(x, y, **kwargs); see the sketch after this list. Is this even a good idea? It makes our function even easier to use, but I did not find a good way to validate all the possible kwargs, so if an invalid kwarg is passed, a not-super-informative error is raised.
  • I think the example I made is a bit boring, any tips?
  • Would it be a nice convention to make a metric type? I guess because there is no type inference and we want to be able to use sklearn functions without having to initialize a metric type for them?
  • How do I add this numpydoc to the API?
  • What stylistic features should the plots have by default? I already thought it would be nice to infer axis labels, but that's it.
  • As y_preds we pass a dictionary with {model_id: predictions}, but nothing has been discussed about what to do with these model_ids. Any suggestions? Should we add another parameter label: bool and, if true, print model_id labels alongside the data points?
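
For context, here is a minimal sketch of the kwargs-forwarding idea from the first question above. The function name and exact signature are illustrative, not the actual implementation:

```python
import matplotlib.pyplot as plt


def plot_model_comparison_sketch(
    x_axis_metric, y_axis_metric, y_true, y_preds, ax=None, **kwargs
):
    """Illustrative sketch only; the real function also has to handle sensitive_features."""
    if ax is None:
        _, ax = plt.subplots()
    # One point per model: both metrics evaluated on that model's predictions.
    x = [x_axis_metric(y_true, y_pred) for y_pred in y_preds.values()]
    y = [y_axis_metric(y_true, y_pred) for y_pred in y_preds.values()]
    # Extra keyword arguments are forwarded unvalidated to Axes.scatter, so an
    # invalid kwarg surfaces as a matplotlib error rather than a fairlearn one.
    ax.scatter(x, y, **kwargs)
    return ax
```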

rishabhsamb and others added 7 commits July 9, 2021 13:22
Signed-off-by: Rishabh Sambare <rishabhsamb@gmail.com>
(Initial attempt at) the body of plot_model_comparison (probably still missing sanity checks w.r.t. the x_axis_metric and y_axis_metric... can I just pass sensitive_features always? nothing else required?)

To make it work, added the import in __init__.py

Also created an example in one of the existing notebooks as I believe it is a fitting context!
…numpydoc

- also support extra kwargs which are passed to ax.scatter
- only if no ax is provided, automatically infer axis labels
Example notebook section
User guide section (with literalinclude)
Member

@romanlutz romanlutz left a comment


I'll get back to your excellent questions later today! Thanks for all this work!

UPDATE: Done :-)

@romanlutz
Member

> * [ ] plot_model_comparison test case (not necessary right?)

There's currently one plotting function that we're testing, in fairlearn/test/unit/postprocessing/test_plots.py. That test is only run if matplotlib is installed, though. We have one configuration of tests under fairlearn/test/install that runs to check that we're throwing the right error message if you don't install matplotlib but still call the plotting functions. Ideally, I'd love to have both kinds of tests. The "actual" plotting tests wouldn't verify that the right plot shows up, but rather that the plotting runs without error. Not great, but as we discussed elsewhere (I think in the issue) matplotlib testing isn't super easy.
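
A smoke test in that spirit might look roughly like the sketch below; the test name and the random data are made up, and it only checks that the call completes and returns an Axes, not what gets drawn:

```python
import numpy as np
import pytest

matplotlib = pytest.importorskip("matplotlib")
matplotlib.use("Agg")  # headless backend so the test can run in CI

from sklearn.metrics import accuracy_score, recall_score

from fairlearn.metrics import plot_model_comparison


def test_plot_model_comparison_smoke():
    rng = np.random.default_rng(42)
    y_true = rng.integers(0, 2, size=100)
    y_preds = {
        "model_a": rng.integers(0, 2, size=100),
        "model_b": rng.integers(0, 2, size=100),
    }
    # Weak test on purpose: we only assert that plotting succeeds.
    ax = plot_model_comparison(
        x_axis_metric=accuracy_score,
        y_axis_metric=recall_score,
        y_true=y_true,
        y_preds=y_preds,
    )
    assert ax is not None
```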

Member

@romanlutz romanlutz left a comment


> * What stylistic features should the plots have by default? I already thought it would be nice to infer axis labels, but that's it.

Can you post a screenshot from the doc build on what the plots look like? It's a bit easier to comment for everyone. I'll run it locally myself, but I imagine not everyone does this.

.. _dashboard:


Plotting model comparisons
Member


> * [ ] More demonstration???

According to the requirements of the project, such a change should have a proper docstring and an entry in the user guide. As you've already figured out, it's sometimes easiest to add a file (or a section to an existing file) and literalinclude it in the user guide with some more description and context.

Contributor Author


> According to the requirements of the project such a change should have a proper docstring and an entry in the user guide.

Sorry, I'm not quite sure which requirements. Is this from the contributor guide? And do you mean that I should add an example in the docstring of the function definition?

Member


Not sure how I missed this comment before! Sorry for the massive delay 🙁
The contrib guide says

> Documentation should be provided with pull requests that add or change functionality. This includes comments in the code itself, docstrings, and user guides.
So with a docstring and a user guide entry you're good. However, since this is a plotting API, the output won't show up in the user guide with a normal code block. Instead, you'd need to create an "example" in a Python file and then use literalinclude to reference that code and its outputs. We've done that in https://github.com/fairlearn/fairlearn/blob/main/docs/user_guide/assessment.rst under "Plotting grouped metrics", based on plot_quickstart.py from the examples, and then included the figure which is the plot.

I don't actually know if this can be done in the API reference as well. I guess it can, although I wouldn't expect you to experiment with that in this PR. An example in the user guide explaining what this kind of plot is useful for is plenty. Does that make sense?

# %%

# Plot model comparison
plot_model_comparison(
Member


> * I think the example I made is a bit boring, any tips?

Hmm, for one I think it's nice that you're building on the existing case rather than copy-pasting without real changes. I don't think it's boring as such, but eventually we're hopefully not using the adult census dataset anymore. #907 will introduce a synthetic dataset for these purposes. As we integrate this model comparison into real world use case examples it'll get much more interesting, of course.

x_axis_metric: Callable[..., Union[float, int]],
y_axis_metric: Callable[..., Union[float, int]],
y_true,
y_preds,
Member


> * as `y_pred` we pass a dictionary with `{model_id: predictions}`, but nothing has been discussed with what to do with these `model_id`'s. Any suggestions? Should we add another parameter `label: bool` and if true print `model_id`-labels alongside datapoints?

It would be awesome if we could somehow make it clear which point corresponds to which model; otherwise it's hard to compare them. I can think of two ways: labels next to the plotted points, or color-coding the points. There are problems with both methods depending on the use case, of course, but the labels are definitely preferable IMO.

Contributor


I usually prefer legends (with color code or symbols), as adding labels next to points can result in quite a crowded plot that's difficult to read - especially if there is only one default setting for distances, font, etc. I'm happy either way though.

Member


I think there's something to be said for either option, but I'd leave it up to the user to choose. We could provide a very crowded example with legend, and a less crowded one with labels if we want to go all out here. I will say color-coding also has its limits after about 8-12 colors since they start looking the same even to me (so I can only imagine how hard it is for colorblind folks to distinguish them). So ideally I'd let the user configure it and provide examples for how to configure it. @SeanMcCarren let me know if you don't know how to get these things set up. Happy to help!

@romanlutz romanlutz added the API, documentation, and enhancement labels on Sep 11, 2021
@romanlutz romanlutz changed the title from "Model comparison plot for issue #666" to "FEAT: Model comparison plot" on Sep 11, 2021
@SeanMcCarren
Contributor Author

Thanks for all the great feedback! Sorry, I took a while :)

> * What stylistic features should the plots have by default? I already thought it would be nice to infer axis labels, but that's it.

> Can you post a screenshot from the doc build on what the plots look like? It's a bit easier to comment for everyone. I'll run it locally myself, but I imagine not everyone does this.

Yeah, good idea, especially because it takes a while to build. Here it is: no custom styles except for the axis label names (even those names could maybe use some work?)
[screenshot]

> So ideally I'd let the user configure it and provide examples for how to configure it. @SeanMcCarren let me know if you don't know how to get these things set up. Happy to help!

Yes I'm going to experiment a little! Ideas remain welcome. I'll keep you posted!

> There's currently one plotting function that we're testing in fairlearn/test/unit/postprocessing/test_plots.py. That test is only run if matplotlib is installed, though. We have one configuration of tests under fairlearn/test/install that run to check that we're throwing the right error message if you don't install matplotlib and still run the plotting functions. Ideally, I'd love to have both kinds of tests. The "actual" plotting tests wouldn't actually test that the right plot shows up, but rather that it runs without error. Not great, but as we discussed elsewhere (I think in the issue) matplotlib testing isn't super easy.

OK, so checking that the right error message is thrown when matplotlib is not installed is easy. But now, to make sure I understand correctly... are the 6 functions in class TestPlots in test_plots.py actually executed? And, ideally, should we also add some test cases for the model comparison plot similar to those 6 functions in TestPlots?

@romanlutz
Member

> Ok, so checking if the right error message is thrown when matplotlib is not installed is easy. But now, to make sure I understand correctly... Are the 6 functions in class TestPlots in file test_plots.py actually executed? And, ideally, we also make some testcases similarly to these 6 functions in TestPlots for the model comparison plot?

To my knowledge they are, provided matplotlib and pytest-mpl are installed. Both are part of our requirements files (mpl is in customplots and pytest-mpl is in dev), so you should have them installed. Our pipelines install them before running the tests (with the exception of the installation test which, on purpose, doesn't install mpl). Note that they don't assert anything; they just do the plotting, which really should succeed. It's an extremely weak form of testing.

@romanlutz
Member

> Yeah good idea, especially because it takes a while to build. Here it is, no custom styles except for the axis label names (even those names could use some work maybe?)

This is exactly what I was thinking of visually, nice!
How hard is it for the user to add the legend or point descriptions now, though? If it's too complicated we could implement both and provide a flag to do either one. Without annotations it's really impossible to tell which model it is.

Regarding axis labels: we know it's a metric function, so I'd just do metric_name.replace('_', ' '). It's a reasonable guess and should be possible to override if users take the returned Axes and just set a different one, right?
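
For illustration, a tiny sketch of that label-inference idea, assuming the metric is passed as a plain function; the helper name here is hypothetical:

```python
from sklearn.metrics import accuracy_score

from fairlearn.metrics import demographic_parity_difference


def infer_axis_label(metric):
    # Hypothetical helper: turn a metric function's name into a readable axis label.
    return metric.__name__.replace("_", " ")


print(infer_axis_label(accuracy_score))                 # "accuracy score"
print(infer_axis_label(demographic_parity_difference))  # "demographic parity difference"
# Users who want something else can still call set_xlabel/set_ylabel on the returned Axes.
```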

@romanlutz
Member

See what they're doing with auto-adjusted point labels here? https://www.python-graph-gallery.com/web-scatterplot-text-annotation-and-regression-matplotlib
I'm wondering whether that's an option. Downside: extra dependency ☹️

@hildeweerts
Contributor

@romanlutz hmm I'm not sure if I'd like to add an additional dependency just for auto adjusting labels. IMO the plotting functionality in Fairlearn should be as simple as possible - we won't be able to account for all situations, so if people really want something specific they can make adjustments or plot from scratch.

@romanlutz
Member

> @romanlutz hmm I'm not sure if I'd like to add an additional dependency just for auto adjusting labels. IMO the plotting functionality in Fairlearn should be as simple as possible - we won't be able to account for all situations, so if people really want something specific they can make adjustments or plot from scratch.

I'm fine with that, provided what we outlined earlier works out.

@SeanMcCarren
Contributor Author

Regarding the auto-labelling: I like it, but I also agree that I don't like an extra dependency. I think mimicking their techniques ourselves would be feasible, but from a maintainability standpoint that might seem like an even worse idea.

Simply drawing labels on top without adjusting their coordinates looks OK (see the sketch below), and when there are very many points we could draw only every X-th label, or cap the number of labels?
[screenshot]
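
For reference, a rough matplotlib sketch of that plain on-top labelling, with made-up metric values; the every-X-th-label knob is just one possible way to limit clutter:

```python
import matplotlib.pyplot as plt

# Made-up (x, y) metric pairs, one per model, standing in for what
# plot_model_comparison computes internally.
points = {
    "Model 1": (0.81, 0.12),
    "Model 2": (0.84, 0.19),
    "Model 3": (0.79, 0.05),
}

fig, ax = plt.subplots()
ax.scatter([x for x, _ in points.values()], [y for _, y in points.values()])

label_every = 1  # with many models, increase this to label only every n-th point
for i, (name, (x, y)) in enumerate(points.items()):
    if i % label_every == 0:
        ax.annotate(name, (x, y), textcoords="offset points", xytext=(4, 4))

ax.set_xlabel("accuracy")
ax.set_ylabel("demographic parity difference")
plt.show()
```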

I am not so fond of the figures I get when using a legend or different colors. Color may still work, though: not in the sense that every point gets its own color, but perhaps as a color gradient in scenarios where models are labelled similarly (as in "Model 1", "Model 2", ...).

[screenshot]

But this still doesn't look very promising, especially if the supplied models aren't named nicely (so not "Model 1", "Model 2", ...); then we may have to give a different color scale to each. Let me know if you would like me to pursue this.

Another option is to return a dict with model names and coordinates, so that users can draw their own labels. If we don't do this, there will be no way for the user to place labels on the image at all (except if they manually compute the metrics, but then they would not need this function anyway).

I agree with @hildeweerts that it should be as simple as possible, but on the other hand, if we provide functionality it should work the way people want to use it; otherwise it is functionality without a use case.

So, that was my rant, what should we do? :)

Member

@MiroDudik MiroDudik left a comment


Just small changes outlined in this review.

If True, add a legend. Must set
:code:`model_kwargs[i]['label']` for every prediction :code:`i`.

legend_kwargs : dict
Member


Resurfacing this comment. Can we remove legend_kwargs for simplicity?

@SeanMcCarren SeanMcCarren requested review from MiroDudik and removed request for romanlutz September 25, 2022 16:11
Member

@MiroDudik MiroDudik left a comment


I'm fine with the current API. I just noted a small issue re. renaming the source file.

In addition, please clear the flake8 issue and check with @adrinjalali and @riedgar-ms about the coverage tests.

from ._generated_metrics import _generated_metric_dict # noqa: F401
from ._make_derived_metric import make_derived_metric # noqa: F401
from ._metric_frame import MetricFrame # noqa: F401
from ._multi_curve_plot import plot_model_comparison # noqa: F401
Member


We could rename the file _multi_curve_plot to _plot_model_comparison for clarity?

Member


@hildeweerts, @riedgar-ms -- do you have an opinion on this? The _multi_curve_plot name is probably just a historic leftover...

@hildeweerts
Contributor

Hi, @SeanMcCarren! Do you have any idea when you'd be able to make some of the final changes and resolve the flake8 issue? Thanks!

@SeanMcCarren
Contributor Author

This PR should be done :) Maybe someone can rerun the tests? Locally, every test succeeded now, but for some reason test_othermlpackages is triggered.

@MiroDudik
Member

@riedgar-ms, @hildeweerts -- with the expgrad fix, this should be good to merge. If you agree, please approve and merge.

@riedgar-ms
Member

I'm OK to merge this. Missing the codecov patch target isn't great, but it was a tiny miss. @hildeweerts ?

@SeanMcCarren
Contributor Author

@riedgar-ms In my defense, L113-114 is actually tested in test_no_matplotlib.py (at least, I think this is actually run?) which should push it above 95% :)

@hildeweerts hildeweerts merged commit 77fcd87 into fairlearn:main Oct 12, 2022