FEAT: Model comparison plot #947
Conversation
Signed-off-by: Rishabh Sambare <rishabhsamb@gmail.com>
…factored-input
An initial attempt at the body of plot_model_comparison (probably still missing sanity checks w.r.t. the x_axis_metric and y_axis_metric... can I just always pass sensitive_features, with nothing else required?). To make it work, I added the import in __init__.py. I also created an example in one of the existing notebooks, as I believe it is a fitting context!
…numpydoc - also support extra kwargs which are passed to ax.scatter - only if no ax is provided, automatically infer axis labels
* Example notebook section
* User guide section (with literalinclude)
I'll get back to your excellent questions later today! Thanks for all this work!
UPDATE: Done :-)
There's currently one plotting function that we're testing in …
romanlutz left a comment:
* What stylistic features should the plots have by default? I already thought it would be nice to infer axis labels, but that's it.
Can you post a screenshot from the doc build on what the plots look like? It's a bit easier to comment for everyone. I'll run it locally myself, but I imagine not everyone does this.
docs/user_guide/assessment.rst (Outdated)

    .. _dashboard:
    Plotting model comparisons
* [ ] More demonstration???
According to the requirements of the project such a change should have a proper docstring and an entry in the user guide. As you've already figured out it's sometimes easiest to add a file (or section to an existing file) and literalinclude it in the user guide with some more description and context.
According to the requirements of the project such a change should have a proper docstring and an entry in the user guide.
Sorry, I'm not quite sure which requirements. Is this from the contributor guide? And do you mean that I should add an example in the docstring of the function definition?
Not sure how I missed this comment before! Sorry for the massive delay 🙁
The contrib guide says
Documentation should be provided with pull requests that add or change functionality. This includes comments in the code itself, docstrings, and user guides.
So with docstring and user guide you're good. However, since this is a plotting API the output won't show up in the user guide with a normal code block. Instead, you'd need to create an "example" in a Python file and then use literalinclude to reference that code and its outputs. We've done that in https://github.com/fairlearn/fairlearn/blob/main/docs/user_guide/assessment.rst under "Plotting grouped metrics" based on plot_quickstart.py from the examples, and then included the figure which is the plot.
I don't actually know if this can be done in the API reference as well. I guess it can, although I wouldn't expect you to experiment with that in this PR. An example in the user guide explaining what this kind of plot is useful for is plenty. Does that make sense?
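For concreteness, here is a minimal sketch of what such a literalinclude-able example file could look like. It assumes the signature shown in this PR's diff (x_axis_metric, y_axis_metric, y_true, y_preds, sensitive_features) and uses randomly generated data; the show_plot flag and the file name are assumptions, not the merged API.

    # plot_model_comparison_example.py -- hypothetical example file, following
    # the plot_quickstart.py pattern described above. The call mirrors the
    # signature shown in this PR's diff; treat it as a sketch, not the final API.
    import numpy as np
    from sklearn.metrics import accuracy_score

    from fairlearn.metrics import demographic_parity_difference, plot_model_comparison

    rng = np.random.default_rng(42)

    # Synthetic ground truth, a binary sensitive feature, and the predictions
    # of three (here: random) models, keyed by model id.
    y_true = rng.integers(0, 2, size=1000)
    sensitive_features = rng.integers(0, 2, size=1000)
    y_preds = {f"model_{i}": rng.integers(0, 2, size=1000) for i in range(3)}

    plot_model_comparison(
        x_axis_metric=accuracy_score,
        y_axis_metric=demographic_parity_difference,
        y_true=y_true,
        y_preds=y_preds,
        sensitive_features=sensitive_features,
        show_plot=True,  # assumed convenience flag; plt.show() works as well
    )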
    # %%
    # Plot model comparison
    plot_model_comparison(
* I think the example I made is a bit boring, any tips?
Hmm, for one I think it's nice that you're building on the existing case rather than copy-pasting without real changes. I don't think it's boring as such, but eventually we're hopefully not using the adult census dataset anymore. #907 will introduce a synthetic dataset for these purposes. As we integrate this model comparison into real world use case examples it'll get much more interesting, of course.
    x_axis_metric: Callable[..., Union[float, int]],
    y_axis_metric: Callable[..., Union[float, int]],
    y_true,
    y_preds,
* as `y_pred` we pass a dictionary with `{model_id: predictions}`, but nothing has been discussed with what to do with these `model_id`'s. Any suggestions? Should we add another parameter `label: bool` and if true print `model_id`-labels alongside datapoints?
Would be awesome if we could somehow make it clear which point corresponds to which model, otherwise it's hard to compare them. I can think of two ways: labels next to the plotted points, or color coding the points. There are problems with both methods depending on the use case, of course, but the labels are definitely preferable IMO.
I usually prefer legends (with color code or symbols), as adding labels next to points can result in quite a crowded plot that's difficult to read - especially if there is only one default setting for distances, font, etc. I'm happy either way though.
I think there's something to be said for either option, but I'd leave it up to the user to choose. We could provide a very crowded example with legend, and a less crowded one with labels if we want to go all out here. I will say color-coding also has its limits after about 8-12 colors since they start looking the same even to me (so I can only imagine how hard it is for colorblind folks to distinguish them). So ideally I'd let the user configure it and provide examples for how to configure it. @SeanMcCarren let me know if you don't know how to get these things set up. Happy to help!
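To make the trade-off concrete, here is an illustration-only sketch of the two options being discussed, in plain matplotlib rather than the fairlearn API; the model ids and coordinates are made up.

    import matplotlib.pyplot as plt

    # Hypothetical (performance, disparity) coordinates per model.
    points = {"model_a": (0.81, 0.12), "model_b": (0.78, 0.05), "model_c": (0.85, 0.20)}

    fig, (ax_labels, ax_legend) = plt.subplots(1, 2, figsize=(10, 4))

    # Option 1: a text label next to each plotted point.
    for name, (x, y) in points.items():
        ax_labels.scatter(x, y)
        ax_labels.annotate(name, (x, y), xytext=(4, 4), textcoords="offset points")
    ax_labels.set_title("Point labels")

    # Option 2: one scatter call per model, distinguished via the legend.
    for name, (x, y) in points.items():
        ax_legend.scatter(x, y, label=name)
    ax_legend.legend()
    ax_legend.set_title("Legend")

    plt.show()

With many models the left panel gets crowded and the right panel runs out of distinguishable colors, which is exactly the trade-off described above.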
Co-authored-by: Roman Lutz <romanlutz13@gmail.com>
…en/fairlearn into model_comparison_plot
Thanks for all the great feedback! Sorry, I took a while :)
Yeah good idea, especially because it takes a while to build. Here it is, no custom styles except for the axis label names (even those names could use some work maybe?)
Yes I'm going to experiment a little! Ideas remain welcome. I'll keep you posted!
Ok, so checking if the right error message is thrown when matplotlib is not installed is easy. But now, to make sure I understand correctly... Are the 6 functions in class …
To my knowledge they are, provided …
See what they're doing with auto-adjusted point labels here? https://www.python-graph-gallery.com/web-scatterplot-text-annotation-and-regression-matplotlib

@romanlutz hmm, I'm not sure if I'd like to add an additional dependency just for auto-adjusting labels. IMO the plotting functionality in Fairlearn should be as simple as possible - we won't be able to account for all situations, so if people really want something specific they can make adjustments or plot from scratch.

I'm fine with that, provided what we outlined earlier works out.
With regards to the auto labelling, I like it, but I also agree that I don't like an extra dependency. I think sort of mimicking their techniques would be feasible, but from a maintainability standpoint that might seem like an even worse idea. Simply drawing labels on top without changing their coordinates looks OK, and when there are very many points we could draw only every X-th label, or cap the number of labels?

I am not so fond of the figures I get when using a legend or different colors. However, color may work - not in the sense that every node should get its own color, but perhaps as a color gradient in scenarios where models are labelled similarly (as in "Model 1", "Model 2", ...).

Another option is to return a dict with model names and coordinates, such that users can draw their own labels (see the sketch below). If this is not done, there will be no option for the user to place labels on the image at all (except if they manually compute the metrics, but then they would not need this function anyway).

I agree with @hildeweerts in the sense that it should be as simple as possible, but on the other hand, if we are providing functionality, it should work the way one would want to use it; otherwise it is functionality without a use case. So, that was my rant - what should we do? :)
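A minimal sketch of that last idea: a helper hands back the computed coordinates keyed by model name, so callers can place labels however they like. The helper name and the metric choices are hypothetical, not part of this PR.

    import matplotlib.pyplot as plt
    from sklearn.metrics import accuracy_score, recall_score

    # Hypothetical helper: compute the scatter coordinates per model so the
    # caller can draw (a subset of) labels themselves.
    def compute_comparison_points(x_axis_metric, y_axis_metric, y_true, y_preds):
        """Return {model_id: (x_metric_value, y_metric_value)}."""
        return {
            model_id: (x_axis_metric(y_true, y_pred), y_axis_metric(y_true, y_pred))
            for model_id, y_pred in y_preds.items()
        }

    y_true = [0, 1, 1, 0, 1, 0]
    y_preds = {"model_a": [0, 1, 0, 0, 1, 1], "model_b": [0, 1, 1, 0, 0, 0]}
    points = compute_comparison_points(accuracy_score, recall_score, y_true, y_preds)

    # Draw every point, but label only every other one to reduce clutter.
    fig, ax = plt.subplots()
    for i, (name, (x, y)) in enumerate(points.items()):
        ax.scatter(x, y)
        if i % 2 == 0:
            ax.annotate(name, (x, y), xytext=(4, 4), textcoords="offset points")
    plt.show()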
MiroDudik left a comment:
Just small changes outlined in this review.
    If True, add a legend. Must set
    :code:`model_kwargs[i]['label']` for every prediction :code:`i`.

    legend_kwargs : dict
Resurfacing this comment. Can we remove legend_kwargs for simplicity?
Co-authored-by: MiroDudik <mdudik@gmail.com>
…en/fairlearn into model_comparison_plot
MiroDudik left a comment:
I'm fine with the current API. I just noted a small issue re. renaming the source file.
In addition, please clear the flake8 issue and check with @adrinjalali and @riedgar-ms about the coverage tests.
fairlearn/metrics/__init__.py (Outdated)

    from ._generated_metrics import _generated_metric_dict  # noqa: F401
    from ._make_derived_metric import make_derived_metric  # noqa: F401
    from ._metric_frame import MetricFrame  # noqa: F401
    from ._multi_curve_plot import plot_model_comparison  # noqa: F401
We could rename the file _multi_curve_plot to _plot_model_comparison for clarity?
@hildeweerts, @riedgar-ms -- do you have an opinion on this? The _multi_curve_plot name is probably just a historic leftover...
Hi, @SeanMcCarren! Do you have any idea when you'd be able to make some of the final changes and resolve the flake8 issue? Thanks!
Co-authored-by: Richard Edgar <riedgar@microsoft.com>
This PR should be done :) Maybe someone can rerun the tests? Locally, every test succeeded now, but for some reason test_othermlpackages is triggered.
@riedgar-ms, @hildeweerts -- with the expgrad fix, this should be good to merge. If you agree, please approve and merge.
I'm OK to merge this. Missing the codecov patch target isn't great, but it was a tiny miss. @hildeweerts?
@riedgar-ms In my defense, L113-114 is actually tested in …
Hi! According to some specifications in issue #666, I continued from the work of @rishabhsamb and @romanlutz.
Done/Todo
Questions
* Why is there no metric type? I guess because there is no type inference and we want to be able to use sklearn functions without having to initialize a metric type for them?
* As y_pred we pass a dictionary with {model_id: predictions}, but nothing has been discussed about what to do with these model_id's. Any suggestions? Should we add another parameter label: bool and, if true, print model_id-labels alongside data points? (See the sketch below.)
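As a point of reference for that last question, here is a hypothetical sketch of what such a switch could look like in a call. The point_labels name (standing in for the proposed label: bool) is illustrative, not the merged API; the dict keys are the model ids in question.

    import numpy as np
    from sklearn.metrics import accuracy_score

    from fairlearn.metrics import demographic_parity_difference, plot_model_comparison

    rng = np.random.default_rng(0)
    y_true = rng.integers(0, 2, size=500)
    sensitive_features = rng.integers(0, 2, size=500)
    # Keys of y_preds are the model_id's the question refers to.
    y_preds = {"lr": rng.integers(0, 2, size=500), "gbm": rng.integers(0, 2, size=500)}

    plot_model_comparison(
        x_axis_metric=accuracy_score,
        y_axis_metric=demographic_parity_difference,
        y_true=y_true,
        y_preds=y_preds,
        sensitive_features=sensitive_features,
        point_labels=True,  # assumed switch: draw each model id next to its point
    )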