Skip to content

BUG: do not hard-code matplotlib backend for boxplot #28159

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

jakevdp
Copy link
Contributor

@jakevdp jakevdp commented Aug 26, 2019

No description provided.

@WillAyd WillAyd added the Visualization plotting label Aug 26, 2019
@WillAyd
Copy link
Member

WillAyd commented Aug 26, 2019

@datapythonista

@WillAyd WillAyd added this to the 1.0 milestone Aug 26, 2019
@datapythonista
Copy link
Member

This was implemented in #27432, and was intentional. My preference was to delegate everything to the backends, but the final decision was to just delegate the plots called via accessor.

I'm happy with changing it for boxplot, even if I'd personally deprecate DataFrame.boxplot() in favor of DataFrame.plot(kind='box').

@TomAugspurger thoughts on this change?

@TomAugspurger
Copy link
Contributor

As long as delegating doesn’t get in the way of a future deprecation, then I’m fine with this.

@jakevdp
Copy link
Contributor Author

jakevdp commented Aug 27, 2019

OK - what threw me off was that it's the only hard-coded backend in that file – the rest use whatever the user has set. Is there a reason for that discrepancy?

@datapythonista
Copy link
Member

My understanding is that there are these ways of calling a boxplot from pandas:

df.plot.box()
df.plot(kind='box')
df.boxplot()
pandas.plotting.boxplot(df)

The decision was to delegate to the backend anything that is not a Series or DataFrame method. In this case the 4 cases are implemented in _core.py, but the last one is the only that doesn't delegate.

I think we can merge this if it affects you. Otherwise I'd say better don't touch it, and focus on deprecating it. There was an initial discussion on deprecations in #26747, but the scope of the issue was too big and no decision was made on that. I guess deprecating df.hist() and df.boxplot() is controversial, but I guess people there will be agreement on pandas.plotting.boxplot.

I'll try to open an issue later today to make a decision on that.

@jakevdp
Copy link
Contributor Author

jakevdp commented Aug 27, 2019

Makes sense – so from the contents of this file it appears that plotting backends can override pandas.plotting.hist_series, pandas.plotting.hist_frame, pandas.plotting.boxplot_frame, and pandas.plotting.boxplot_frame_groupby, but not pandas.plotting.boxplot. Was that your intent?

@jakevdp
Copy link
Contributor Author

jakevdp commented Aug 27, 2019

(I'm fine either way, I just think consistency here would make things less confusing for both users and backend authors)

@jakevdp
Copy link
Contributor Author

jakevdp commented Aug 27, 2019

Note also that the accessor is structured such that df.plot.hist() and df.plot.box() delegate to the backend via a separate code path, as reflected in the current implementation in https://github.com/altair-viz/altair_pandas/blob/master/altair_pandas/_core.py. Is that intentional?

@datapythonista
Copy link
Member

Makes sense – so from the contents of this file it appears that plotting backends can override pandas.plotting.hist_series, pandas.plotting.hist_frame, pandas.plotting.boxplot_frame, and pandas.plotting.boxplot_frame_groupby, but not pandas.plotting.boxplot. Was that your intent?

pandas.plotting.hist_series... are not expected to be called directly afaik, there are just the functions that are called when Series.hist()... are called.

There are many things in the current API that I think are explained by how things were implemented, and that are inconsistent. That's why I mentioned that I think what we should do is have a discussion on what to deprecate, and leave things simple, standardized and without inconsistencies.

Let's see what can be done, I think once there is agreement in what should be done, should be fast to implement the changes.

@jakevdp
Copy link
Contributor Author

jakevdp commented Aug 27, 2019

there are just the functions that are called when Series.hist()... are called.

I don't think that's correct. From my read, when you call Series.hist() with the matplotlib backend, you get this code path: https://github.com/pandas-dev/pandas/blob/master/pandas/plotting/_matplotlib/hist.py#L17

@datapythonista
Copy link
Member

I think you're right for Series.plot.hist(), but not for Series.hist(). See here: https://github.com/pandas-dev/pandas/blob/master/pandas/core/series.py#L4804

It's confusing, that's exactly why I think we should deprecate Series.hist (or just make an alias for Series.plot.hist, but I think alias also cause confusion).

@jakevdp
Copy link
Contributor Author

jakevdp commented Aug 27, 2019

OK – thanks for the clarification.

So would you recommend plotting backends define their own version of hist_series, hist_dataframe, boxplot_series, and boxplot_dataframe, or is the fact that they are able to be overridden (and boxplot is not) just an unintended consequence of an internal implementation?

@datapythonista
Copy link
Member

If you want users to be able to plot with your backend when calling Series.hist()... you should implement them. I'd just call plot(kind='hist') there. I think there are subtle differences among the two, but we should take care of those in our end.

Making you implement those is not great, sorry. They won't be needed in the future, I think. But for now this is how the API works.

@jakevdp
Copy link
Contributor Author

jakevdp commented Aug 27, 2019

OK.

For what it's worth, if you're planning to deprecate them I think it would make more sense to have them produce the matplotlib plot in all cases (like boxplot and other top-level functions currently do).

The current behavior looks like this:

>>> pd.set_option('plotting.backend', 'altair_pandas')
>>> data = pd.DataFrame({'x': range(5)})
>>> pd.plotting.boxplot(data)
# produces matplotlib plot

>>> pd.plotting.boxplot_frame(data)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-48-0bc94aa14454> in <module>
      1 pd.set_option('plotting.backend', 'altair_pandas')
      2 data = pd.DataFrame({'x': range(5)})
----> 3 pd.plotting.boxplot_frame(data)

~/anaconda/lib/python3.6/site-packages/pandas/plotting/_core.py in boxplot_frame(self, column, by, ax, fontsize, rot, grid, figsize, layout, return_type, **kwds)
    407 ):
    408     plot_backend = _get_plot_backend()
--> 409     return plot_backend.boxplot_frame(
    410         self,
    411         column=column,

AttributeError: module 'altair_pandas' has no attribute 'boxplot_frame'

@jakevdp
Copy link
Contributor Author

jakevdp commented Aug 27, 2019

I've updated the PR to reflect how I think it should look given this new information.

Honestly, I'm fine either way as long as a consistent choice is made.

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not my area of expertise but lgtm to keep consistent. @datapythonista

@jakevdp
Copy link
Contributor Author

jakevdp commented Sep 5, 2019

I'm going to close this, because the discussion is larger than the files I've touched here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants