Inconsistent kwargs argument 'color' passed to upstream matplotlib plot functions #31691

Closed
xuancong84 opened this issue Feb 5, 2020 · 14 comments · Fixed by #38387

@xuancong84

In matplotlib.pyplot.bar, there is a keyword argument called 'color' that can set either one color for all bars or a separate color for each bar, e.g.,
image

However, in pandas' DataFrame.plot.bar, when a list is passed as 'color', all bars take the color of the first element in the list, i.e.,
image

Ironically, if we pass a list of lists as 'color', we can control the color of each bar, i.e.,
image

So my question is: why is the behavior of the 'color' argument different from that in matplotlib? Is this inconsistency intended?

@charlesdong1991
Member

charlesdong1991 commented Feb 5, 2020

Thanks for posting, @xuancong84.

This could probably be clarified or better defined, but it happens because color is defined per column when you plot from a pd.DataFrame. In your case the values are all in the same column, so they all get the first color in color; color[0] in your example is red, and you have only one column. With the following, all three colors are assigned:

pd.DataFrame({"v1": [3,6,2], "v2": [2,1,5], "v3": [4,5,6]}).plot(kind="bar", color=["r", "b", "y"])

If you would like to make your example work, then you should use pd.Series to plot:

pd.Series([3, 6, 2]).plot(kind="bar", color=["r", "b", "y"])

I think you will get the desired figure through the code above.
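The per-column rule described above can be pictured with a simplified pure-Python model. This is only a sketch, not pandas' actual implementation: the function names `colors_for_dataframe` and `colors_for_series` are made up for illustration, and the modulo cycling is an assumption about how colors are reused when there are more columns than colors.

```python
def colors_for_dataframe(columns, color):
    """Model of the DataFrame case: one drawing call per column,
    each call receiving a single color picked by column position."""
    # Assumption: colors are cycled if there are more columns than colors.
    return {col: color[i % len(color)] for i, col in enumerate(columns)}


def colors_for_series(color):
    """Model of the Series case: a single drawing call that receives
    the whole list, so each bar can get its own color."""
    return list(color)


# One column: only the first color is ever picked.
print(colors_for_dataframe(["v1"], ["r", "b", "y"]))
# -> {'v1': 'r'}

# Three columns: each column gets its own color.
print(colors_for_dataframe(["v1", "v2", "v3"], ["r", "b", "y"]))
# -> {'v1': 'r', 'v2': 'b', 'v3': 'y'}
```

In this model the list is indexed by column, never by row, which is why a single-column DataFrame ends up with a single color.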

@xuancong84
Author

Thanks, @charlesdong1991. Does that mean that pd.DataFrame.plot.bar actually calls pyplot.bar several times, once for each column, and that all kwargs therefore have to be packed into a list?

@charlesdong1991
Member

@xuancong84 indeed, ax.bar will be called three times in this case

all kwargs have to be packed into a list

I am not sure what you mean by "all kwargs". I haven't double-checked, but I remember there is some post-processing of color. Other arguments, I think, are shared by all columns, e.g. rot, right?

@xuancong84
Author

xuancong84 commented Feb 6, 2020

@charlesdong1991, thanks for the info. By "all kwargs" I mean the additional keyword arguments, such as color=, ax=, rot=, etc., that are passed to the upstream matplotlib plot functions.

Not all of these arguments are unpacked before being passed to matplotlib. To avoid this confusion, I would like to suggest adding a prefix or suffix to every argument that is unpacked: e.g., instead of color, name it color_packed, color_list, multi_color, packed_color, etc. Users would then know that this keyword argument is unpacked and passed to the upstream matplotlib plot function one item at a time, one for each column of the DataFrame. Alternatively, we could provide both versions of the keyword argument: e.g., in DataFrame.plot, color= would apply to every column, while color_list= would be unpacked, with each list item applied to the corresponding column. What do you think?
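To make the second alternative concrete, here is a minimal sketch of the two semantics in plain Python. Everything in it is hypothetical: `resolve_colors` and the `color_list` keyword do not exist in pandas, and the error handling is an assumption added only to keep the sketch well-defined.

```python
def resolve_colors(columns, color=None, color_list=None):
    """Hypothetical helper (not part of pandas): decide which color value
    would be forwarded to the drawing call for each column."""
    if color is not None and color_list is not None:
        # Assumption: the two keywords are mutually exclusive.
        raise ValueError("pass either color or color_list, not both")
    if color_list is not None:
        # Assumption: exactly one entry per column is required.
        if len(color_list) != len(columns):
            raise ValueError("color_list needs one entry per column")
        return dict(zip(columns, color_list))
    # color= is forwarded unchanged to every column, matplotlib-style.
    return {col: color for col in columns}


# color= : the same value (here a whole list) goes to every column.
print(resolve_colors(["v1"], color=["r", "g", "b"]))
# color_list= : one entry per column.
print(resolve_colors(["v1", "v2"], color_list=["r", "b"]))
```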

xuancong84 changed the title from "Inconsistent kwargs argument 'color' passed upstream to matplotlib" to "Inconsistent kwargs argument 'color' passed to upstream matplotlib plot functions" on Feb 6, 2020
@charlesdong1991
Member

Thanks for your suggestion, @xuancong84. I think renaming the argument, or adding a new similar argument, is an API change and might confuse users, especially if it only works for pd.DataFrame.plot.

I think the easiest way is to write a better docstring for the color argument and clarify those scenarios.

@TomAugspurger might have a better opinion on this?

@xuancong84
Author

Thanks, @charlesdong1991! Yup, revising the documentation for color in pd.DataFrame.plot is the minimum that needs to be done. Whether to apply this convention to every matplotlib keyword argument, and even to every function, depends on whether there is such a need. The pandas developers should evaluate this carefully before deciding.

@mfenner1

mfenner1 commented Apr 9, 2020

There are similar issues with pd.Series.plot. For example:

import pandas as pd
import matplotlib
print(pd.__version__) # ---> 1.0.3
print(matplotlib.__version__) # ---> 3.1.3

# ok, line plot
pd.Series([5, 10, 20]).plot(color='r')

# ok, red dots 
pd.Series([5, 10, 20]).plot(style='.', color='r')

# fails
pd.Series([5, 10, 20]).plot(style='o', color='r')

# the following only applies the first color (red)
# --> all three points are red
pd.Series([5, 10, 20]).plot(style='.', color=['r', 'g', 'b'])

With import matplotlib.pyplot as plt:

# works
plt.plot([5, 10, 20], marker='o', color='r')

# fails
plt.plot([5, 10, 20], marker='o', color=['r', 'g', 'b'])
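If the goal is one color per point, one option in plain matplotlib is a scatter plot, since `plot` draws a single line artist that carries one color, while `scatter` accepts a sequence of colors through its `c` argument. The following is a minimal sketch of that workaround, not something proposed in this thread; the `Agg` backend is chosen only so that it runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

values = [5, 10, 20]
fig, ax = plt.subplots()

# One color per point: scatter takes a sequence of colors via `c`.
points = ax.scatter(range(len(values)), values, c=["r", "g", "b"])

# The collection stores one RGBA row per point.
print(points.get_facecolor().shape)
```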

@MarcoGorelli
Member

Hi @xuancong84

Yup, revising the documentation for color in pd.DataFrame.plot is the minimum needed to be done

Are you interested in submitting a PR?

@xuancong84
Author

Are you interested in submitting a PR?
Sorry, what is a PR?

@MarcoGorelli
Member

PR = Pull Request, see contributing to pandas

@ankushduacodes
Contributor

Hi @MarcoGorelli, I would love to contribute to this issue. Please let me know if I may :)

@MarcoGorelli
Member

of course!

@ankushduacodes
Contributor

@MarcoGorelli As this will be my first contribution to the pandas repo, could you please give me some pointers on this issue? I would really appreciate that.

@MarcoGorelli
Member

@ankushduacodes for a start, read through the contributing guide linked above.

After that, I think what needs to be done to close this issue is to reword the docstring for DataFrame.plot.bar. Currently, for color, it reads:

code, which will be used for each column recursively. For instance ['green','yellow'] each column's bar will be filled in green or yellow, alternatively.

This can probably be clarified, maybe by noting that if you only have a single column, then only the first colour in the list will be used.
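The single-column case can be checked with a short script along these lines. It is a sketch that reuses the data and colors from the examples earlier in the thread; the explicit `ax=` arguments, the `Agg` backend, and the `to_hex` conversion are choices made here so that the two plots stay on separate axes and the resulting bar colors are easy to compare.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.colors import to_hex

colors = ["r", "b", "y"]
df = pd.DataFrame({"v1": [3, 6, 2]})

# Separate axes so the two plots do not draw on top of each other.
fig, (ax_df, ax_series) = plt.subplots(1, 2)

# DataFrame with one column: the list is indexed per column.
df.plot(kind="bar", color=colors, ax=ax_df, legend=False)

# Series: the whole list reaches the bars, one color per bar.
df["v1"].plot(kind="bar", color=colors, ax=ax_series)

df_bar_colors = [to_hex(bar.get_facecolor()) for bar in ax_df.patches]
series_bar_colors = [to_hex(bar.get_facecolor()) for bar in ax_series.patches]

print("DataFrame bars:", df_bar_colors)
print("Series bars:   ", series_bar_colors)
```

Comparing the two printed lists shows whether the DataFrame call used only the first colour while the Series call used all three.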

5 participants