Feat/scatter by size #20572

VincentAntoine · 2018-04-01T16:58:22Z

Closes part of Scatter plot with colour_by and size_by variables #16827: makes bubble plots easy with df.plot.scatter(x='col1', y='col2', s='col3'), with automatic bubble sizing and a bubble-size legend
2 tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff

…umn_name' argument

VincentAntoine · 2018-04-01T16:59:57Z

I just rebased what I did a few months ago on top of master. Let me know!
Vincent

WillAyd

I'll admit that I'm not terribly familiar with graphing, so this is a little out of my area, but I figured I could give some feedback regardless.

Thanks for submitting. Generally, the biggest room for improvement is in the documentation. It's not entirely clear how this new feature is to be used, so updating docstrings and providing sample usage will go a long way.

WillAyd · 2018-04-02T05:03:57Z

pandas/plotting/_core.py

@@ -829,11 +831,30 @@ def _post_plot_logic(self, ax, data):
 class ScatterPlot(PlanePlot):
    _kind = 'scatter'

-    def __init__(self, data, x, y, s=None, c=None, **kwargs):
+    def __init__(self, data, x, y, s=None, c=None, size_factor=1, **kwargs):


If we are adding a new keyword here then there should be some documentation updates to go along with it. Can you take a look at where those need to be made and bundle them in accordingly?

WillAyd · 2018-04-02T05:08:29Z

pandas/plotting/_core.py

            s = 20
+        elif is_hashable(s) and s in data.columns:


What does the hashable check do for us here?

The hashability check ensures backward compatibility: it is possible (in the current stable version of pandas) to pass an array or a Series of sizes to s. For instance you may do something like:

df.plot.scatter(x='height', y='weight', s=df['price'])

Without the hashability check, the test s in data.columns raises an error when s is an array or a Series. The check ensures that s in data.columns is only evaluated when s could be a legitimate column label.
If there is a better way to handle this, let me know.
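
A minimal sketch of the dispatch described above, assuming a hypothetical helper named resolve_sizes (it is not part of pandas or of this PR). A hashable s that matches a column label is looked up in the frame; anything else, such as an array or a Series, is passed through unchanged. The default of 20 is taken from the diff; the rest is illustrative only.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_hashable


def resolve_sizes(data, s, default=20):
    """Hypothetical helper: turn the `s` argument into marker sizes."""
    if s is None:
        return default
    # Guard first: `s in data.columns` hashes `s`, which fails for
    # unhashable objects such as ndarrays and Series.
    if is_hashable(s) and s in data.columns:
        return data[s]
    # Arrays, Series and scalars keep the existing behaviour.
    return s


df = pd.DataFrame({"height": [1.6, 1.8], "weight": [60, 80], "price": [5, 10]})
raw_sizes = np.array([30, 40])

by_label = resolve_sizes(df, "price")    # column looked up by label
by_array = resolve_sizes(df, raw_sizes)  # array passed through untouched

try:
    raw_sizes in df.columns  # the unguarded membership test
except TypeError as exc:
    print("unguarded membership test failed:", exc)
```

In this sketch a Series passed as s takes the same pass-through branch as an array, because it is not hashable either.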

So what happens here if a series is passed into the method? I feel like there is something missing from the if...else logic but I could be wrong

WillAyd · 2018-04-02T05:14:20Z

pandas/plotting/_core.py

+            # Handle the case where s is a label of a column of the df.
+            # The data is normalized to 200 * size_factor.
+            size_data = data[s]
+            if is_categorical_dtype(size_data):


Was this discussed in an issue elsewhere? Personally, it seems strange to me to use the category codes for any kind of sizing, as I don't think that's what they are meant for, but again that's just my opinion.

This was a suggestion from @TomAugspurger here:
#17582

But I have to admit that I cannot think of a use case where it would be appropriate to represent category codes by sizes.

WillAyd · 2018-04-02T05:17:16Z

pandas/plotting/_core.py

+        Returns mantissa and exponent of the number passed in argument.
+        Example:
+        >>> _sci_notation(89278.8924)
+        (8.9, 5.0)


This returns (8.9, 4.0) for me - minor typo?

You're right.

WillAyd · 2018-04-02T05:20:03Z

pandas/plotting/_core.py

+        (8.9, 5.0)
+        """
+        scientific_notation = '{:e}'.format(num)
+        expnt = float(re.search(r'e([+-]\d*)$',


Hmm I was hoping there would be a better way to do this than using re but couldn't find anything better on a Google search myself as I'm sure you already tried... With that said, any way to do this in one search and return the appropriately matched groups instead of doing two passes?

Indeed, I did the same and could not find a better way.
I modified it as you suggested so that it makes only one regexp search.
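
For illustration, a single-search version could look like the sketch below. It is not the code from the PR: the function name, the pattern, and the rounding of the mantissa to one decimal (chosen to match the 8.9 in the docstring) are assumptions.

```python
import re

# One pattern with two groups: the mantissa and the signed exponent.
_SCI_RE = re.compile(r"^(-?\d(?:\.\d+)?)e([+-]\d+)$")


def sci_notation(num):
    """Return (mantissa, exponent) of num as floats."""
    match = _SCI_RE.match("{:e}".format(num))
    if match is None:
        # inf and nan format without an exponent part.
        raise ValueError("cannot split {!r} into mantissa and exponent".format(num))
    mantissa, exponent = match.groups()
    return round(float(mantissa), 1), float(exponent)


large = sci_notation(89278.8924)
small = sci_notation(0.00034)
```

With this sketch, 89278.8924 formats as 8.927889e+04, so the exponent is 4.0, in line with the reviewer's observation above.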

I'd also suggest putting it as a nested function where it is actually used (i.e. in _legend_bubbles). Even though it's private I don't think this should be an instance method

WillAyd · 2018-04-02T05:20:49Z

pandas/plotting/_core.py

+    def _legend_bubbles(self, s_data_max, size_factor, bubble_points):
+        """
+        Computes and returns appropriate bubble sizes and labels for the
+        legend of a  bubble plot. Creates 4 bubbles with round values for the


May just be my lack of knowledge, but why is it creating 4 bubbles? Is that true all of the time, even if they only have say three groups of data?

If bubble sizes represent categories, you're right, there should be as many bubbles as there are categories. But as said above, I can't think of a use case for this, so before coding it, it would be nice to make sure it's actually useful. @TomAugspurger do you remember what you had in mind?

WillAyd · 2018-04-02T05:21:12Z

pandas/plotting/_core.py

+        }
+        for lower_bound, upper_bound in labels_catalog:
+            if (coef >= lower_bound) & (coef < upper_bound):
+                labels = 10**expnt * np.array(labels_catalog[lower_bound,


What happens if expnt is negative?

It works, no problem.
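
A small sketch of why a negative exponent is harmless here, using only the (0, 1.5) catalog entry quoted in the diff. The helper name scale_labels is hypothetical, and a plain list stands in for the np.array used in the PR.

```python
# The only catalog entry visible in the diff: used when 0 <= coef < 1.5.
base_labels = [1, 0.5, 0.25, 0.1]


def scale_labels(expnt, labels):
    """Scale the round base labels by 10**expnt."""
    # expnt is a float in the PR, so 10 ** expnt is an ordinary float
    # power and is well defined for negative values.
    return [10 ** expnt * label for label in labels]


# A maximum such as 0.00012 has coef 1.2 and expnt -4.0.
tiny_labels = scale_labels(-4.0, base_labels)
big_labels = scale_labels(3.0, base_labels)
```

One caveat, stated as general NumPy behaviour rather than something observed in this PR: raising an integer NumPy array to a negative integer power raises a ValueError, so keeping the exponent as a float matters.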

WillAyd · 2018-04-02T05:22:25Z

pandas/plotting/_core.py

+            (0, 1.5): [1, 0.5, 0.25, 0.1]
+        }
+        for lower_bound, upper_bound in labels_catalog:
+            if (coef >= lower_bound) & (coef < upper_bound):


Any reason you chose the bitwise operator instead of the logical and operator? While they get you to the same place here, the latter is more idiomatic.

No reason, I'll change it :)
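
To make the difference concrete, here is a short sketch with scalar values like those in the loop above (the specific numbers are made up). With parentheses, & and and agree; without them, & binds more tightly than the comparisons and the expression means something else.

```python
coef = 1.2
lower_bound, upper_bound = 0, 1.5

# Parenthesized, & combines two booleans and matches `and`.
bitwise = (coef >= lower_bound) & (coef < upper_bound)
logical = coef >= lower_bound and coef < upper_bound
# A chained comparison says the same thing more directly.
chained = lower_bound <= coef < upper_bound

# Without parentheses this parses as
# coef >= (lower_bound & coef) < upper_bound, and & is not defined
# between an int and a float.
try:
    coef >= lower_bound & coef < upper_bound
    unparenthesized_error = None
except TypeError as exc:
    unparenthesized_error = exc
```

The and form also short-circuits, which & never does.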

WillAyd · 2018-04-02T05:23:31Z

pandas/plotting/_core.py

+        labels, the largest of which is close to the maximum of the data.
+        """
+        coef, expnt = self._sci_notation(s_data_max)
+        labels_catalog = {


Where were these values taken from? Some comments may be helpful

The values were defined by testing until I got something visually pleasing, similar to this graph with R in the original issue submitted here:
#16827

WillAyd · 2018-04-02T05:25:48Z

pandas/tests/plotting/test_frame.py

+        tm.assert_numpy_array_equal(bubble_sizes, expected_sizes)
+
+    @pytest.mark.slow
+    def test_plot_scatter_with_categorical_s(self):


Does missing data affect the categorical display? Perhaps not if the label for NA is -1 and you add 1 in the code above, but it would be good to have a test to explicitly make sure we are OK
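
As a starting point for such a test, the sketch below shows how pandas encodes a missing value in a categorical. The data is made up, and the shift by 1 is only the assumption stated in the comment above, not confirmed behaviour of the PR.

```python
import pandas as pd

cat = pd.Categorical(
    ["small", None, "large", "medium"],
    categories=["small", "medium", "large"],
    ordered=True,
)

# Missing values get the code -1; real categories are numbered from 0.
codes = cat.codes.tolist()

# Assumed shift by 1: the missing value then maps to 0 and every real
# category to a positive number.
shifted = (cat.codes + 1).tolist()
```

Under that assumption a missing value would end up with a size of 0, so an explicit test would pin down whether such points should be drawn at all.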

pep8speaks · 2018-04-04T21:59:50Z

Hello @VincentAntoine! Thanks for updating the PR.

In the file pandas/plotting/_core.py, following are the PEP8 issues :

Line 3337:44: E241 multiple spaces after ','
Line 3422:80: E501 line too long (88 > 79 characters)

codecov · 2018-04-04T21:59:59Z

Codecov Report

Merging #20572 into master will decrease coverage by 0.09%.
The diff coverage is 13.95%.

@@            Coverage Diff            @@
##           master   #20572     +/-   ##
=========================================
- Coverage   91.84%   91.74%   -0.1%     
=========================================
  Files         152      152             
  Lines       49264    49306     +42     
=========================================
- Hits        45246    45237      -9     
- Misses       4018     4069     +51

Flag	Coverage Δ
#multiple	`90.13% <13.95%> (-0.1%)`	⬇️
#single	`41.87% <9.3%> (-0.03%)`	⬇️

Impacted Files	Coverage Δ
pandas/plotting/_core.py	`80.43% <13.95%> (-2.08%)`	⬇️
pandas/plotting/_converter.py	`65.07% <0%> (-1.74%)`	⬇️
pandas/util/testing.py	`84.52% <0%> (-0.21%)`	⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update a77ac2b...7112759.

TomAugspurger · 2018-07-06T22:30:14Z

CI is failing.

Can you check the failures (some linting issues) and update?

WillAyd · 2018-08-18T23:14:15Z

Supplemented by #22403

VincentAntoine added 7 commits April 1, 2018 18:39

Grab and normalize bubble size data

339aa59

Add possibility to make scatter plot by size on DataFrame with s='column_name' argument

33177e0

Add test for scatter plot with s argument

8e87e24

Change the order of arguments in scatter plot

984d494

Remove hashability check in argument parsing

9737ca1

Accept categorical data for s argument

69e6662

PEP8

e46414f

Typo correction

0b6e975

WillAyd requested changes Apr 2, 2018


VincentAntoine added 2 commits April 4, 2018 22:37

Cleaner _sci_notation function

c9cafa1

Update docstrings for scatter plot with s parameter

7112759

gfyoung requested a review from TomAugspurger April 10, 2018 04:30

gfyoung added the Visualization plotting label Apr 10, 2018

VincentAntoine mentioned this pull request Aug 17, 2018

Feat/bubble plot #22403

Closed

5 tasks

WillAyd closed this Aug 18, 2018
