BUG: assign consensus name to index union in array case GH13475 #35338

iamlemec · 2020-07-18T19:53:49Z

closes #13475 (BUG: index.name not preserved in concat in case of unequal object index)
tests added / passed (except the known datetime64 issue in #35080, BUG: Strange behavior at testing test_ts_plot_with_tz)
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

arw2019

Thanks for the fix!

We will need a test to show how this fixes the bug. Based on a quick look, pandas/tests/reshape/test_concat.py would be a good place for it.

iamlemec · 2020-07-19T03:37:23Z

Sure thing! Just added in a test ensuring a common index name is preserved with concat.

arw2019

One comment, otherwise lgtm

arw2019 · 2020-07-19T03:40:00Z

pandas/tests/reshape/test_concat.py

+
+        result = pd.concat([frame1, frame2], axis=1)
+
+        assert result.index.name == "idx"


Here you want to hard code the expected result and check equality with tm.assert_frame_equal

ah indeed, that's much better. just pushed a remedy.
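
To illustrate the suggestion above, here is a minimal sketch of a hard-coded expected frame compared with `assert_frame_equal`. It is not the test that was pushed: the frame contents and column names are made up for illustration, the public `pd.testing.assert_frame_equal` stands in for the internal `tm` helper, and the check is expected to pass only on a pandas version that includes this fix.

```python
import pandas as pd

# Two frames whose indexes share the name "idx" but only partially overlap.
frame1 = pd.DataFrame({"x": [0.0, 1.0]}, index=pd.Index(["a", "b"], name="idx"))
frame2 = pd.DataFrame({"y": [0.0, 1.0]}, index=pd.Index(["b", "c"], name="idx"))

result = pd.concat([frame1, frame2], axis=1)

# Hard-coded expectation: the union of the labels, still named "idx".
nan = float("nan")
expected = pd.DataFrame(
    {"x": [0.0, 1.0, nan], "y": [nan, 0.0, 1.0]},
    index=pd.Index(["a", "b", "c"], name="idx"),
)

# Compares values, dtypes, and index/column names in one call.
pd.testing.assert_frame_equal(result, expected)
```

Comparing whole frames also catches regressions in values and label order, which a bare check on `result.index.name` would miss.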

arw2019

Ok great!

All that's left is a whatsnew entry. I think this may go to 1.2, so we might need to wait until 1.1 gets branched to finalize (#34730 & #35315)

I'll ping here when that's done if you like

iamlemec · 2020-07-19T04:36:49Z

Sounds great, thanks!

simonjayhawkins

Thanks @iamlemec for the PR. generally lgtm pending release note.

In the meantime, maybe you could parametrize the test for other name combinations (different names and missing names as well as same names). You may need to concat three dataframes to get better coverage of permutations.

simonjayhawkins · 2020-07-19T10:10:09Z

pandas/core/indexes/api.py

@@ -220,7 +220,8 @@ def conv(i):
        index = indexes[0]
        for other in indexes[1:]:
            if not index.equals(other):
-                return _unique_indices(indexes)
+                index = _unique_indices(indexes)
+                break


just a personal preference, but for me something like

    if not all(index.equals(other) for other in indexes[1:]):
        index = _unique_indices(indexes)

is easier to grok instead of introducing a break.

ah, yup that's much more readable. hadn't realized that all will short-circuit. will add that in plus some additional tests.
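
The short-circuit behavior mentioned here can be checked with a small pure-Python sketch. `CountingIndex` and `all_equal` are hypothetical stand-ins invented for this illustration, not pandas objects; the point is only that `all` over a generator stops at the first `False`.

```python
class CountingIndex:
    """Hypothetical stand-in for an index that counts equals() calls."""

    def __init__(self, labels):
        self.labels = list(labels)
        self.calls = 0

    def equals(self, other):
        self.calls += 1
        return self.labels == other.labels


def all_equal(indexes):
    """Return True if every index equals the first one."""
    first = indexes[0]
    # all() pulls from the generator lazily and stops at the first False.
    return all(first.equals(other) for other in indexes[1:])


indexes = [
    CountingIndex("abc"),
    CountingIndex("abd"),  # differs from the first index
    CountingIndex("abc"),
    CountingIndex("abc"),
]
result = all_equal(indexes)
calls = indexes[0].calls
print(result, calls)  # False 1
```

Only one comparison runs before `all` gives up, so the generator form does the same amount of work as the loop with an explicit `break`.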

iamlemec · 2020-07-20T01:33:36Z

Just added the new tests. I figured since we're primarily testing the index name functionality, it's okay to define the base frame then use rename_axis to differentiate.

jreback

this also needs a whatsnew note, this would be for 1.2 (the whatsnew is not pushed yet so will have to do a bit later)

arw2019 · 2020-07-20T21:36:45Z

pandas/tests/reshape/test_concat.py

@@ -1279,6 +1279,33 @@ def test_concat_ignore_index(self, sort):

        tm.assert_frame_equal(v1, expected)

+    concat_index_names = [


I feel like we would want to put this inside pytest.mark.parametrize rather than defining a separate variable

all integrated now

arw2019 · 2020-07-20T21:39:18Z

pandas/tests/reshape/test_concat.py

+        frames = [pd.DataFrame({c: vals}, index=i) for i, c in zip(indices, cols)]
+        result = pd.concat(frames, axis=1)
+
+        exp_ind = pd.Index(rows, name=output_name)


nit pick: might want to define this inside the frames constructor rather than in a separate variable

indeed, best to keep result / expected separate

arw2019 · 2020-07-29T19:51:26Z

> this also needs a whatsnew note, this would be for 1.2 (the whatsnew is not pushed yet so will have to do a bit later)

@iamlemec 1.2 whatsnew is on master now

iamlemec · 2020-07-29T23:58:44Z

Congrats on the release!

Actually, I was going over the testing code one last time, and I realized it doesn't actually test the new behavior (put another way, the current master will pass the test). That's because the bug only kicks in when the indices aren't equal (numerically), but they are equal in the test. I'm going to change it so the indices are only partially overlapping. Should I still also test the case where they are numerically equal?
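
A rough sketch of what testing with partially overlapping indices could look like follows. The labels, column names, and the helper `concat_index_name` are assumptions made for illustration, and the expected names assume a pandas version that includes this fix.

```python
import pandas as pd

# (name of index 1, name of index 2, name of index 3, expected output name)
cases = [
    ("idx", "idx", "idx", "idx"),
    ("idx", "idx", None, None),
    ("idx1", "idx2", None, None),
    (None, None, None, None),
]


def concat_index_name(names):
    """Concat three frames with partially overlapping labels; return the index name."""
    labels = [["a", "b", "c"], ["b", "c", "d"], ["c", "d", "e"]]
    frames = [
        pd.DataFrame({col: [0, 1, 2]}, index=pd.Index(lab, name=name))
        for col, lab, name in zip("xyz", labels, names)
    ]
    return pd.concat(frames, axis=1).index.name


observed = [concat_index_name(case[:3]) for case in cases]
for case, name in zip(cases, observed):
    # The name survives only when every input index agrees on it.
    assert name == case[3], (case, name)
```

Because the labels differ between frames, the indexes are not equal and the union path is exercised, which is the case the earlier version of the test missed.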

One more thing, which I think hasn't been brought up explicitly. This fix will also affect the behavior of the DataFrame constructor in the same way it affects concat with axis=1. Not sure if this influences the testing requirements.

jreback · 2020-08-06T23:53:21Z

> One more thing, which I think hasn't been brought up explicitly. This fix will also affect the behavior of the DataFrame constructor in the same way it affects concat with axis=1. Not sure if this influences the testing requirements.

can you show an example?

iamlemec · 2020-08-07T05:46:15Z

Sure thing. The original issue only arises when it hits the "array" case of union_indexes, which means we need to send it a plain Index, so I'm using string index labels here:

s1 = pd.Series([1, 2], index=pd.Index(['a', 'b'], name='idx'))
s2 = pd.Series([2, 3], index=pd.Index(['b', 'c'], name='idx'))
pd.DataFrame({'a': s1, 'b': s2})

On master, this yields a DataFrame whose index has no name. With the patch, it's named 'idx'.

jreback · 2020-08-07T11:43:17Z

> Sure thing. The original issue only arises when it hits the "array" case of union_indexes, which means we need to send it a plain Index, so I'm using string index labels here:
>
> s1 = pd.Series([1, 2], index=pd.Index(['a', 'b'], name='idx'))
> s2 = pd.Series([2, 3], index=pd.Index(['b', 'c'], name='idx'))
> pd.DataFrame({'a': s1, 'b': s2})
>
> On master, this yields a DataFrame whose index has no name. With the patch, it's named 'idx'.

great, can you add this as a test as well; let's put it in the pandas/tests/frame/test_constructors.py, ping on green.

iamlemec · 2020-08-07T20:07:14Z

Sounds good @jreback. Just pushed a DataFrame constructor test.
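
For readers following along, a constructor test along these lines might look like the sketch below. This is not the test that was pushed: the function name `test_constructor_consensus_index_name` is hypothetical, the public `pd.testing.assert_frame_equal` is used in place of the internal `tm` helper, and it is expected to pass only on a pandas version that includes this fix.

```python
import pandas as pd


def test_constructor_consensus_index_name():
    # Hypothetical test name; mirrors the example given earlier in this thread.
    s1 = pd.Series([1, 2], index=pd.Index(["a", "b"], name="idx"))
    s2 = pd.Series([2, 3], index=pd.Index(["b", "c"], name="idx"))

    result = pd.DataFrame({"a": s1, "b": s2})

    # Missing labels become NaN, so both columns are upcast to float.
    nan = float("nan")
    expected = pd.DataFrame(
        {"a": [1.0, 2.0, nan], "b": [nan, 2.0, 3.0]},
        index=pd.Index(["a", "b", "c"], name="idx"),
    )
    pd.testing.assert_frame_equal(result, expected)
    return result


constructed = test_constructor_consensus_index_name()
```

String labels are used on purpose, following the earlier comment that the bug only arises for a plain Index in the "array" case of union_indexes.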

jreback · 2020-08-07T20:18:03Z

great ping on green

iamlemec · 2020-08-07T22:20:33Z

@jreback ok everything on CI looks good except for test_chunks_have_consistent_numerical_type, but it seems like that one's been flakey for people lately

jreback · 2020-08-07T22:23:03Z

yep that's fine, thanks @iamlemec

arw2019 suggested changes Jul 18, 2020

iamlemec force-pushed the union_indexes branch from 4a81c1f to ae16e8c on July 19, 2020 03:29

arw2019 suggested changes Jul 19, 2020

simonjayhawkins reviewed Jul 19, 2020

jreback requested changes Jul 20, 2020

jreback added the Reshaping label Jul 20, 2020

arw2019 reviewed Jul 20, 2020

iamlemec force-pushed the union_indexes branch from 0fbd896 to 242690e on July 30, 2020 16:40

jreback added this to the 1.2 milestone Aug 6, 2020

jreback added the Bug, Constructors, and Index labels Aug 7, 2020

BUG: assign consensus name to index union in array case GH13475

b8336c6

iamlemec force-pushed the union_indexes branch from 74950c6 to b8336c6 on August 7, 2020 20:02

jreback approved these changes Aug 7, 2020

jreback merged commit 92bf41a into pandas-dev:master Aug 7, 2020

