-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
BUG: assign consensus name to index union in array case GH13475 #35338
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fix!
We will need a test to show how this fixes the bug (based on a quick look pandas/tests/reshape/test_concat.py
would be a good place for it)
Sure thing! Just added in a test ensuring a common index name is preserved with concat. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One comment, otherwise lgtm
pandas/tests/reshape/test_concat.py
Outdated
|
||
result = pd.concat([frame1, frame2], axis=1) | ||
|
||
assert result.index.name == "idx" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here you want to hard code the expected result and check equality with tm.assert_frame_equal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah indeed, that's much better. just pushed a remedy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds great, thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @iamlemec for the PR. generally lgtm pending release note.
In the meantime, maybe could parameterize test for other name combinations (different names and missing names as well as same names, may need to concat three dataframes to get better coverage of permutations)
pandas/core/indexes/api.py
Outdated
@@ -220,7 +220,8 @@ def conv(i): | |||
index = indexes[0] | |||
for other in indexes[1:]: | |||
if not index.equals(other): | |||
return _unique_indices(indexes) | |||
index = _unique_indices(indexes) | |||
break |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a personal preference, but for me something like
if not all(index.equals(other) for other in indexes[1:]):
index = _unique_indices(indexes)
is easier to grok instead of introducing a break.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, yup that's much more readable. hadn't realized that all
will short-circuit. will add that in plus some additional tests.
Just added the new tests. I figured since we're primarily testing the index name functionality, it's okay to define the base frame then use |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this also needs a whatsnew note, this would be for 1.2 (the whatsnew is not pushed yet so will have to do a bit later)
pandas/tests/reshape/test_concat.py
Outdated
@@ -1279,6 +1279,33 @@ def test_concat_ignore_index(self, sort): | |||
|
|||
tm.assert_frame_equal(v1, expected) | |||
|
|||
concat_index_names = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like we would want to put this inside pytest.mark.parametrize
rather than defining a separate variable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all integrated now
pandas/tests/reshape/test_concat.py
Outdated
frames = [pd.DataFrame({c: vals}, index=i) for i, c in zip(indices, cols)] | ||
result = pd.concat(frames, axis=1) | ||
|
||
exp_ind = pd.Index(rows, name=output_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit pick: might want to define this inside the frames constructor rather than in a separate variable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indeed, best to keep result / expected separate
@iamlemec 1.2 whatsnew is on master now |
Congrats on the release! Actually, I was going over the testing code one last time, and I realized it doesn't actually test the new behavior (put another way, the current master will pass the test). That's because the bug only kicks in when the indices aren't equal (numerically), but they are equal in the test. I'm going to change it so the indices are only partially overlapping. Should I still also test the case where they are numerically equal? One more thing, which I think hasn't been brought up explicitly. This fix will also affect the behavior of the DataFrame constructor in the same way it affects concat with axis=1. Not sure if this influences the testing requirements. |
can you show an example? |
Sure thing. The original issue only arises when it hits the "array" case of
On master, I'm getting this yielding a DataFrame whose index has no name. With patch, it's named 'idx'. |
great, can you add this as a test as well; let's put it in the pandas/tests/frame/test_constructors.py, ping on green. |
Sounds good @jreback. Just pushed a DataFrame constructor test. |
great ping on green |
@jreback ok everything on CI looks good except for |
yep that's fine, thanks @iamlemec |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff