REGR: Fixes first_valid_index when DataFrame or Series has duplicate row index (GH21441) #21497
GH21441
pandas/tests/generic/test_generic.py
Outdated
@@ -612,6 +612,16 @@ def test_pct_change(self, periods, fill_method, limit, exp):
        else:
            tm.assert_series_equal(res, Series(exp))

    @pytest.mark.parametrize("DF,idx,first_idx,last_idx", [
Some renaming suggestions for readability:
- `DF` --> `data` (match `DataFrame` constructor)
- `idx` --> `index` (match `DataFrame` constructor)
- `first_idx` --> `expected_first` (follow standard expected/result unit test setup)
- `last_idx` --> `expected_last` (follow standard expected/result unit test setup)
Thanks - done
pandas/tests/generic/test_generic.py
Outdated
        ({'A': [1, 2, 3, 4]}, ['d', 'd', 'd', 'd'], 'd', 'd')])
    def test_valid_index(self, DF, idx, first_idx, last_idx):
        # GH 21441
        df1 = pd.DataFrame(DF, index=idx)
You can just call this `df`; there's no ambiguity since there's only one frame in the test. Also, `DataFrame` is imported, so the `pd.` isn't needed.
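For reference, a minimal sketch of how the test might look with both sets of suggestions applied. The parameter rows are taken from the diff hunks in this PR; the test name matches the later revision, while the assertion body is an assumption, since the hunks shown here do not include it. It is written as a plain function (no `self`) so it runs standalone.

```python
import numpy as np
import pytest
from pandas import DataFrame


@pytest.mark.parametrize("data,index,expected_first,expected_last", [
    ({'A': [1, 2, 3]}, [1, 1, 2], 1, 2),
    ({'A': [1, 2, 3]}, [1, 2, 2], 1, 2),
    ({'A': [1, 2, 3, 4]}, ['d', 'd', 'd', 'd'], 'd', 'd'),
    ({'A': [np.nan, np.nan, 3]}, [1, 1, 2], 2, 2),
    ({'A': [1, np.nan, 3]}, [1, 2, 2], 1, 2)])
def test_first_last_valid(data, index, expected_first, expected_last):
    # GH 21441
    # Only one frame in the test, so plain `df` is unambiguous.
    df = DataFrame(data, index=index)
    # Assumed assertions: the original test body is not visible in this PR view.
    assert df.first_valid_index() == expected_first
    assert df.last_valid_index() == expected_last
```

Naming the parameters `data` and `index` mirrors the `DataFrame` constructor, and the `expected_*` names make the expected/result split obvious at the call site.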
doc/source/whatsnew/v0.23.2.txt
Outdated
**Other Fixes**

- Bug in :meth:`first_valid_index` that raised for row index with duplicate values (:issue:`21441`)
I think this should be :meth:`DataFrame.first_valid_index`
you don't need a separate sub-section here, just list the issue
'raised for a row index'
Thanks - updated
Have left it as :meth:`first_valid_index`, as this issue affects both DataFrame and Series (though the example and title of the original issue point only to DataFrame)
pandas/core/generic.py
Outdated
if not is_valid[i]:
    return None
return i
i = is_valid.values[::].argmin()
just call this idxpos, no need for i any longer
Thanks - done
pandas/core/generic.py
Outdated
return None
return i
i = is_valid.values[::].argmin()
idxpos = i

elif how == 'last':
# Last valid value case
i = is_valid.values[::-1].argmax()
make this idxpos
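A rough sketch of the positional approach being discussed, where the 'first' and 'last' branches both compute a single `idxpos`. `find_valid_index` is a hypothetical standalone helper written for illustration, not the method in `pandas/core/generic.py`; it assumes that a row counts as valid when any column is non-NA. Working with positions rather than labels is what keeps it safe for a duplicated row index.

```python
import numpy as np
import pandas as pd


def find_valid_index(obj, how):
    """Hypothetical helper: label of the first/last row holding a non-NA value."""
    if len(obj) == 0:
        return None

    is_valid = obj.notna()
    if isinstance(is_valid, pd.DataFrame):
        # Assumption: a row is valid if at least one column is non-NA.
        is_valid = is_valid.any(axis=1)
    values = is_valid.to_numpy()

    if how == "first":
        # argmax on a boolean array gives the position of the first True.
        idxpos = values.argmax()
    elif how == "last":
        # Search the reversed array, then map back to a forward position.
        idxpos = len(obj) - 1 - values[::-1].argmax()
    else:
        raise ValueError("how must be 'first' or 'last'")

    # argmax returns 0 when there is no True at all, so check explicitly.
    if not values[idxpos]:
        return None
    return obj.index[idxpos]
```

Because the lookup is positional (`obj.index[idxpos]`), a label that appears more than once never triggers a label-based selection that returns several rows.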
pandas/tests/test_resample.py
Outdated
@@ -649,13 +649,6 @@ def test_asfreq_fill_value(self):
        expected = frame.reindex(new_index, fill_value=4.0)
        assert_frame_equal(result, expected)

    def test_resample_interpolate(self):
Removed this test because it was failing. On investigation, it appears to come from closed PR #12974, which was opened for issue #12925.
Not sure if this is the right call.
There are probably other failing tests that I haven't investigated yet; my sense is that they are related to this one. I will check the Travis CI and other checks again.
Restored this test now - this test, along with others, was failing due to an error in interpolation, which is fixed now
GH21441
doc/source/whatsnew/v0.23.2.txt
Outdated
@@ -16,7 +16,7 @@ and bug fixes. We recommend that all users upgrade to this version.
Fixed Regressions
~~~~~~~~~~~~~~~~~

-
- Bug in :meth:`first_valid_index` raised for a row index with duplicate values (:issue:`21441`)
> Have left it as :meth:`first_valid_index` as this issue affects both DataFrame and Series

I don't think the :meth: will correctly link to anything as written, since there's no global `pd.first_valid_index`. You can write something like "Bug in both :meth:`Series.first_valid_index` and :meth:`DataFrame.first_valid_index` ..." if you want to be explicit that both are affected, which would link to both `Series` and `DataFrame` separately.
Thanks - Done
Codecov Report
@@            Coverage Diff             @@
##           master   #21497      +/-   ##
==========================================
- Coverage   91.92%   91.91%   -0.01%
==========================================
  Files         153      153
  Lines       49570    49574       +4
==========================================
+ Hits        45566    45568       +2
- Misses       4004     4006       +2
pandas/core/generic.py
Outdated
return None
return self.index[len(self) - i - 1]
idx = is_valid.idxmax()
if isinstance(is_valid[idx], ABCSeries):
what are you trying to do here?
Thanks - this block is supposed to check that, when the same index label occurs multiple times, at least one of its values is not NA.
However, while testing this with the following data, the expected output is not returned:
x = pd.DataFrame({'b': [1,np.NaN,3]}, index=[1,1,2])
Expected 1, returned None
I'll rework this patch and commit again. Thanks again for the question; it was a faulty assumption on my part (I had not explicitly checked for a NaN value among the duplicated index labels).
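The failing case above can be reproduced with a short script. This is only an illustrative check, assuming a pandas build that includes the fix from this PR; it uses `np.nan` rather than `np.NaN` because the latter alias is no longer available in recent NumPy releases.

```python
import numpy as np
import pandas as pd

# Duplicate label 1, with a NaN under the second occurrence.
x = pd.DataFrame({'b': [1, np.nan, 3]}, index=[1, 1, 2])

first = x.first_valid_index()  # the first row (label 1) holds a valid value
last = x.last_valid_index()    # the last row (label 2) holds a valid value
print(first, last)
```

With the fix in place, `first` should be the label `1` rather than `None`, since the value at position 0 is valid even though a later row with the same label is NaN.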
the loop was incorrect leading to an error, not sure what I was thinking earlier :) - fixed now and committing
Fixed - rebased and committed
still not clear on the logic here, why can't this be a mirror of the 'last' logic?
pandas/tests/generic/test_generic.py
Outdated
        ({'A': [1, 2, 3]}, [1, 1, 2], 1, 2),
        ({'A': [1, 2, 3]}, [1, 2, 2], 1, 2),
        ({'A': [1, 2, 3, 4]}, ['d', 'd', 'd', 'd'], 'd', 'd')])
    def test_valid_index(self, data, index, expected_first, expected_last):
do we not already have some tests for this? pls put near the others. does this duplicate existing tests at all?
> do we not already have some tests for this? pls put near the others.

Thanks - the only test involving `first_valid_index` and `last_valid_index` is in ./pandas/tests/frame/test_timeseries.py, and it does not specifically check for duplicate first or last index values. Would you suggest I move this test there?
…uplicate index GH21441
Will rebase, resolve the conflict with the whatsnew file, and push later today
GH21441
GH21441
…uplicate index GH21441
GH21441
GH21441
        ({'A': [np.nan, np.nan, 3]}, [1, 1, 2], 2, 2),
        ({'A': [1, np.nan, 3]}, [1, 2, 2], 1, 2)])
    def test_first_last_valid(self, data, index,
                              expected_first, expected_last):
Moved tests here from pandas/tests/generic/test_generic.py - all related tests for `first_valid_index` and `last_valid_index` are now co-located
thanks @KalyanGokhale
…row index (GH21441) (pandas-dev#21497)
git diff upstream/master -u -- "*.py" | flake8 --diff