-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
BUG: passing DataFrame to make_block silently raises #28275
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hmm so you are saying this improves mixed block performance where some can be cythonized right? This adds a little bit of logic so would be good to see that measured somehow to understand if it’s worth it |
I'm having a hard time cooking up an example, but roughly what I have in mind is a case with
In such a case we would end up falling back to the non-cython implementations for all blocks instead of just the one relevant EA block. |
try: | ||
result = s.aggregate(lambda x: alt(x, axis=self.axis)) | ||
except TypeError: | ||
# we may have an exception in trying to aggregate | ||
# continue and exclude the block | ||
deleted_items.append(locs) | ||
continue | ||
|
||
if is_object_dtype(block.dtype) and how in ["prod", "cumprod", "sum"]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a way around is to have .aggregate different exceptions for flow control (make some private ones that inherit from NotImplementedError); we do this in the cython aggregator now (well we raise NotImplementedError)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yah, distinguishing between our NotImplementedError vs numpy's is basically what #28198 does (albeit that is only in one place). I think the groupby catching/raising is complicated enough that we need to do a few passes clearing up what gets caught where, then go through to do the catching in fewer places
Closing to clear the queue. Will revisit. |
ATM in the
except NotImplementedError:
branch of_cython_agg_blocks
, we take a difference approach to the operation. if that approach succeeds, we end up passing a DataFrame tomake_block
on L189, which will raiseValueError
. As a result, we'll end up falling back to python-space for the entire operation, which presumably entails a performance hit.This fixes the incorrect passing of DataFrame, but the fix is kind of kludgy. Suggestions welcome on how to improve it.