jbrockmendel
Member

@mroeschke added the Missing-data, Index, and Constructors labels Nov 8, 2022
@mroeschke added this to the 2.0 milestone Nov 8, 2022
@mroeschke merged commit 2713873 into pandas-dev:main Nov 8, 2022
@mroeschke
Member

Thanks @jbrockmendel

@jbrockmendel deleted the api-maybe_convert_objects branch November 8, 2022 19:03
phofl pushed a commit to phofl/pandas that referenced this pull request Nov 9, 2022
* API: Index([NaT, None]) match Series([NaT, None])

* mypy fixup
@phofl
Member

phofl commented Dec 16, 2022

It looks like this caused a slowdown in explode:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.006    0.006 {built-in method builtins.exec}
        1    0.000    0.000    0.006    0.006 <string>:1(<module>)
        1    0.000    0.000    0.006    0.006 series.py:4122(explode)
        1    0.000    0.000    0.005    0.005 series.py:342(__init__)
        1    0.000    0.000    0.005    0.005 construction.py:497(sanitize_array)
        1    0.000    0.000    0.005    0.005 cast.py:1104(maybe_infer_to_datetimelike)
        1    0.005    0.005    0.005    0.005 {pandas._libs.lib.maybe_convert_objects}
        1    0.001    0.001    0.001    0.001 {pandas._libs.reshape.explode}
        1    0.000    0.000    0.000    0.000 base.py:1137(repeat)
        1    0.000    0.000    0.000    0.000 {method 'repeat' of 'numpy.ndarray' objects}
        1    0.000    0.000    0.000    0.000 _methods.py:46(_sum)
        1    0.000    0.000    0.000    0.000 {method 'reduce' of 'numpy.ufunc' objects}

https://asv-runner.github.io/asv-collection/pandas/#reshape.Explode.time_explode?p-n_rows=10000&p-max_list_length=10
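
For readers who want to produce a table like the one above, here is a minimal sketch using only the standard library's `cProfile` and `pstats`. The helper name `profile_cumulative` and the stand-in workload are hypothetical; the thread does not say exactly how the profile was collected or which Series was used.

```python
import cProfile
import io
import pstats


def profile_cumulative(func, top=12):
    """Run func() under cProfile and return the stats table as text,
    sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func()
    finally:
        profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # "cumulative" sorts by the cumtime column; print_stats(top) limits the
    # table to the first `top` rows.
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


def stand_in_workload():
    # Placeholder for the call being profiled (an assumption, not the
    # benchmark from this thread).
    return sorted(str(i) for i in range(10_000))


report = profile_cumulative(stand_in_workload)
print(report)
```

With pandas installed, passing something like `lambda: ser.explode()` for a Series `ser` of lists should give a table in the same shape. The input is an assumption here, loosely based on the linked `reshape.Explode.time_explode` benchmark parameters.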

@jbrockmendel
Member Author

Any idea what that profile output looks like before this PR?

@phofl
Member

phofl commented Dec 16, 2022

No, but I can have a look tomorrow.

@phofl
Member

phofl commented Dec 17, 2022

This is how it looks before:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.001    0.001 {built-in method builtins.exec}
        1    0.000    0.000    0.001    0.001 <string>:1(<module>)
        1    0.000    0.000    0.001    0.001 series.py:4169(explode)
        1    0.001    0.001    0.001    0.001 {pandas._libs.reshape.explode}
        1    0.000    0.000    0.000    0.000 series.py:344(__init__)
        1    0.000    0.000    0.000    0.000 base.py:1176(repeat)
        1    0.000    0.000    0.000    0.000 construction.py:497(sanitize_array)
        1    0.000    0.000    0.000    0.000 managers.py:1909(from_array)
        1    0.000    0.000    0.000    0.000 {method 'repeat' of 'numpy.ndarray' objects}
        1    0.000    0.000    0.000    0.000 _methods.py:46(_sum)
        1    0.000    0.000    0.000    0.000 {method 'reduce' of 'numpy.ufunc' objects}
        1    0.000    0.000    0.000    0.000 function.py:59(__call__)
        2    0.000    0.000    0.000    0.000 config.py:262(__call__)
        1    0.000    0.000    0.000    0.000 common.py:157(is_object_dtype)
        2    0.000    0.000    0.000    0.000 config.py:134(_get_option)
    24/17    0.000    0.000    0.000    0.000 {built-in method builtins.len}
        1    0.000    0.000    0.000    0.000 _validators.py:168(validate_args_and_kwargs)
        1    0.000    0.000    0.000    0.000 common.py:1486(_is_dtype_type)
       34    0.000    0.000    0.000    0.000 {built-in method builtins.isinstance}

And this is after the commit from this PR:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.006    0.006 {built-in method builtins.exec}
        1    0.000    0.000    0.006    0.006 <string>:1(<module>)
        1    0.000    0.000    0.006    0.006 series.py:4169(explode)
        1    0.000    0.000    0.005    0.005 series.py:344(__init__)
        1    0.000    0.000    0.005    0.005 construction.py:497(sanitize_array)
        1    0.000    0.000    0.005    0.005 construction.py:755(_try_cast)
        1    0.000    0.000    0.005    0.005 cast.py:1174(maybe_infer_to_datetimelike)
        1    0.005    0.005    0.005    0.005 {pandas._libs.lib.maybe_convert_objects}
        1    0.001    0.001    0.001    0.001 {pandas._libs.reshape.explode}
        1    0.000    0.000    0.000    0.000 base.py:1176(repeat)
        1    0.000    0.000    0.000    0.000 _methods.py:46(_sum)
        1    0.000    0.000    0.000    0.000 {method 'reduce' of 'numpy.ufunc' objects}
        1    0.000    0.000    0.000    0.000 managers.py:1909(from_array)
        1    0.000    0.000    0.000    0.000 function.py:59(__call__)
        1    0.000    0.000    0.000    0.000 _validators.py:168(validate_args_and_kwargs)
        1    0.000    0.000    0.000    0.000 {method 'repeat' of 'numpy.ndarray' objects}
        1    0.000    0.000    0.000    0.000 common.py:157(is_object_dtype)
        1    0.000    0.000    0.000    0.000 numeric.py:289(full)
        1    0.000    0.000    0.000    0.000 base.py:4885(_values)
        2    0.000    0.000    0.000    0.000 config.py:262(__call__)
        1    0.000    0.000    0.000    0.000 common.py:1486(_is_dtype_type)
       34    0.000    0.000    0.000    0.000 {built-in method builtins.isinstance}
        2    0.000    0.000    0.000    0.000 config.py:134(_get_option)
        1    0.000    0.000    0.000    0.000 range.py:192(_data)
    24/17    0.000    0.000    0.000    0.000 {built-in method builtins.len}

@jbrockmendel
Member Author

So it looks like the extra time is all in maybe_convert_objects. We could add a convert_numeric=False flag to allow some short-circuiting in there.
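
To illustrate the kind of short-circuiting such a flag could allow, here is a toy pure-Python sketch. It is not pandas' implementation of `maybe_convert_objects`: the function `toy_infer_kind` and its return values are hypothetical, and only the flag name `convert_numeric` comes from the comment above. The idea is that when the caller only cares about datetimelike results, the first plain number settles the answer and the scan can stop.

```python
import datetime as dt


def toy_infer_kind(values, convert_numeric=True):
    """Toy stand-in for an object-array inference loop (NOT pandas' code).

    Returns (kind, n_inspected), where kind is "datetime", "numeric",
    or "object".
    """
    seen_datetime = False
    seen_numeric = False
    inspected = 0
    for value in values:
        inspected += 1
        if value is None:
            continue  # missing values are compatible with every kind
        if isinstance(value, dt.datetime):
            seen_datetime = True
        elif isinstance(value, (int, float)) and not isinstance(value, bool):
            seen_numeric = True
            if not convert_numeric:
                # Numeric conversion was not requested, so one number is
                # enough to rule out a datetimelike result: stop scanning.
                return "object", inspected
        else:
            return "object", inspected
        if seen_datetime and seen_numeric:
            return "object", inspected
    if seen_datetime:
        return "datetime", inspected
    if seen_numeric:
        return "numeric", inspected
    return "object", inspected


numbers = list(range(1000))
print(toy_infer_kind(numbers))                         # ('numeric', 1000)
print(toy_infer_kind(numbers, convert_numeric=False))  # ('object', 1)
```

In this toy version, an all-numeric input is scanned in full by default but abandoned after one element when `convert_numeric=False`, while datetimelike inputs are handled the same either way.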

Labels
Constructors Series/DataFrame/Index/pd.array Constructors Index Related to the Index class or subclasses Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate
Development

Successfully merging this pull request may close these issues.

API: Series([pd.NaT, None]) vs Index([pd.NaT, None])
3 participants