CLN: get_flattened_iterator #35515
pandas/core/sorting.py
Outdated
# provide "flattened" iterator for multi-group setting
mapper = _KeyMapper(comp_ids, ngroups, levels, labels)
return [mapper.get_key(i) for i in range(ngroups)]
def get_flattened_iterator(
Good cleanup here, but how about making this get_flattened_list so the calling function doesn't need to cast?
LGTM
pandas/core/sorting.py
Outdated
table.map(comp_ids, labs.astype(np.int64, copy=False))
tables.append(table)
return [
tuple(level[table.get_item(i)] for table, level in zip(tables, levels))
Is there any performance difference between creating an intermediate list to store the result of zip(tables, levels) and doing it on the fly in each iteration here?
Good point. That eliminates a loop iteration and avoids storing these table objects.
tables = []
for labs, level in zip(labels, levels):
table = hashtable.Int64HashTable(ngroups)
table.map(comp_ids, labs.astype(np.int64, copy=False))
You could make this a list comprehension, though it might be slightly less readable.
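For anyone following the two threads above, here is a rough sketch of the two shapes being compared. It is not the code in this PR: a plain dict stands in for hashtable.Int64HashTable so the example runs without pandas internals, and the names build_table, keys_two_pass and keys_single_pass are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Sequence, Tuple


def build_table(comp_ids: Sequence[int], labs: Sequence[int]) -> Dict[int, int]:
    """Assumed stand-in for the hash table: compressed group id -> label code."""
    return dict(zip(comp_ids, labs))


def keys_two_pass(comp_ids, ngroups, levels, labels) -> List[Tuple]:
    # First pass stores one table per level, second pass builds the tuples.
    tables = [build_table(comp_ids, labs) for labs in labels]
    return [
        tuple(level[table[i]] for table, level in zip(tables, levels))
        for i in range(ngroups)
    ]


def keys_single_pass(comp_ids, ngroups, levels, labels) -> List[Tuple]:
    # One loop over the levels; each table can be dropped once consumed.
    arrays: DefaultDict[int, List] = defaultdict(list)
    for labs, level in zip(labels, levels):
        table = build_table(comp_ids, labs)
        for i in range(ngroups):
            arrays[i].append(level[table[i]])
    return [tuple(array) for array in arrays.values()]


# Two key levels; four rows falling into three groups.
levels = [["a", "b"], ["x", "y"]]
labels = [[0, 0, 1, 1], [0, 1, 1, 1]]
comp_ids = [0, 1, 2, 2]
result = keys_single_pass(comp_ids, 3, levels, labels)
print(result)
```

Both helpers return the same key tuples for this input; the single-pass form just avoids keeping every table alive at once.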
pandas/core/sorting.py
Outdated
) -> List[Tuple]:
"""Map compressed group id -> key tuple."""
comp_ids = comp_ids.astype(np.int64, copy=False)
arrays: Dict[int, List] = defaultdict(list)
arrays: Dict[int, List] = defaultdict(list)
arrays: DefaultDict[int, List] = defaultdict(list)
DefaultDict exists in the typing module. Can the List be List[int]?
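A minimal sketch of the suggested annotation, in isolation. The values appended here are ints purely for illustration; whether List[int] is the right element type in the PR depends on what is actually appended there.

```python
from collections import defaultdict
from typing import DefaultDict, List

# DefaultDict is the typing alias for collections.defaultdict.
arrays: DefaultDict[int, List[int]] = defaultdict(list)
arrays[0].append(10)  # missing key 0 is created as an empty list first
arrays[0].append(20)
arrays[1].append(30)
print(dict(arrays))
```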
pandas/core/sorting.py
Outdated
mapper = _KeyMapper(comp_ids, ngroups, levels, labels)
return [mapper.get_key(i) for i in range(ngroups)]
def get_flattened_list(
comp_ids: np.ndarray, ngroups: int, levels: List["Index"], labels: List[np.ndarray]
From https://github.com/python/typeshed/blob/master/CONTRIBUTING.md:
avoid invariant collection types (List, Dict) in argument positions, in favor of covariant types like Mapping or Sequence;
In this case both levels and labels are used in zip, so Iterable is more permissive.
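To illustrate the point with a toy case: a function that only feeds its arguments to zip can annotate them as Iterable and then accept lists, tuples or generators. The helper pair_up below is hypothetical and not part of pandas.

```python
from typing import Iterable, List, Tuple


def pair_up(levels: Iterable[str], labels: Iterable[int]) -> List[Tuple[str, int]]:
    """Hypothetical helper: it only iterates its arguments via zip."""
    return list(zip(levels, labels))


# Lists, tuples and generators all satisfy Iterable.
from_lists = pair_up(["a", "b"], [0, 1])
from_mixed = pair_up(("a", "b"), (n for n in range(2)))
print(from_lists == from_mixed)
```

With List annotations, a type checker would reject the second call even though it works at runtime.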
Can you also run the groupby asvs (in particular the ones that exercise the full int64 cardinality) to ensure there are no perf diffs?
This small benchmark shows no performance degradation:
I'm getting this error when trying to run the asvs locally
If on a Mac I think you need to change the Python version in the ASV conf for it to compile. I think there’s an open issue to update that
On Aug 6, 2020, at 10:10 AM, Matthew Roeschke ***@***.***> wrote:
This small benchmark shows no performance degradation:
In [1]: df = pd.DataFrame({'a': list(range(1000)) * 2, 'b': list(range(1000)) * 2})
In [4]: %timeit df.groupby('a').sum()
1.06 ms ± 74.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) <-- PR
In [4]: %timeit df.groupby('a').sum()
1.1 ms ± 71.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) <-- master
I'm getting this error when trying to run the asvs locally
STDERR -------->
/Users/matthewroeschke/pandas-mroeschke/asv_bench/env/a36d556a9d1611aba4092dc48036d261/lib/python3.6/site-packages/setuptools/distutils_patch.py:26: UserWarning: Distutils was imported before Setuptools. This usage is discouraged and may exhibit undesirable behaviors or errors. Please use Setuptools' objects directly or at least import Setuptools first.
"Distutils was imported before Setuptools. This usage is discouraged "
warning: pandas/_libs/groupby.pyx:1130:26: Unreachable code
pandas/_libs/parsers.c:49607:52: error: code will never be executed [-Werror,-Wunreachable-code]
} else if (PY_VERSION_HEX >= 0x030700A0 && flag == (METH_FASTCALL | METH_KEYWORDS)) {
^~~~
pandas/_libs/parsers.c:49607:38: note: silence by adding parentheses to mark code as explicitly dead
} else if (PY_VERSION_HEX >= 0x030700A0 && flag == (METH_FASTCALL | METH_KEYWORDS)) {
^
/* DISABLES CODE */ ( )
1 error generated.
pandas/_libs/ops_dispatch.c:3877:52: error: code will never be executed [-Werror,-Wunreachable-code]
} else if (PY_VERSION_HEX >= 0x030700A0 && flag == (METH_FASTCALL | METH_KEYWORDS)) {
^~~~
pandas/_libs/ops_dispatch.c:3877:38: note: silence by adding parentheses to mark code as explicitly dead
} else if (PY_VERSION_HEX >= 0x030700A0 && flag == (METH_FASTCALL | METH_KEYWORDS)) {
^
/* DISABLES CODE */ ( )
1 error generated.
pandas/_libs/tslibs/offsets.c:89935:52: error: code will never be executed [-Werror,-Wunreachable-code]
} else if (PY_VERSION_HEX >= 0x030700A0 && flag == (METH_FASTCALL | METH_KEYWORDS)) {
^~~~
pandas/_libs/tslibs/offsets.c:89935:38: note: silence by adding parentheses to mark code as explicitly dead
} else if (PY_VERSION_HEX >= 0x030700A0 && flag == (METH_FASTCALL | METH_KEYWORDS)) {
^
/* DISABLES CODE */ ( )
1 error generated.
pandas/_libs/tslibs/parsing.c:39517:52: error: code will never be executed [-Werror,-Wunreachable-code]
} else if (PY_VERSION_HEX >= 0x030700A0 && flag == (METH_FASTCALL | METH_KEYWORDS)) {
^~~~
pandas/_libs/tslibs/parsing.c:39517:38: note: silence by adding parentheses to mark code as explicitly dead
} else if (PY_VERSION_HEX >= 0x030700A0 && flag == (METH_FASTCALL | METH_KEYWORDS)) {
^
/* DISABLES CODE */ ( )
1 error generated.
pandas/_libs/tslibs/period.c:44813:52: error: code will never be executed [-Werror,-Wunreachable-code]
} else if (PY_VERSION_HEX >= 0x030700A0 && flag == (METH_FASTCALL | METH_KEYWORDS)) {
^~~~
pandas/_libs/tslibs/period.c:44813:38: note: silence by adding parentheses to mark code as explicitly dead
} else if (PY_VERSION_HEX >= 0x030700A0 && flag == (METH_FASTCALL | METH_KEYWORDS)) {
^
/* DISABLES CODE */ ( )
1 error generated.
pandas/_libs/tslibs/timedeltas.c:43722:52: error: code will never be executed [-Werror,-Wunreachable-code]
} else if (PY_VERSION_HEX >= 0x030700A0 && flag == (METH_FASTCALL | METH_KEYWORDS)) {
^~~~
pandas/_libs/tslibs/timedeltas.c:43722:38: note: silence by adding parentheses to mark code as explicitly dead
} else if (PY_VERSION_HEX >= 0x030700A0 && flag == (METH_FASTCALL | METH_KEYWORDS)) {
^
/* DISABLES CODE */ ( )
1 error generated.
pandas/_libs/tslibs/timestamps.c:41544:52: error: code will never be executed [-Werror,-Wunreachable-code]
} else if (PY_VERSION_HEX >= 0x030700A0 && flag == (METH_FASTCALL | METH_KEYWORDS)) {
^~~~
pandas/_libs/tslibs/timestamps.c:41544:38: note: silence by adding parentheses to mark code as explicitly dead
} else if (PY_VERSION_HEX >= 0x030700A0 && flag == (METH_FASTCALL | METH_KEYWORDS)) {
^
/* DISABLES CODE */ ( )
1 error generated.
error: command 'gcc' failed with exit status 1
·· Failed to build the project and import the benchmark suite.
thanks @mroeschke
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
Refactor to just a single function