PERF: to speed up rendering of styler #34863
@@ -0,0 +1,34 @@
+import numpy as np
+
+from pandas import DataFrame
+
+
+class RenderApply:
+
+    params = [[12, 24, 36], [12, 120]]
+    param_names = ["cols", "rows"]
+
+    def setup(self, cols, rows):
+        self.df = DataFrame(
+            np.random.randn(rows, cols),
+            columns=[f"float_{i+1}" for i in range(cols)],
+            index=[f"row_{i+1}" for i in range(rows)],
+        )
+        self._style_apply()
+
+    def time_render(self, cols, rows):
+        self.st.render()
+
+    def peakmem_apply(self, cols, rows):
+        self._style_apply()
+
+    def peakmem_render(self, cols, rows):
+        self.st.render()
+
+    def _style_apply(self):
+        def _apply_func(s):
+            return [
+                "background-color: lightcyan" if s.name == "row_1" else "" for v in s
+            ]
+
+        self.st = self.df.style.apply(_apply_func, axis=1)
@@ -561,11 +561,15 @@ def _update_ctx(self, attrs: DataFrame) -> None:
         Whitespace shouldn't matter and the final trailing ';' shouldn't
         matter.
         """
-        for row_label, v in attrs.iterrows():
-            for col_label, col in v.items():
-                i = self.index.get_indexer([row_label])[0]
-                j = self.columns.get_indexer([col_label])[0]
-                for pair in col.rstrip(";").split(";"):
+        rows = [(row_label, v) for row_label, v in attrs.iterrows()]

Review comments on the line above:

you should use itertuples here (it's actually much faster)

@jeff - that's a good thing to know, and I tried it but could not figure out how to do the same thing with itertuples. However, it seems that .get_indexer is what causes most of the delay. So the real solution should be something that eliminates get_indexer entirely, or some effort to speed up get_indexer itself. I can think of one way to avoid get_indexer: simply take index and columns as lists and use them to get the integer position of a given label. However, I was not sure I could do that safely, because I am not sure the labels given in attrs always match those of self.index and self.columns. Probably not.

ahh i see, you are doing get_indexer once for the rows; you can do the same once for the columns. you can throw this in a dict {label -> int}. This will vastly speed things up.

So you mean each row will have the same columns?

If it is certain that we really do not have to use the get_indexer method, probably something like this should work, outside the loops:

This is the exact code that worked for my app:

However, it outperforms the current patch only slightly in the benchmark.

can you avoid the append? I think if this was a comprehension (or at least the last append) it would be much better

I don't see how it can be better.

ok i actually like your code above a little better, it's very idiomatic and easy to understand. push it up and ping on green.

I really don't feel safe with this code. It might break someone's code.

+        row_idx = self.index.get_indexer([x[0] for x in rows])
+        for ii, row in enumerate(rows):
+            i = row_idx[ii]
+            cols = [(col_label, col) for col_label, col in row[1].items() if col]
+            col_idx = self.columns.get_indexer([x[0] for x in cols])
+            for jj, itm in enumerate(cols):
+                j = col_idx[jj]
+                for pair in itm[1].rstrip(";").split(";"):
                     self.ctx[(i, j)].append(pair)

     def _copy(self, deepcopy: bool = False) -> "Styler":
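The review thread's suggestion (call get_indexer once per axis and keep the result as a {label -> int} mapping) can be sketched roughly as below. This is an illustration only, not the code that was pushed in the PR: the function name `update_ctx_sketch` and the sample data are hypothetical, and it assumes that the labels are unique and that every label in `attrs` exists in `index` and `columns`, which is exactly the point the author was unsure about.

```python
from collections import defaultdict

import pandas as pd


def update_ctx_sketch(index, columns, attrs):
    """Hypothetical standalone version of the loop in ``_update_ctx``.

    Each label is resolved to its integer position once per axis and the
    positions are reused inside the loops.
    """
    ctx = defaultdict(list)
    # One get_indexer call per axis instead of one call per cell.
    # Assumption: labels are unique and all present (no -1 positions).
    row_pos = dict(zip(attrs.index, index.get_indexer(attrs.index).tolist()))
    col_pos = dict(zip(attrs.columns, columns.get_indexer(attrs.columns).tolist()))
    for row_label, row in attrs.iterrows():
        i = row_pos[row_label]
        for col_label, css in row.items():
            if not css:
                continue  # skip cells without a style, as the patch does
            j = col_pos[col_label]
            for pair in css.rstrip(";").split(";"):
                ctx[(i, j)].append(pair)
    return ctx


# Hypothetical sample data: two rows, two columns, two styled cells.
index = pd.Index(["row_1", "row_2"])
columns = pd.Index(["a", "b"])
attrs = pd.DataFrame(
    {"a": ["color: red;", ""], "b": ["", "color: blue; font-weight: bold"]},
    index=index,
)
ctx = update_ctx_sketch(index, columns, attrs)
```

If `attrs` could carry labels missing from `index` or `columns`, get_indexer would return -1 for them, so a real implementation would need to decide how to handle that case.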