Groupby performance discrepancy in timestamped data #817

Closed
adamklein opened this issue Feb 23, 2012 · 0 comments

@adamklein
Contributor

From mailing list

import pandas
from time import time

serie = pandas.io.parsers.read_csv(f, parse_dates=True,
                                   date_parser=dateParser, index_col=0)
dateRange = pandas.DateRange(start, end,
                             offset=5 * pandas.datetools.Minute())
grouped = serie.groupby(dateRange.asof)

version 1

t = time()
for date in serie.index:
    # returns the key of the group where date belongs
    k = grouped.grouper(date)
    g = grouped.get_group(k)
print time()-t

version 2

t = time()
for date in serie.index:
    k = grouped.grouper(date)
    g = serie.ix[grouped.groups[k]]
print time()-t

serie looks something like this (financial data indexed by datetime):

<class 'pandas.core.frame.DataFrame'>
Index: 476640 entries, 2011-01-03 00:00:00 to 2011-11-29 23:59:00
Data columns:
Open      476640  non-null values
High      476640  non-null values
Low       476640  non-null values
Close     476640  non-null values
Volume    476640  non-null values
dtypes: float64(5)

For 100,000 elements, version 1 takes 480 seconds, while version 2
takes only 25 seconds.
For the full 460,000 elements, the two versions take about 40 minutes and 135 seconds, respectively.
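
The two access patterns compared above can be sketched against a current pandas release, where `DateRange` and `.ix` no longer exist. This is only an illustration under stated assumptions, not the code from the mailing list: the data is synthetic, `floor("5min")` stands in for `dateRange.asof` (equivalent only when the range starts on a 5-minute boundary), and the helper names `rows_via_get_group` and `rows_via_groups` are hypothetical. It shows that both lookups return the same rows; it does not attempt to reproduce the timings reported here.

```python
import numpy as np
import pandas as pd

# Synthetic one-minute data standing in for the original CSV (assumption).
index = pd.date_range("2011-01-03", periods=60, freq="min")
frame = pd.DataFrame({"Close": np.arange(60, dtype="float64")}, index=index)

# Map every timestamp to the start of its 5-minute bucket and group on that.
keys = index.floor("5min")
grouped = frame.groupby(keys)


def rows_via_get_group(ts):
    """Version 1 pattern: ask the GroupBy object for the group directly."""
    return grouped.get_group(ts.floor("5min"))


def rows_via_groups(ts):
    """Version 2 pattern: look up the group's labels, then index the frame."""
    return frame.loc[grouped.groups[ts.floor("5min")]]


ts = index[7]  # 2011-01-03 00:07, which falls in the 00:05 bucket
a = rows_via_get_group(ts)
b = rows_via_groups(ts)
print(a.equals(b))
```

To compare the cost of the two patterns on a given pandas version, each helper could be called in a loop over `frame.index` and timed with `time.perf_counter()`.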

@ghost ghost assigned wesm Feb 23, 2012
@wesm wesm closed this as completed in 3cd1b05 Feb 24, 2012
@wesm wesm reopened this Feb 24, 2012
wesm added a commit that referenced this issue Feb 24, 2012
@wesm wesm closed this as completed Feb 24, 2012