-
-
Notifications
You must be signed in to change notification settings - Fork 18.9k
Closed
Labels
IndexingRelated to indexing on series/frames, not to indexes themselvesRelated to indexing on series/frames, not to indexes themselvesMultiIndexPerformanceMemory or execution speed performanceMemory or execution speed performance
Description
Let df0
and df1
be indexed data frames. Assume that at least one of them has a non-unique index. If I'm not mistaken, the function #L309 will always be called upon the operation:
df0.loc[df1.index]
Now imagine that both of indexes are reasonably large (lets say at least one millions of records). The operations at lines
will be of order O(n^2*log(n)^2)
even when both indices are sorted.
Am I missing something? Let me know if the above reasoning is right so I can help with it. If both indexes are sorted, It looks like that function can be as fast as O(n)
simply by looping over the values
and targets
arrays simultaneously. (Assuming that the index duplication is very small compared to n
.
Metadata
Metadata
Assignees
Labels
IndexingRelated to indexing on series/frames, not to indexes themselvesRelated to indexing on series/frames, not to indexes themselvesMultiIndexPerformanceMemory or execution speed performanceMemory or execution speed performance