PERF: merge on monotonic keys #56523

Merged 2 commits on Dec 17, 2023

lukemanley
Member

Improves performance when merging on sorted (monotonic) keys. Non-sorted keys also see some improvement, because the left/right indexers are now allowed to be None.
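
To illustrate the idea only (this is not the code in this PR), here is a minimal pure-Python sketch of why sorted keys allow a cheap join and why a `None` indexer can save work. It assumes strictly increasing (unique) keys, and it assumes that an indexer of `None` means "take every row in its original order". The names `inner_join_sorted` and `take` are hypothetical and are not pandas APIs.

```python
from typing import List, Optional, Sequence, Tuple


def inner_join_sorted(
    left: Sequence[int], right: Sequence[int]
) -> Tuple[Optional[List[int]], Optional[List[int]]]:
    """Inner-join two strictly increasing key sequences with one linear scan.

    Returns positional indexers into `left` and `right`. In this sketch an
    indexer of None means "every row, in order", so no reindexing is needed.
    """
    left_idx: List[int] = []
    right_idx: List[int] = []
    i = j = 0
    # Two-pointer scan: advance whichever side has the smaller key.
    while i < len(left) and j < len(right):
        if left[i] == right[j]:
            left_idx.append(i)
            right_idx.append(j)
            i += 1
            j += 1
        elif left[i] < right[j]:
            i += 1
        else:
            j += 1
    # Matches are recorded in increasing order without repeats, so if every
    # row of a side matched, that side's indexer is the identity 0..n-1.
    lidx = None if len(left_idx) == len(left) else left_idx
    ridx = None if len(right_idx) == len(right) else right_idx
    return lidx, ridx


def take(values: Sequence, indexer: Optional[List[int]]) -> Sequence:
    """Apply an indexer; None returns the values untouched (no copy)."""
    if indexer is None:
        return values
    return [values[k] for k in indexer]


left_keys = list(range(0, 20, 2))   # 0, 2, ..., 18
right_keys = list(range(10, 14))    # 10, 11, 12, 13
lidx, ridx = inner_join_sorted(left_keys, right_keys)
joined_keys = take(left_keys, lidx)  # keys present on both sides
```

The scan is linear in the combined length of the two inputs and needs no hash table, which is the kind of saving a monotonic fast path can offer. When one side matches completely, returning `None` lets the caller skip reindexing that side.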

Scaled down version of the example in #56115:

import pandas as pd
import numpy as np

df1 = pd.DataFrame({"key": np.arange(0, 1_000_000, 2), "val1": 1})
df2 = pd.DataFrame({"key": np.arange(500_000, 700_000, 1), "val2": 2})

%timeit df = pd.merge_ordered(df1, df2, on="key", how="inner")

# 389 ms ± 14.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)   <- main
# 10 ms ± 319 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)  <- PR

%timeit df = pd.merge(df1, df2, on="key", how="inner")

# 173 ms ± 2.15 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)    <- main
# 10.7 ms ± 1.25 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)  <- PR

ASVs:

> asv continuous -f 1.1 upstream/main merge-monotonic -b ^join_merge


| Change   | Before [d77d5e54] <main>   | After [765c4026] <merge-monotonic>   |   Ratio | Benchmark (Parameter)                                                                       |
|----------|----------------------------|--------------------------------------|---------|---------------------------------------------------------------------------------------------|
| -        | 22.3±1ms                   | 19.9±0.7ms                           |    0.9  | join_merge.MergeDatetime.time_merge(('ms', 'ms'), None, False)                              |
| -        | 22.6±0.9ms                 | 19.0±0.5ms                           |    0.84 | join_merge.MergeDatetime.time_merge(('ms', 'ms'), 'Europe/Brussels', False)                 |
| -        | 20.8±0.9ms                 | 17.4±0.3ms                           |    0.84 | join_merge.MergeDatetime.time_merge(('ns', 'ms'), 'Europe/Brussels', False)                 |
| -        | 19.6±2ms                   | 16.2±0.1ms                           |    0.83 | join_merge.ConcatIndexDtype.time_concat_series('string[python]', 'monotonic', 1, True)      |
| -        | 3.28±0.3ms                 | 2.71±0.06ms                          |    0.83 | join_merge.Merge.time_merge_dataframe_integer_key(False)                                    |
| -        | 20.1±0.7ms                 | 16.7±0.7ms                           |    0.83 | join_merge.MergeDatetime.time_merge(('ns', 'ns'), None, False)                              |
| -        | 5.87±0.3ms                 | 4.81±0.1ms                           |    0.82 | join_merge.ConcatIndexDtype.time_concat_series('int64[pyarrow]', 'non_monotonic', 1, False) |
| -        | 14.6±0.1ms                 | 11.9±1ms                             |    0.82 | join_merge.MergeEA.time_merge('Float32', False)                                             |
| -        | 14.7±0.4ms                 | 12.0±0.4ms                           |    0.82 | join_merge.MergeEA.time_merge('Int64', False)                                               |
| -        | 22.0±1ms                   | 17.7±0.7ms                           |    0.8  | join_merge.MergeDatetime.time_merge(('ns', 'ms'), None, False)                              |
| -        | 13.9±0.2ms                 | 10.1±0.7ms                           |    0.73 | join_merge.MergeEA.time_merge('UInt16', False)                                              |
| -        | 16.1±0.3ms                 | 11.7±0.08ms                          |    0.73 | join_merge.MergeEA.time_merge('UInt64', False)                                              |
| -        | 13.5±0.05ms                | 9.48±0.09ms                          |    0.7  | join_merge.MergeEA.time_merge('Int32', False)                                               |
| -        | 15.0±0.2ms                 | 10.4±0.2ms                           |    0.7  | join_merge.MergeEA.time_merge('UInt32', False)                                              |
| -        | 16.0±0.3ms                 | 10.7±0.5ms                           |    0.67 | join_merge.MergeDatetime.time_merge(('ns', 'ms'), 'Europe/Brussels', True)                  |
| -        | 14.1±0.4ms                 | 9.46±0.04ms                          |    0.67 | join_merge.MergeEA.time_merge('Int16', False)                                               |
| -        | 13.4±0.3ms                 | 8.67±0.3ms                           |    0.65 | join_merge.MergeEA.time_merge('Float64', True)                                              |
| -        | 13.3±0.2ms                 | 8.70±0.2ms                           |    0.65 | join_merge.MergeEA.time_merge('UInt64', True)                                               |
| -        | 15.6±0.2ms                 | 10.0±0.3ms                           |    0.64 | join_merge.MergeDatetime.time_merge(('ms', 'ms'), None, True)                               |
| -        | 12.6±0.09ms                | 7.98±0.6ms                           |    0.63 | join_merge.MergeEA.time_merge('Int64', True)                                                |
| -        | 15.7±0.4ms                 | 9.72±0.1ms                           |    0.62 | join_merge.MergeDatetime.time_merge(('ms', 'ms'), 'Europe/Brussels', True)                  |
| -        | 17.5±2ms                   | 10.8±0.3ms                           |    0.61 | join_merge.MergeDatetime.time_merge(('ns', 'ms'), None, True)                               |
| -        | 15.3±0.4ms                 | 9.35±0.2ms                           |    0.61 | join_merge.MergeDatetime.time_merge(('ns', 'ns'), None, True)                               |
| -        | 12.7±0.4ms                 | 7.74±0.6ms                           |    0.61 | join_merge.MergeEA.time_merge('UInt32', True)                                               |
| -        | 16.0±0.8ms                 | 9.25±0.1ms                           |    0.58 | join_merge.MergeDatetime.time_merge(('ns', 'ns'), 'Europe/Brussels', True)                  |
| -        | 12.1±0.3ms                 | 7.07±0.5ms                           |    0.58 | join_merge.MergeEA.time_merge('UInt16', True)                                               |
| -        | 13.1±2ms                   | 7.25±0.7ms                           |    0.55 | join_merge.MergeEA.time_merge('Float32', True)                                              |
| -        | 11.7±0.3ms                 | 6.41±0.06ms                          |    0.55 | join_merge.MergeEA.time_merge('Int16', True)                                                |
| -        | 12.0±0.5ms                 | 6.44±0.03ms                          |    0.54 | join_merge.MergeEA.time_merge('Int32', True)                                                |
| -        | 119±5ms                    | 52.4±0.1ms                           |    0.44 | join_merge.MergeOrdered.time_merge_ordered                                                  |
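
For reading the table: the Ratio column is the After time divided by the Before time, so values below 1 are improvements. The short sketch below recomputes it for two rows from the displayed means. The helper name `ratio` is made up for this example, and because the displayed means are rounded, a recomputed ratio can differ from asv's own column in the last digit.

```python
def ratio(before_ms: float, after_ms: float) -> float:
    """After / Before; values below 1.0 mean the PR branch is faster."""
    return after_ms / before_ms


# (before, after) mean timings in milliseconds, copied from the table above.
rows = {
    "join_merge.MergeOrdered.time_merge_ordered": (119.0, 52.4),
    "join_merge.MergeEA.time_merge('Int32', False)": (13.5, 9.48),
}

for name, (before, after) in rows.items():
    r = ratio(before, after)
    # 1 / r is the same change expressed as a speedup factor.
    print(f"{name}: ratio={r:.2f}, speedup={1 / r:.1f}x")
```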

@lukemanley added the Performance (memory or execution speed performance) and Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) labels on Dec 16, 2023
@lukemanley added this to the 2.2 milestone on Dec 16, 2023
Member

@phofl phofl left a comment

PR looks good. It looks like this broke some code examples in the merge user guide, though.

@phofl phofl merged commit 061c2e9 into pandas-dev:main Dec 17, 2023
Member

phofl commented Dec 17, 2023

thx!

I am a very big fan of this

Successfully merging this pull request may close these issues.

PERF: merge_ordered(how="inner") without prior filtering is slow and uses much more memory.