pd.merge_asof() matches out of tolerance when timestamps are duplicated #13709

Closed
chrisaycock opened this issue Jul 19, 2016 · 3 comments · Fixed by #13836
Labels
Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Comments

@chrisaycock
Contributor

This is a continuation of #13695.

Starting with the original DataFrames from that issue:

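import pandas as pd

# Left frame: a single row at 13:30:00.030
df1 = pd.DataFrame({'time': pd.to_datetime(['2016-07-15 13:30:00.030']), 'username': ['bob']})
# Right frame: one row 30 ms earlier and one exact match
df2 = pd.DataFrame({'time': pd.to_datetime(['2016-07-15 13:30:00.000', '2016-07-15 13:30:00.030']), 'version': [1, 2]})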

I now get the expected null:

In [82]: pd.merge_asof(df1, df2, on='time', allow_exact_matches=False, tolerance=pd.Timedelta('10ms'))
Out[82]:
                     time username  version
0 2016-07-15 13:30:00.030      bob      NaN

However, if I change the first DataFrame to have duplicate timestamps:

df1 = pd.DataFrame({'time':pd.to_datetime(['2016-07-15 13:30:00.030', '2016-07-15 13:30:00.030']), 'username':['bob', 'charlie']})

then the bug reappears:

In [85]: pd.merge_asof(df1, df2, on='time', allow_exact_matches=False, tolerance=pd.Timedelta('10ms'))
Out[85]:
                     time username  version
0 2016-07-15 13:30:00.030      bob        1
1 2016-07-15 13:30:00.030  charlie        1

This is in pandas version 0.18.0+418.gc46dcfa.
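For reference, the behaviour I'd expect can be written as a small regression check. This is only a sketch and assumes a pandas release that already includes the fix (the issue is marked as fixed by #13836 on the 0.19.0 milestone). The exact match is excluded by allow_exact_matches=False and the earlier row is 30 ms away, outside the 10 ms tolerance, so neither duplicated row should pick up a version:

```python
import pandas as pd

# Left frame with a duplicated timestamp, as in the report above
df1 = pd.DataFrame({
    'time': pd.to_datetime(['2016-07-15 13:30:00.030', '2016-07-15 13:30:00.030']),
    'username': ['bob', 'charlie'],
})
# Right frame: one row 30 ms earlier and one exact match
df2 = pd.DataFrame({
    'time': pd.to_datetime(['2016-07-15 13:30:00.000', '2016-07-15 13:30:00.030']),
    'version': [1, 2],
})

result = pd.merge_asof(df1, df2, on='time',
                       allow_exact_matches=False,
                       tolerance=pd.Timedelta('10ms'))

# Assumption: on a fixed pandas, both rows are left unmatched
assert result['version'].isna().all()
```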

@jreback jreback added Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Jul 19, 2016
@jreback jreback added this to the 0.19.0 milestone Jul 19, 2016
@chrisaycock
Contributor Author

I can take a stab at this. I should dive into the asof internals anyway.

@jreback
Contributor

jreback commented Jul 21, 2016

great @chrisaycock yeah the fix is trivial, BUT it breaks the full on test, not entirely sure why, so it's not exactly right

@chrisaycock
Contributor Author

Alright, I've issued a pull request:

#13836

I just rewrote the Cython logic to compare the factorized keys directly since that was the easiest way forward. Though we don't actually have to factorize the keys at all; we could just compare the timestamps directly, which would be even faster.
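To illustrate what comparing the timestamps directly could look like, here is a rough pure-Python sketch of a backward as-of match. It is not the Cython code from the pull request; the function name asof_backward and its signature are made up for this example, and both inputs are assumed to be sorted already:

```python
from datetime import datetime, timedelta

def asof_backward(left, right, tolerance, allow_exact_matches=True):
    """Hypothetical helper: for each left timestamp, return the index of the
    last right timestamp at or before it (strictly before it when
    allow_exact_matches is False) within tolerance, or None if there is none.
    Both lists are assumed to be sorted in ascending order."""
    matches = []
    j = 0  # count of right rows that are eligible for the current left row
    for t in left:
        # Advance past every right row that is still eligible for t.
        # Duplicate left timestamps simply leave j where it is.
        while j < len(right) and (right[j] <= t if allow_exact_matches else right[j] < t):
            j += 1
        k = j - 1  # last eligible right row, if any
        if k >= 0 and t - right[k] <= tolerance:
            matches.append(k)
        else:
            matches.append(None)
    return matches

# The duplicated-timestamp case from this issue
left = [datetime(2016, 7, 15, 13, 30, 0, 30000)] * 2
right = [datetime(2016, 7, 15, 13, 30, 0, 0), datetime(2016, 7, 15, 13, 30, 0, 30000)]

no_exact = asof_backward(left, right, timedelta(milliseconds=10), allow_exact_matches=False)
with_exact = asof_backward(left, right, timedelta(milliseconds=10), allow_exact_matches=True)
```

Because the tolerance is checked on the actual time difference for every left row, duplicated left timestamps get the same answer as a single one.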

jreback pushed a commit that referenced this issue Aug 1, 2016
) (#13836)

Also removes unnecessary check_duplicates.

Added asv benchmarks for merge_asof()