Skip to content

PERF: sort_values speedup for multiple columns with random numeric values #19237

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from

Conversation

mficek
Copy link

@mficek mficek commented Jan 14, 2018

Continue work on #17141.
Speedup achieved for sorting by multiple columns: lexsort over multiple columns
replaced with mergesort of columns in reverse order than specified in "by"
argument of DataFrame.sort_values().
Issues:

  • only works for numerical data

  • only random enough data achieve sorting speedup (no factorization)

  • ASV of one-column sort is worse, why?

  • closes #xxxx

  • tests added / passed

  • passes git diff upstream/master -u -- "*.py" | flake8 --diff

  • whatsnew entry

@pep8speaks
Copy link

pep8speaks commented Jan 14, 2018

Hello @mficek! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on January 14, 2018 at 08:37 Hours UTC

@mficek
Copy link
Author

mficek commented Jan 14, 2018

I'm running tests and found out, that many of them fail simply because expected method for multi column sort is not a mergesort (stable) which I use. Hm... Should I start fixing tests? Doesn't sound right to me, but the approach I use is imho correct.

@jreback jreback added the Performance Memory or execution speed performance label Jan 14, 2018
@jreback
Copy link
Contributor

jreback commented Jan 14, 2018

I'm running tests and found out, that many of them fail simply because expected method for multi column sort is not a mergesort (stable) which I use. Hm... Should I start fixing tests? Doesn't sound right to me, but the approach I use is imho correct.

can you show what you mean here?

prob would just change the default (this behavior is inherited from ndarray which defaults to quicksort :<)

@desasaliho
Copy link

Hello..i have a problem in my Facebook when I login it say your account has been disabled
By an administrator I want to know why I am Disabled please and when I can login back I
Hope that you can help me and it thank you for
Any answer you will give it to me bye

@jreback
Copy link
Contributor

jreback commented Feb 24, 2018

can you rebase and fixup

@mficek
Copy link
Author

mficek commented Mar 25, 2018

Hi Jeff, I'm sorry, but I can not find enough time and motivation to finish this issue. It seems more complex than expected with improvements being too much dependent of the nature of the data.

@mficek mficek closed this Mar 25, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Performance Memory or execution speed performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants