Skip to content

Disabling sorting in glob and scandir functions for better performance #16052

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

lfluvisotto
Copy link
Contributor

Disabling sorting in glob and scandir functions for better performance

Description

Disabling sorting in glob and scandir functions for better performance

https://www.exakat.io/php-likes-sorting/

Copy link
Contributor

@orlangur orlangur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job @lfluvisotto!

No benchmarking needed, in article it was something like 25% but even 5% would be worth such improvement.

Didn't grep all over the code for the glob and scandir occurrences, I assume this was already done by PR author.

@magento-engcom-team
Copy link
Contributor

Hi @orlangur, thank you for the review.
ENGCOM-1972 has been created to process this Pull Request

@magento-engcom-team
Copy link
Contributor

Hi @lfluvisotto. Thank you for your contribution.
Changes from your Pull Request will be available with the upcoming 2.2.6 release.

Please, consider to port this solution to 2.3 release line.
You may use Porting tool to port commits automatically.

@gamort
Copy link

gamort commented Oct 3, 2018

This was a really really REALLY bad idea.
Specifically for
lib/internal/Magento/Framework/App/Utility/Files.php
lib/internal/Magento/Framework/Data/Collection/Filesystem.php
and
lib/internal/Magento/Framework/Filesystem/Io/File.php

The rest of the changes in this PR were fine - they are made to functions were there was no expectation of sorting. I doubt it saved more than 100ms for unit testing and deployment, but that is ok. The problem is forcing a non-standard default behaviour in general purpose classes. And it is forced - there is no way to actually turn sorting back on AND use the file access classes.

The benchmark from https://www.exakat.io/php-likes-sorting/ was irrelevant. That benchmark was for sorting 26,000 files that matched the pattern to begin with. Your not sorting thousands of files, at most hundreds, more likely dozens.

Example: A magento installation with 100,000 files, 260 of them are xml files
A glob search will first locate 260 xml files. It will THEN sort 260 xml files. So you are not going from 14s to 12s, more like 140ms to 120ms. Saving 20ms was a useless waste of time.

The default behaviour of glob searches is to sort. Everyone expects them to sort. Short of using the glob function directly[and therefore losing all the benefits of a standardized file system set of classes] there is no way to sort other then to sort the array of files which are returned. Which will add 200ms or more since array sorting in PHP is less effiicient! [See the same article above].

Now, WHY does everyone want sorting to take place by default?

Simple, whenever we operate on a list of files, we don' have to provide any advanced "work on file X before working on file Y" programming logic. We just name our files appropriately. Look in your php/conf.d folder and you will immediately see it:
10-somext.ini
20-someext.ini
30-somext.ini
zz-magento.ini
The magento specific PHP configuration settings are always applied last.

Why do we do this in a very low level system function instead of higher level code? Because it is stupid to optimize hundreds of thousands of applications when you can optimize a single system function. And hey, YOU don't have to optimize it, thousands of programmers have spent decades optimizing it for you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment