Description
When importing customers into a site that already has a large number of customers through the standard Magento importer tool (System > Data Transfer > Import), with Entity Type: Customers Main File and Import Behavior: Add/Update Complex Data, the import POST returns a 500 status because the allowed PHP memory limit is exceeded.
Preconditions
- Magento Enterprise 2.1.4
- Server is running PHP version 7.0.15
- Memory limit is set to the standard 768 MB
- The target store already has a large number of customers (e.g. 250k+; we experienced it with 261,659)
Steps to reproduce
- In the admin panel, go to System > Data Transfer > Import
- Set Entity Type to: Customers Main File
- Set Import Behavior to: Add/Update Complex Data
- Select a valid import CSV with at least one customer row, and select Check Data
- After data validates, select Import
Expected result
- Customer data should be imported
Actual result
- The import screen's loading indicator never completes, and the import POST responds with status code 500
- The server error log shows a corresponding PHP fatal error:
PHP Fatal error: Allowed memory size of 805306368 bytes exhausted (tried to allocate 20480 bytes) in /server/sites/(SITE)/vendor/magento/framework/Model/ResourceModel/Db/VersionControl/Snapshot.php on line 47, referer: https://(SITE)/backend/admin/import/
ImportError.txt
The memory use appears to be mainly triggered by framework-level code (\Magento\Framework\Model\ResourceModel\Db\VersionControl\Snapshot::registerSnapshot), and is likely platform independent.
Through profiling and debugging we have found two very expensive operations that appear to contribute to this. When the customer import begins, \Magento\CustomerImportExport\Model\ResourceModel\Import\Customer\Storage::getCustomerId is called, which triggers \Magento\CustomerImportExport\Model\ResourceModel\Import\Customer\Storage::load, which in turn executes \Magento\ImportExport\Model\ResourceModel\CollectionByPagesIterator::iterate. Through this process every existing customer account is retrieved and loaded, which is costly in itself, and the email, website ID, and customer ID of each are then stored in a multidimensional array. In addition, loading the customer collection via CollectionByPagesIterator::iterate triggers a snapshot of every customer, copying the entity's full data into yet another in-memory array.
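To make the memory pattern concrete, here is a deliberately simplified, hypothetical re-creation of it in plain PHP (the class and function names below are stand-ins we invented for illustration, not actual Magento code): every existing customer becomes a fully hydrated object, its full data is duplicated into a snapshot registry, and it is indexed a third time for ID lookups.

```php
<?php
// Hypothetical simulation of the pattern described above (NOT Magento code):
// full objects + a snapshot copy + a lookup index, all held in memory at once.

class FakeCustomer
{
    public function __construct(
        public int $entityId,
        public string $email,
        public int $websiteId,
        public array $attributes // stands in for the full entity data
    ) {
    }
}

// Stand-in for CollectionByPagesIterator::iterate walking the whole collection
function iterateAllCustomers(int $total, int $pageSize, callable $callback): void
{
    for ($page = 0; $page * $pageSize < $total; $page++) {
        $count = min($pageSize, $total - $page * $pageSize);
        for ($i = 0; $i < $count; $i++) {
            $id = $page * $pageSize + $i + 1;
            // Every existing customer is hydrated as a full object...
            $callback(new FakeCustomer($id, "user$id@example.com", 1, [
                'firstname' => 'First', 'lastname' => 'Last',
            ]));
        }
    }
}

$snapshots = [];   // mirrors the Snapshot registry holding full entity data
$customerIds = []; // mirrors the email => websiteId => entityId lookup array

iterateAllCustomers(1000, 100, function (FakeCustomer $customer) use (&$snapshots, &$customerIds) {
    // ...its full data is copied into the snapshot registry...
    $snapshots[$customer->entityId] = $customer->attributes;
    // ...and it is indexed again for the import's getCustomerId lookups.
    $customerIds[strtolower($customer->email)][$customer->websiteId] = $customer->entityId;
});

printf("snapshots: %d, lookup entries: %d\n", count($snapshots), count($customerIds));
```

The point of the sketch is that memory grows linearly with the number of *existing* customers, not with the size of the file being imported, which matches the failure mode we see at ~261k customers.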
In summary, every existing customer is loaded and initialized as a full customer entity object, and its data is held in multiple in-memory locations, all in order to import a relatively small set of new customers. This is unsustainable: it is costly in time, processing, and memory, and it is difficult to see the value in doing it.
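For comparison, the lookup array the import actually needs could in principle be built from a single lean query that selects only the three relevant columns, with no model hydration and no snapshotting. The sketch below is illustrative only, not a proposed patch: it uses an in-memory SQLite table as a stand-in for the real database, with column names matching the standard customer_entity schema.

```php
<?php
// Illustrative only: build the email => websiteId => entityId lookup from a
// lean SELECT instead of loading full customer models. An in-memory SQLite
// table stands in for the real customer_entity table.

$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE customer_entity (entity_id INTEGER PRIMARY KEY, email TEXT, website_id INTEGER)');

$insert = $db->prepare('INSERT INTO customer_entity (entity_id, email, website_id) VALUES (?, ?, ?)');
foreach ([[1, 'a@example.com', 1], [2, 'b@example.com', 1], [3, 'a@example.com', 2]] as $row) {
    $insert->execute($row);
}

// Only the three columns the lookup needs -- no object hydration, no snapshot
// of the full entity data.
$customerIds = [];
foreach ($db->query('SELECT entity_id, email, website_id FROM customer_entity') as $row) {
    $customerIds[strtolower($row['email'])][(int) $row['website_id']] = (int) $row['entity_id'];
}

var_export($customerIds);
```

Even this still scales with the number of existing customers, but it avoids instantiating and snapshotting a full entity object per row, which is where the profiling above points.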