Not all Pandas dataframes are shared in a multiprocessing list #20792

Closed
@freezas

Hello,

I first tried to get an answer to this question on StackOverflow, but I hope some of you can explain it and hopefully lead us to a solution.

The StackOverflow question is here: https://stackoverflow.com/questions/49942878/not-all-pandas-dataframes-are-shared-in-a-multiprocessing-list

I've also added an error callback and managed to get an error:

RemoteError('Traceback (most recent call last):
File "lib\multiprocessing\managers.py", line 228, in serve_client
request = recv()
File "lib\multiprocessing\connection.py", line 251, in recv
return _ForkingPickler.loads(buf.getbuffer())
AttributeError: Can't get attribute 'DataFrame' on <module 'pandas.core.frame' from 'lib\site-packages\pandas\core\frame.py'>

I've looked through the GitHub tracker and found an issue that looks a lot like mine: #2440. There are a few differences, though:

  • I'm using multiprocessing instead of threading. Because of this, we can use a multiprocessing.Pool and a special list object to share objects.
  • In our example, we don't actually change the dataframe in the different processes. We're only adding it to the list of shared objects.
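For readers without access to the StackOverflow question, the setup described above can be sketched roughly as follows. This is a minimal illustration, not the code from the question: it assumes the "special list object" is a list created through multiprocessing.Manager (the traceback points at managers.py), and the names make_frame, report_error and collect_frames are hypothetical. Each worker only appends a dataframe to the shared list, and the error callback is what surfaces exceptions such as the RemoteError above.

```python
import multiprocessing as mp

import pandas as pd


def make_frame(shared_frames, n):
    """Build a small DataFrame and append it to the manager-backed list."""
    frame = pd.DataFrame({"value": range(n)})
    # append() pickles the frame and sends it to the manager's server process,
    # which unpickles it there.
    shared_frames.append(frame)


def report_error(exc):
    """Error callback: print exceptions that apply_async would otherwise hide."""
    print("worker failed:", repr(exc))


def collect_frames(count=4):
    """Run `count` tasks in a pool and return the frames they shared."""
    with mp.Manager() as manager:
        shared_frames = manager.list()
        with mp.Pool(processes=2) as pool:
            results = [
                pool.apply_async(
                    make_frame, (shared_frames, n), error_callback=report_error
                )
                for n in range(1, count + 1)
            ]
            # Wait for every task before the pool is torn down.
            for result in results:
                result.wait()
        # Copy the contents out of the proxy before the manager shuts down.
        return shared_frames[:]


if __name__ == "__main__":
    frames = collect_frames()
    print(len(frames), "frames shared")
```

On Windows the `if __name__ == "__main__":` guard is required, because child processes are started by re-importing the main module.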

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.19.2 (I've also tested this with pandas version 0.22.0, which I believe was the latest)
nose: 1.3.7
pip: 10.0.0
setuptools: 38.4.0
Cython: 0.27.3
numpy: 1.13.1
scipy: 1.0.1
statsmodels: 0.8.0
xarray: None
IPython: 6.2.1
sphinx: 1.6.6
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
matplotlib: 2.1.1
openpyxl: 2.4.9
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
httplib2: None
apiclient: None
sqlalchemy: 1.2.1
pymysql: None
psycopg2: None
jinja2: 2.10
boto: 2.48.0
pandas_datareader: None

If you need anything else, let me know. We appreciate all the work you've done!
