Not all Pandas dataframes are shared in a multiprocessing list #20792
Comments
There is not enough detail here to even guess at what is wrong. This is not likely a pandas problem, but rather a matter of how multiprocessing is used.
@jreback Did you see the full reproducible example on StackOverflow?
I was able to reproduce this, but I also had the same issue when sharing lists or ints (not only pandas DataFrames). My setup: macOS High Sierra
@KhaledTo How did you create the lists or ints to reproduce this? It bothers me that I can't reproduce this with lists or ints myself now... Thanks for trying to reproduce it!
Hi @freezas, yes, it's better if you check whether what I did makes sense. I added this to my_function.py:

    def share_random_pandas_dataframe(shared_list):
        list_int = [1, 2, 3]
        shared_list.append(list_int)

In multiprocessing_example.py I then set processes_count to 19:

    processes_count = 19

My pleasure.
@KhaledTo Weird, that still doesn't raise the same problem on my computer. Maybe it differs between operating systems? If it also happens for other data structures and types, it's probably a multiprocessing issue rather than a pandas issue. I'll close this issue. Thank you all!
Hello,
I first tried to get an answer to this question on StackOverflow, but I hope some of you can explain this and hopefully lead us to a solution.
The StackOverflow question is here: https://stackoverflow.com/questions/49942878/not-all-pandas-dataframes-are-shared-in-a-multiprocessing-list
I've also added an error callback and managed to get an error:
RemoteError('Traceback (most recent call last):
File "lib\multiprocessing\managers.py", line 228, in serve_client
request = recv()
File "lib\multiprocessing\connection.py", line 251, in recv
return _ForkingPickler.loads(buf.getbuffer())
AttributeError: Can't get attribute 'DataFrame' on <module 'pandas.core.frame' from 'lib\site-packages\pandas\core\frame.py>'
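For readers following along, here is a minimal sketch of how an error callback can surface this kind of failure. It assumes the work is submitted with `apply_async`, which the thread does not show. Everything in it is hypothetical: `demo_frames`, `FrameLike`, and `load_payload` are stand-ins, not pandas or the original scripts. It uses `multiprocessing.pool.ThreadPool`, which has the same `apply_async` signature as `multiprocessing.Pool`, so that it runs in a single process. Deleting the class from the stand-in module only imitates a receiver that cannot resolve the name; it does not explain why the lookup failed in the run reported here.

```python
import pickle
import sys
import types
from multiprocessing.pool import ThreadPool

# Hypothetical stand-in module and class (assumption: not pandas itself).
demo_module = types.ModuleType("demo_frames")


class FrameLike:
    """Placeholder for a class such as DataFrame."""


# Make the class importable as demo_frames.FrameLike.
FrameLike.__module__ = "demo_frames"
FrameLike.__qualname__ = "FrameLike"
demo_module.FrameLike = FrameLike
sys.modules["demo_frames"] = demo_module

# A pickle stores the module and class name, not the class itself.
payload = pickle.dumps(FrameLike())

# Imitate a receiver whose module does not expose the class.
del demo_module.FrameLike

errors = []  # exceptions collected by the error callback


def load_payload(data):
    """Unpickle the payload; the class is looked up by module and name."""
    return pickle.loads(data)


with ThreadPool(2) as pool:
    result = pool.apply_async(
        load_payload, (payload,), error_callback=errors.append
    )
    result.wait()  # the callback has run by the time wait() returns

for exc in errors:
    print(type(exc).__name__, exc)
```

Without `error_callback` (or a call to `result.get()`), the failed task would finish silently and its item would simply be missing from the results.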
I've looked through the GitHub tracker and found an issue that looks a lot like mine: #2440. There are a few differences, though:
Output of
pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.19.2 (I've also tested this with pandas version 0.22.0, which I believe was the latest)
nose: 1.3.7
pip: 10.0.0
setuptools: 38.4.0
Cython: 0.27.3
numpy: 1.13.1
scipy: 1.0.1
statsmodels: 0.8.0
xarray: None
IPython: 6.2.1
sphinx: 1.6.6
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
matplotlib: 2.1.1
openpyxl: 2.4.9
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
httplib2: None
apiclient: None
sqlalchemy: 1.2.1
pymysql: None
psycopg2: None
jinja2: 2.10
boto: 2.48.0
pandas_datareader: None
If you need anything else, let me know. We appreciate all the work you've done!