This repository was archived by the owner on Sep 11, 2023. It is now read-only.

RuntimeError: unable to open shared memory object </torch_2276740_2849291446> in read-write mode #158

Closed
JackKelly opened this issue Sep 24, 2021 · 5 comments
Labels
bug Something isn't working

Comments

@JackKelly
Member

JackKelly commented Sep 24, 2021

Describe the bug
prepare_ml_data.py just crashed with this error:

2021-09-24 07:07:21,672 INFO ./prepare_ml_data.py 178 Got batch 7380
2021-09-24 07:07:44,543 INFO ./prepare_ml_data.py 178 Got batch 7381
2021-09-24 07:08:10,839 INFO ./prepare_ml_data.py 178 Got batch 7382
Traceback (most recent call last):
  File "/home/jack/miniconda3/envs/nowcasting_dataset/lib/python3.8/multiprocessing/queues.py", line 239, in _feed
    obj = _ForkingPickler.dumps(obj)
  File "/home/jack/miniconda3/envs/nowcasting_dataset/lib/python3.8/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
  File "/home/jack/miniconda3/envs/nowcasting_dataset/lib/python3.8/site-packages/torch/multiprocessing/reductions.py", line 319, in reduce_storage
    metadata = storage._share_filename_()
RuntimeError: unable to open shared memory object </torch_2276740_2849291446> in read-write mode
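One way to narrow this down next time might be to log the resources that named shared memory depends on. As far as we can tell from torch/multiprocessing/reductions.py, the storage._share_filename_() call in this traceback is the path taken under PyTorch's "file_system" sharing strategy, where each shared tensor is backed by a named POSIX shared-memory object (the /torch_<pid>_<random> name in the error). On Linux these objects show up as files in /dev/shm. The sketch below is a hypothetical helper (shm_diagnostics is our own name, not a PyTorch API) that assumes Linux and reports the open-file limits, the free space in /dev/shm, and how many torch_* objects currently exist there.

```python
import os
import resource
import shutil


def shm_diagnostics(shm_dir="/dev/shm", prefix="torch_"):
    """Snapshot resources that named shared-memory allocation depends on.

    Hypothetical helper. Assumes Linux, where POSIX shared-memory objects
    appear as files in /dev/shm. The /dev/shm fields are None if the
    directory does not exist.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    report = {
        "nofile_soft": soft,
        "nofile_hard": hard,
        "shm_free_bytes": None,
        "shm_objects_with_prefix": None,
    }
    if os.path.isdir(shm_dir):
        # Free space on the filesystem (normally tmpfs) backing shared memory.
        report["shm_free_bytes"] = shutil.disk_usage(shm_dir).free
        # Under the file_system strategy, in-flight (or leaked) tensors
        # should appear as entries whose names start with "torch_".
        report["shm_objects_with_prefix"] = sum(
            1 for name in os.listdir(shm_dir) if name.startswith(prefix)
        )
    return report
```

Logging shm_diagnostics() every few hundred batches might show whether /dev/shm is filling up, or torch_* objects are accumulating, in the run-up to a crash.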

Interestingly, I had to hit Ctrl-C to actually exit the script. Doing so printed this traceback, which I think tells us where in our code the crash occurred:

^CTraceback (most recent call last):
  File "./prepare_ml_data.py", line 218, in <module>
    main()
  File "./prepare_ml_data.py", line 207, in main
    iterate_over_dataloader_and_write_to_disk(datamodule.train_dataloader(), DST_TRAIN_PATH)
  File "./prepare_ml_data.py", line 177, in iterate_over_dataloader_and_write_to_disk
    for batch_i, batch in enumerate(dataloader):
  File "/home/jack/miniconda3/envs/nowcasting_dataset/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/home/jack/miniconda3/envs/nowcasting_dataset/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1186, in _next_data
    idx, data = self._get_data()
  File "/home/jack/miniconda3/envs/nowcasting_dataset/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1152, in _get_data
    success, data = self._try_get_data()
  File "/home/jack/miniconda3/envs/nowcasting_dataset/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 990, in _try_get_data
    data = self._data_queue.get(timeout=timeout)
  File "/home/jack/miniconda3/envs/nowcasting_dataset/lib/python3.8/multiprocessing/queues.py", line 107, in get
    if not self._poll(timeout):
  File "/home/jack/miniconda3/envs/nowcasting_dataset/lib/python3.8/multiprocessing/connection.py", line 257, in poll
    return self._poll(timeout)
  File "/home/jack/miniconda3/envs/nowcasting_dataset/lib/python3.8/multiprocessing/connection.py", line 424, in _poll
    r = wait([self], timeout)
  File "/home/jack/miniconda3/envs/nowcasting_dataset/lib/python3.8/multiprocessing/connection.py", line 931, in wait
    ready = selector.select(timeout)
  File "/home/jack/miniconda3/envs/nowcasting_dataset/lib/python3.8/selectors.py", line 415, in select
    fd_event_list = self._selector.poll(timeout)
KeyboardInterrupt
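The first traceback was printed by multiprocessing's background feeder thread (_feed in queues.py), not by our main thread, which may explain why the script hung until Ctrl-C: the exception never reaches the code that is waiting for the batch. A minimal, stdlib-only sketch of that behaviour, using an unpicklable lambda to stand in for a batch whose shared-memory allocation fails (demo_lost_item is a hypothetical name):

```python
import multiprocessing
import queue


def demo_lost_item(timeout_s=1.0):
    """Show that an item which fails to pickle is silently dropped.

    multiprocessing.Queue.put() only buffers the object; a background
    feeder thread pickles and sends it later. If pickling fails, that
    thread prints a traceback to stderr and carries on, so neither put()
    nor get() raises and the consumer simply never receives the item.
    """
    q = multiprocessing.Queue()
    try:
        # A lambda cannot be pickled, so this fails inside the feeder thread.
        q.put(lambda: None)
        try:
            q.get(timeout=timeout_s)
            lost = False
        except queue.Empty:
            # Without a timeout, this get() would block forever.
            lost = True
        # Items put afterwards still get through.
        q.put(42)
        recovered = q.get(timeout=timeout_s * 5)
    finally:
        q.close()
        q.join_thread()
    return lost, recovered
```

If this is what happened, the main process was left in _data_queue.get() waiting for a batch that would never arrive. Passing a positive timeout to torch.utils.data.DataLoader might turn such a hang into an error, although it would not fix the underlying shared-memory failure.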

Versions:

# packages in environment at /home/jack/miniconda3/envs/nowcasting_dataset:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                      1_llvm    conda-forge
absl-py                   0.13.0             pyhd8ed1ab_0    conda-forge
aiohttp                   3.7.4.post0      py38h497a2fe_0    conda-forge
alsa-lib                  1.2.3                h516909a_0    conda-forge
appdirs                   1.4.4              pyh9f0ad1d_0    conda-forge
asciitree                 0.3.3                      py_2    conda-forge
async-timeout             3.0.1                   py_1000    conda-forge
attrs                     21.2.0             pyhd8ed1ab_0    conda-forge
backcall                  0.2.0              pyh9f0ad1d_0    conda-forge
backports                 1.0                        py_2    conda-forge
backports.functools_lru_cache 1.6.4              pyhd8ed1ab_0    conda-forge
black                     21.9b0             pyhd8ed1ab_0    conda-forge
blas                      2.111                       mkl    conda-forge
blas-devel                3.9.0            11_linux64_mkl    conda-forge
blinker                   1.4                        py_1    conda-forge
blosc                     1.21.0               h9c3ff4c_0    conda-forge
boto3                     1.18.14                  pypi_0    pypi
botocore                  1.21.41                  pypi_0    pypi
bravado                   11.0.3                   pypi_0    pypi
bravado-core              5.17.0                   pypi_0    pypi
brotli                    1.0.9                h7f98852_5    conda-forge
brotli-bin                1.0.9                h7f98852_5    conda-forge
brotlipy                  0.7.0           py38h497a2fe_1001    conda-forge
brunsli                   0.1                  h9c3ff4c_0    conda-forge
bzip2                     1.0.8                h7f98852_4    conda-forge
c-ares                    1.17.2               h7f98852_0    conda-forge
ca-certificates           2021.5.30            ha878542_0    conda-forge
cached-property           1.5.2                hd8ed1ab_1    conda-forge
cached_property           1.5.2              pyha770c72_1    conda-forge
cachetools                4.2.2              pyhd8ed1ab_0    conda-forge
cairo                     1.16.0            h6cf1ce9_1008    conda-forge
certifi                   2021.5.30        py38h578d9bd_0    conda-forge
cffi                      1.14.6           py38h3931269_1    conda-forge
cfgv                      3.3.1              pyhd8ed1ab_0    conda-forge
cfitsio                   3.470                hb418390_7    conda-forge
cftime                    1.5.0            py38hb5d20a5_0    conda-forge
chardet                   4.0.0            py38h578d9bd_1    conda-forge
charls                    2.2.0                h9c3ff4c_0    conda-forge
charset-normalizer        2.0.0              pyhd8ed1ab_0    conda-forge
click                     8.0.1            py38h578d9bd_0    conda-forge
click-plugins             1.1.1                    pypi_0    pypi
cligj                     0.7.2                    pypi_0    pypi
cloudpickle               2.0.0              pyhd8ed1ab_0    conda-forge
colorama                  0.4.4              pyh9f0ad1d_0    conda-forge
coverage                  5.5                      pypi_0    pypi
cryptography              3.4.7            py38ha5dfef3_0    conda-forge
cudatoolkit               11.1.1               h6406543_8    conda-forge
curl                      7.79.0               hea6ffbf_0    conda-forge
cycler                    0.10.0                     py_2    conda-forge
cytoolz                   0.11.0           py38h497a2fe_3    conda-forge
dask-core                 2021.9.1           pyhd8ed1ab_0    conda-forge
dataclasses               0.8                pyhc8e2a94_3    conda-forge
dbus                      1.13.6               h48d8840_2    conda-forge
debugpy                   1.4.1            py38h709712a_0    conda-forge
decorator                 4.4.2                      py_0    conda-forge
distlib                   0.3.3              pyhd8ed1ab_0    conda-forge
editdistance-s            1.0.0            py38h1fd1430_1    conda-forge
entrypoints               0.3             pyhd8ed1ab_1003    conda-forge
ephem                     4.0.0.2          py38h497a2fe_0    conda-forge
expat                     2.4.1                h9c3ff4c_0    conda-forge
fasteners                 0.16               pyhd8ed1ab_0    conda-forge
ffmpeg                    4.3                  hf484d3e_0    pytorch
filelock                  3.0.12             pyh9f0ad1d_0    conda-forge
fiona                     1.8.20                   pypi_0    pypi
flake8                    3.9.2              pyhd8ed1ab_0    conda-forge
fontconfig                2.13.1            hba837de_1005    conda-forge
freetype                  2.10.4               h0708190_1    conda-forge
fsspec                    2021.8.1           pyhd8ed1ab_0    conda-forge
future                    0.18.2           py38h578d9bd_3    conda-forge
gcsfs                     2021.8.1           pyhd8ed1ab_0    conda-forge
geopandas                 0.9.0                    pypi_0    pypi
gettext                   0.19.8.1          h73d1719_1006    conda-forge
giflib                    5.2.1                h36c2ea0_2    conda-forge
gitdb                     4.0.7                    pypi_0    pypi
gitpython                 3.1.18                   pypi_0    pypi
glib                      2.68.4               h9c3ff4c_1    conda-forge
glib-tools                2.68.4               h9c3ff4c_1    conda-forge
gmp                       6.2.1                h58526e2_0    conda-forge
gnutls                    3.6.13               h85f3911_1    conda-forge
google-api-core           2.0.1                    pypi_0    pypi
google-auth               1.35.0             pyh6c4a22f_0    conda-forge
google-auth-oauthlib      0.4.6              pyhd8ed1ab_0    conda-forge
google-cloud-core         2.0.0                    pypi_0    pypi
google-cloud-storage      1.42.2                   pypi_0    pypi
google-crc32c             1.2.0                    pypi_0    pypi
google-resumable-media    2.0.3                    pypi_0    pypi
googleapis-common-protos  1.53.0                   pypi_0    pypi
graphite2                 1.3.13            h58526e2_1001    conda-forge
grpcio                    1.38.1           py38hdd6454d_0    conda-forge
gst-plugins-base          1.18.5               hf529b03_0    conda-forge
gstreamer                 1.18.5               h76c114f_0    conda-forge
h5netcdf                  0.11.0             pyhd8ed1ab_0    conda-forge
h5py                      3.3.0           nompi_py38h9915d05_100    conda-forge
harfbuzz                  2.9.1                h83ec7ef_0    conda-forge
hdf4                      4.2.15               h10796ff_3    conda-forge
hdf5                      1.10.6          nompi_h6a2412b_1114    conda-forge
icu                       68.1                 h58526e2_0    conda-forge
identify                  2.2.15             pyhd8ed1ab_0    conda-forge
idna                      3.1                pyhd3deb0d_0    conda-forge
imagecodecs               2021.7.30        py38hb5ce8f7_0    conda-forge
imageio                   2.9.0                      py_0    conda-forge
importlib-metadata        4.8.1            py38h578d9bd_0    conda-forge
iniconfig                 1.1.1              pyh9f0ad1d_0    conda-forge
ipykernel                 6.4.1            py38he5a9106_0    conda-forge
ipython                   7.27.0           py38he5a9106_0    conda-forge
ipython_genutils          0.2.0                      py_1    conda-forge
jasper                    1.900.1           h07fcdf6_1006    conda-forge
jedi                      0.18.0           py38h578d9bd_2    conda-forge
jinja2                    3.0.1                    pypi_0    pypi
jmespath                  0.10.0                   pypi_0    pypi
jpeg                      9d                   h36c2ea0_0    conda-forge
jsonpointer               2.1                      pypi_0    pypi
jsonref                   0.2                      pypi_0    pypi
jsonschema                3.2.0                    pypi_0    pypi
jupyter_client            7.0.3              pyhd8ed1ab_0    conda-forge
jupyter_core              4.8.1            py38h578d9bd_0    conda-forge
jxrlib                    1.1                  h7f98852_2    conda-forge
kiwisolver                1.3.2            py38h1fd1430_0    conda-forge
krb5                      1.19.2               hcc1bbae_0    conda-forge
lame                      3.100             h7f98852_1001    conda-forge
lcms2                     2.12                 hddcbb42_0    conda-forge
ld_impl_linux-64          2.36.1               hea4e1c9_2    conda-forge
lerc                      2.2.1                h9c3ff4c_0    conda-forge
libaec                    1.0.6                h9c3ff4c_0    conda-forge
libblas                   3.9.0            11_linux64_mkl    conda-forge
libbrotlicommon           1.0.9                h7f98852_5    conda-forge
libbrotlidec              1.0.9                h7f98852_5    conda-forge
libbrotlienc              1.0.9                h7f98852_5    conda-forge
libcblas                  3.9.0            11_linux64_mkl    conda-forge
libclang                  11.1.0          default_ha53f305_1    conda-forge
libcurl                   7.79.0               h2574ce0_0    conda-forge
libdeflate                1.8                  h7f98852_0    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 h516909a_1    conda-forge
libevent                  2.1.10               hcdb4288_3    conda-forge
libffi                    3.4.2                h9c3ff4c_2    conda-forge
libgcc-ng                 11.2.0               h1d223b6_8    conda-forge
libgfortran-ng            11.2.0               h69a702a_8    conda-forge
libgfortran5              11.2.0               h5c6108e_8    conda-forge
libglib                   2.68.4               h174f98d_1    conda-forge
libiconv                  1.16                 h516909a_0    conda-forge
liblapack                 3.9.0            11_linux64_mkl    conda-forge
liblapacke                3.9.0            11_linux64_mkl    conda-forge
libllvm10                 10.0.1               he513fc3_3    conda-forge
libllvm11                 11.1.0               hf817b99_2    conda-forge
libnetcdf                 4.8.1           nompi_hcd642e3_100    conda-forge
libnghttp2                1.43.0               h812cca2_0    conda-forge
libogg                    1.3.4                h7f98852_1    conda-forge
libopencv                 4.4.0                    py38_2    conda-forge
libopus                   1.3.1                h7f98852_1    conda-forge
libpng                    1.6.37               h21135ba_2    conda-forge
libpq                     13.3                 hd57d9b9_0    conda-forge
libprotobuf               3.18.0               h780b84a_0    conda-forge
libsodium                 1.0.18               h36c2ea0_1    conda-forge
libssh2                   1.10.0               ha56f1ee_0    conda-forge
libstdcxx-ng              11.2.0               he4da1e4_8    conda-forge
libtiff                   4.3.0                hf544144_0    conda-forge
libuuid                   2.32.1            h7f98852_1000    conda-forge
libuv                     1.42.0               h7f98852_0    conda-forge
libvorbis                 1.3.7                h9c3ff4c_0    conda-forge
libwebp-base              1.2.1                h7f98852_0    conda-forge
libxcb                    1.13              h7f98852_1003    conda-forge
libxkbcommon              1.0.3                he3ba5ed_0    conda-forge
libxml2                   2.9.12               h72842e0_0    conda-forge
libzip                    1.8.0                h4de3113_0    conda-forge
libzopfli                 1.0.3                h9c3ff4c_0    conda-forge
llvm-openmp               12.0.1               h4bd325d_1    conda-forge
llvmlite                  0.34.0           py38h4f45e52_2    conda-forge
locket                    0.2.0                      py_2    conda-forge
lz4-c                     1.9.3                h9c3ff4c_1    conda-forge
markdown                  3.3.4              pyhd8ed1ab_0    conda-forge
markupsafe                2.0.1                    pypi_0    pypi
matplotlib                3.4.3            py38h578d9bd_0    conda-forge
matplotlib-base           3.4.3            py38hf4fb855_0    conda-forge
matplotlib-inline         0.1.3              pyhd8ed1ab_0    conda-forge
mccabe                    0.6.1                      py_1    conda-forge
mkl                       2021.3.0           h726a3e6_557    conda-forge
mkl-devel                 2021.3.0           ha770c72_558    conda-forge
mkl-include               2021.3.0           h726a3e6_557    conda-forge
monotonic                 1.5                        py_0    conda-forge
more-itertools            8.10.0             pyhd8ed1ab_0    conda-forge
moto                      2.2.7                    pypi_0    pypi
msgpack-python            1.0.2            py38h1fd1430_1    conda-forge
multidict                 5.1.0            py38h497a2fe_1    conda-forge
munch                     2.5.0                    pypi_0    pypi
mypy                      0.910            py38h497a2fe_0    conda-forge
mypy_extensions           0.4.3            py38h578d9bd_3    conda-forge
mysql-common              8.0.25               ha770c72_2    conda-forge
mysql-libs                8.0.25               hfa10184_2    conda-forge
ncurses                   6.2                  h58526e2_4    conda-forge
neptune-client            0.10.10                  pypi_0    pypi
neptune-pytorch-lightning 0.9.7                    pypi_0    pypi
nest-asyncio              1.5.1              pyhd8ed1ab_0    conda-forge
netcdf4                   1.5.7           nompi_py38hcc16cfe_101    conda-forge
nettle                    3.6                  he412f7d_0    conda-forge
networkx                  2.6.3              pyhd8ed1ab_0    conda-forge
ninja                     1.10.2               h4bd325d_0    conda-forge
nodeenv                   1.6.0              pyhd8ed1ab_0    conda-forge
nowcasting-dataset        0.1.5                     dev_0    <develop>
nspr                      4.30                 h9c3ff4c_0    conda-forge
nss                       3.69                 hb5efdd6_0    conda-forge
numba                     0.51.2           py38hc5bc63f_0    conda-forge
numcodecs                 0.9.1            py38h709712a_0    conda-forge
numexpr                   2.7.3                    pypi_0    pypi
numpy                     1.21.2           py38he2449b9_0    conda-forge
oauthlib                  3.1.1              pyhd8ed1ab_0    conda-forge
olefile                   0.46               pyh9f0ad1d_1    conda-forge
opencv                    4.4.0                    py38_2    conda-forge
openh264                  2.1.1                h780b84a_0    conda-forge
openjpeg                  2.4.0                hb52868f_1    conda-forge
openssl                   1.1.1l               h7f98852_0    conda-forge
packaging                 21.0               pyhd8ed1ab_0    conda-forge
pandas                    1.3.3            py38h43a58ef_0    conda-forge
parso                     0.8.2              pyhd8ed1ab_0    conda-forge
partd                     1.2.0              pyhd8ed1ab_0    conda-forge
pathspec                  0.9.0              pyhd8ed1ab_0    conda-forge
pathy                     0.6.0              pyhd8ed1ab_0    conda-forge
patsy                     0.5.1                      py_0    conda-forge
pcre                      8.45                 h9c3ff4c_0    conda-forge
pexpect                   4.8.0              pyh9f0ad1d_2    conda-forge
pickleshare               0.7.5                   py_1003    conda-forge
pillow                    8.3.2            py38h8e6f84c_0    conda-forge
pip                       21.2.4             pyhd8ed1ab_0    conda-forge
pixman                    0.40.0               h36c2ea0_0    conda-forge
platformdirs              2.3.0              pyhd8ed1ab_0    conda-forge
plotly                    5.3.1                    pypi_0    pypi
pluggy                    1.0.0            py38h578d9bd_1    conda-forge
pooch                     1.5.1              pyhd8ed1ab_0    conda-forge
pre-commit                2.15.0           py38h578d9bd_0    conda-forge
proj                      8.1.0                h277dcde_1    conda-forge
prompt-toolkit            3.0.20             pyha770c72_0    conda-forge
protobuf                  3.18.0           py38h709712a_0    conda-forge
psutil                    5.8.0            py38h497a2fe_1    conda-forge
pthread-stubs             0.4               h36c2ea0_1001    conda-forge
ptyprocess                0.7.0              pyhd3deb0d_0    conda-forge
pvlib                     0.9.0                      py_1    pvlib
py                        1.10.0             pyhd3deb0d_0    conda-forge
py-opencv                 4.4.0            py38h23f93f0_2    conda-forge
pyasn1                    0.4.8                      py_0    conda-forge
pyasn1-modules            0.2.7                      py_0    conda-forge
pycodestyle               2.7.0              pyhd8ed1ab_0    conda-forge
pycparser                 2.20               pyh9f0ad1d_2    conda-forge
pydantic                  1.8.2                    pypi_0    pypi
pydeprecate               0.3.1              pyhd8ed1ab_0    conda-forge
pyflakes                  2.3.1              pyhd8ed1ab_0    conda-forge
pygments                  2.10.0             pyhd8ed1ab_0    conda-forge
pyjwt                     2.1.0              pyhd8ed1ab_0    conda-forge
pyopenssl                 20.0.1             pyhd8ed1ab_0    conda-forge
pyparsing                 2.4.7              pyh9f0ad1d_0    conda-forge
pyproj                    3.2.1            py38h3701b11_0    conda-forge
pyqt                      5.12.3           py38h578d9bd_7    conda-forge
pyqt-impl                 5.12.3           py38h7400c14_7    conda-forge
pyqt5-sip                 4.19.18          py38h709712a_7    conda-forge
pyqtchart                 5.12             py38h7400c14_7    conda-forge
pyqtwebengine             5.12.1           py38h7400c14_7    conda-forge
pyrsistent                0.18.0                   pypi_0    pypi
pysocks                   1.7.1            py38h578d9bd_3    conda-forge
pytest                    6.2.5            py38h578d9bd_0    conda-forge
pytest-cov                2.12.1                   pypi_0    pypi
python                    3.8.12          hb7a2778_0_cpython    conda-forge
python-dateutil           2.8.2              pyhd8ed1ab_0    conda-forge
python_abi                3.8                      2_cp38    conda-forge
pytorch                   1.9.1           py3.8_cuda11.1_cudnn8.0.5_0    pytorch
pytorch-lightning         1.4.6              pyhd8ed1ab_0    conda-forge
pytz                      2021.1             pyhd8ed1ab_0    conda-forge
pyu2f                     0.1.5              pyhd8ed1ab_0    conda-forge
pywavelets                1.1.1            py38h6c62de6_3    conda-forge
pyyaml                    5.4.1            py38h497a2fe_1    conda-forge
pyzmq                     22.3.0           py38h2035c66_0    conda-forge
qt                        5.12.9               hda022c4_4    conda-forge
readline                  8.1                  h46c0cb4_0    conda-forge
regex                     2021.8.28        py38h497a2fe_0    conda-forge
requests                  2.26.0             pyhd8ed1ab_0    conda-forge
requests-oauthlib         1.3.0              pyh9f0ad1d_0    conda-forge
responses                 0.14.0                   pypi_0    pypi
rfc3987                   1.3.8                    pypi_0    pypi
rsa                       4.7.2              pyh44b312d_0    conda-forge
s3transfer                0.5.0                    pypi_0    pypi
scikit-image              0.18.3           py38h43a58ef_0    conda-forge
scipy                     1.7.1            py38h56a6a73_0    conda-forge
setuptools                58.0.4           py38h578d9bd_1    conda-forge
shapely                   1.7.1                    pypi_0    pypi
shellingham               1.4.0              pyh44b312d_0    conda-forge
simplejson                3.17.5                   pypi_0    pypi
six                       1.16.0             pyh6c4a22f_0    conda-forge
smart_open                5.2.1              pyhd8ed1ab_0    conda-forge
smmap                     4.0.0                    pypi_0    pypi
snappy                    1.1.8                he1b5a44_3    conda-forge
sqlite                    3.36.0               h9cd32fc_1    conda-forge
statsmodels               0.12.2           py38h6c62de6_0    conda-forge
strict-rfc3339            0.7                      pypi_0    pypi
swagger-spec-validator    2.7.3                    pypi_0    pypi
tables                    3.6.1                    pypi_0    pypi
tbb                       2021.3.0             h4bd325d_0    conda-forge
tenacity                  8.0.1                    pypi_0    pypi
tensorboard               2.6.0              pyhd8ed1ab_1    conda-forge
tensorboard-data-server   0.6.0            py38h3e25421_0    conda-forge
tensorboard-plugin-wit    1.8.0              pyh44b312d_0    conda-forge
tifffile                  2021.8.30          pyhd8ed1ab_0    conda-forge
tilemapbase               0.4.7                    pypi_0    pypi
tk                        8.6.11               h27826a3_1    conda-forge
toml                      0.10.2             pyhd8ed1ab_0    conda-forge
tomli                     1.2.1              pyhd8ed1ab_0    conda-forge
toolz                     0.11.1                     py_0    conda-forge
torchmetrics              0.5.1              pyhd8ed1ab_0    conda-forge
tornado                   6.1              py38h497a2fe_1    conda-forge
tqdm                      4.62.3             pyhd8ed1ab_0    conda-forge
traitlets                 5.1.0              pyhd8ed1ab_0    conda-forge
typed-ast                 1.4.3            py38h497a2fe_0    conda-forge
typer                     0.4.0              pyhd8ed1ab_0    conda-forge
typing-extensions         3.10.0.0             hd8ed1ab_0    conda-forge
typing_extensions         3.10.0.0           pyha770c72_0    conda-forge
urllib3                   1.26.6             pyhd8ed1ab_0    conda-forge
virtualenv                20.4.7           py38h578d9bd_0    conda-forge
wcwidth                   0.2.5              pyh9f0ad1d_2    conda-forge
webcolors                 1.11.1                   pypi_0    pypi
websocket-client          1.2.1                    pypi_0    pypi
werkzeug                  2.0.1              pyhd8ed1ab_0    conda-forge
wheel                     0.37.0             pyhd8ed1ab_1    conda-forge
xarray                    0.19.0             pyhd8ed1ab_1    conda-forge
xmltodict                 0.12.0                   pypi_0    pypi
xorg-kbproto              1.0.7             h7f98852_1002    conda-forge
xorg-libice               1.0.10               h7f98852_0    conda-forge
xorg-libsm                1.2.3             hd9c2040_1000    conda-forge
xorg-libx11               1.7.2                h7f98852_0    conda-forge
xorg-libxau               1.0.9                h7f98852_0    conda-forge
xorg-libxdmcp             1.1.3                h7f98852_0    conda-forge
xorg-libxext              1.3.4                h7f98852_1    conda-forge
xorg-libxrender           0.9.10            h7f98852_1003    conda-forge
xorg-renderproto          0.11.1            h7f98852_1002    conda-forge
xorg-xextproto            7.3.0             h7f98852_1002    conda-forge
xorg-xproto               7.0.31            h7f98852_1007    conda-forge
xz                        5.2.5                h516909a_1    conda-forge
yaml                      0.2.5                h516909a_0    conda-forge
yarl                      1.6.3            py38h497a2fe_2    conda-forge
zarr                      2.10.0             pyhd8ed1ab_0    conda-forge
zeromq                    4.3.4                h9c3ff4c_1    conda-forge
zfp                       0.5.5                h9c3ff4c_6    conda-forge
zipp                      3.5.0              pyhd8ed1ab_0    conda-forge
zlib                      1.2.11            h516909a_1010    conda-forge
zstd                      1.5.0                ha95c52a_0    conda-forge

To Reproduce
I fear this might be hard to reliably reproduce :)

@JackKelly JackKelly added the bug Something isn't working label Sep 24, 2021
@JackKelly
Member Author

The same thing happened again, although, interestingly, this time the script created a bunch more batches before finally dying...

This feels like an upstream bug in PyTorch. Maybe the ultimate solution for us is #86

2021-09-25 03:32:29,079 INFO ./prepare_ml_data.py 178 Got batch 12598
2021-09-25 03:32:31,618 INFO ./prepare_ml_data.py 178 Got batch 12599
2021-09-25 03:32:37,760 INFO ./prepare_ml_data.py 178 Got batch 12600
Traceback (most recent call last):
  File "/home/jack/miniconda3/envs/nowcasting_dataset/lib/python3.8/multiprocessing/queues.py", line 239, in _feed
    obj = _ForkingPickler.dumps(obj)
  File "/home/jack/miniconda3/envs/nowcasting_dataset/lib/python3.8/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
  File "/home/jack/miniconda3/envs/nowcasting_dataset/lib/python3.8/site-packages/torch/multiprocessing/reductions.py", line 319, in reduce_storage
    metadata = storage._share_filename_()
RuntimeError: unable to open shared memory object </torch_2197781_1082808688> in read-write mode
2021-09-25 03:34:25,953 INFO ./prepare_ml_data.py 178 Got batch 12601
2021-09-25 03:34:31,637 INFO ./prepare_ml_data.py 178 Got batch 12602
2021-09-25 03:34:35,005 INFO ./prepare_ml_data.py 178 Got batch 12603
2021-09-25 03:34:38,212 INFO ./prepare_ml_data.py 178 Got batch 12604
2021-09-25 03:34:41,028 INFO ./prepare_ml_data.py 178 Got batch 12605
2021-09-25 03:34:44,046 INFO ./prepare_ml_data.py 178 Got batch 12606
2021-09-25 03:34:47,447 INFO ./prepare_ml_data.py 178 Got batch 12607
2021-09-25 03:34:51,561 INFO ./prepare_ml_data.py 178 Got batch 12608
2021-09-25 03:36:29,463 INFO ./prepare_ml_data.py 178 Got batch 12609
2021-09-25 03:36:33,608 INFO ./prepare_ml_data.py 178 Got batch 12610
2021-09-25 03:36:36,848 INFO ./prepare_ml_data.py 178 Got batch 12611
2021-09-25 03:36:39,509 INFO ./prepare_ml_data.py 178 Got batch 12612
2021-09-25 03:36:43,427 INFO ./prepare_ml_data.py 178 Got batch 12613
2021-09-25 03:36:48,545 INFO ./prepare_ml_data.py 178 Got batch 12614
2021-09-25 03:36:51,057 INFO ./prepare_ml_data.py 178 Got batch 12615
2021-09-25 03:36:54,518 INFO ./prepare_ml_data.py 178 Got batch 12616
2021-09-25 03:38:26,891 INFO ./prepare_ml_data.py 178 Got batch 12617
2021-09-25 03:38:31,718 INFO ./prepare_ml_data.py 178 Got batch 12618
2021-09-25 03:38:35,624 INFO ./prepare_ml_data.py 178 Got batch 12619
2021-09-25 03:38:38,600 INFO ./prepare_ml_data.py 178 Got batch 12620
2021-09-25 03:38:41,711 INFO ./prepare_ml_data.py 178 Got batch 12621
2021-09-25 03:38:44,807 INFO ./prepare_ml_data.py 178 Got batch 12622
2021-09-25 03:38:47,862 INFO ./prepare_ml_data.py 178 Got batch 12623
2021-09-25 03:38:51,357 INFO ./prepare_ml_data.py 178 Got batch 12624
2021-09-25 03:40:25,210 INFO ./prepare_ml_data.py 178 Got batch 12625
2021-09-25 03:40:30,262 INFO ./prepare_ml_data.py 178 Got batch 12626
2021-09-25 03:40:34,261 INFO ./prepare_ml_data.py 178 Got batch 12627
2021-09-25 03:40:37,269 INFO ./prepare_ml_data.py 178 Got batch 12628
2021-09-25 03:40:41,055 INFO ./prepare_ml_data.py 178 Got batch 12629
2021-09-25 03:40:44,168 INFO ./prepare_ml_data.py 178 Got batch 12630
2021-09-25 03:40:46,994 INFO ./prepare_ml_data.py 178 Got batch 12631
2021-09-25 03:40:50,361 INFO ./prepare_ml_data.py 178 Got batch 12632
2021-09-25 03:42:30,618 INFO ./prepare_ml_data.py 178 Got batch 12633
2021-09-25 03:42:35,057 INFO ./prepare_ml_data.py 178 Got batch 12634
2021-09-25 03:42:39,692 INFO ./prepare_ml_data.py 178 Got batch 12635
2021-09-25 03:42:42,661 INFO ./prepare_ml_data.py 178 Got batch 12636
2021-09-25 03:42:45,789 INFO ./prepare_ml_data.py 178 Got batch 12637
2021-09-25 03:42:50,234 INFO ./prepare_ml_data.py 178 Got batch 12638
2021-09-25 03:42:54,329 INFO ./prepare_ml_data.py 178 Got batch 12639
2021-09-25 03:42:56,958 INFO ./prepare_ml_data.py 178 Got batch 12640
2021-09-25 03:44:37,651 INFO ./prepare_ml_data.py 178 Got batch 12641
2021-09-25 03:44:42,296 INFO ./prepare_ml_data.py 178 Got batch 12642
2021-09-25 03:44:45,379 INFO ./prepare_ml_data.py 178 Got batch 12643
2021-09-25 03:44:48,372 INFO ./prepare_ml_data.py 178 Got batch 12644
2021-09-25 03:44:51,125 INFO ./prepare_ml_data.py 178 Got batch 12645
2021-09-25 03:44:54,308 INFO ./prepare_ml_data.py 178 Got batch 12646
2021-09-25 03:44:56,894 INFO ./prepare_ml_data.py 178 Got batch 12647
2021-09-25 03:45:00,264 INFO ./prepare_ml_data.py 178 Got batch 12648
2021-09-25 03:46:40,158 INFO ./prepare_ml_data.py 178 Got batch 12649
2021-09-25 03:46:44,672 INFO ./prepare_ml_data.py 178 Got batch 12650
2021-09-25 03:46:47,625 INFO ./prepare_ml_data.py 178 Got batch 12651
2021-09-25 03:46:50,322 INFO ./prepare_ml_data.py 178 Got batch 12652

@JackKelly
Member Author

JackKelly commented Sep 25, 2021

I just did a quick search of the PyTorch GitHub issues. This one is highly relevant: pytorch/pytorch#14768

The suggested solution is to run ulimit -n 512000 at the Linux command line, which sets the maximum number of open files to 512,000 (on leonardo, the limit was previously only 1,024). I'll try that now. Perhaps it will fix the issue for us, in which case I'll also add it to the README.

Also, running ulimit -a | grep "open files" will show you the current setting.

To set it permanently, edit /etc/security/limits.conf as root and add this line: "* soft nofile 512000", then log out and log back in again.
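A rough in-script equivalent of ulimit -n is to raise the process's soft RLIMIT_NOFILE with Python's standard resource module before any DataLoader workers are started. This is a sketch assuming Linux: raise_nofile_soft_limit is a hypothetical helper, and it can only go up to the hard limit, because raising the hard limit requires root.

```python
import resource


def raise_nofile_soft_limit(target):
    """Raise this process's soft open-file limit towards `target`.

    Hypothetical helper, roughly what `ulimit -n` does in a shell but for
    the current Python process (and any child processes it starts later).
    Never lowers the limit and never exceeds the hard limit.
    Returns the resulting soft limit.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        # Unprivileged processes cannot go above the hard limit.
        target = min(target, hard)
    if soft == resource.RLIM_INFINITY or soft >= target:
        return soft  # Already high enough; leave it alone.
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]
```

For example, calling raise_nofile_soft_limit(512_000) near the top of prepare_ml_data.py's main() would request the same limit as the ulimit command above, capped at the hard limit; worker processes started afterwards inherit it.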

@JackKelly
Member Author

Hmm, this problem has happened again :(

I'm going to try once more. Last time, I set ulimit -n 512000 while prepare_ml_data.py was already running, so maybe it didn't take effect. This time I'll set ulimit -n 512000 first and then start prepare_ml_data.py.
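That hunch matches how resource limits work: a process gets a copy of its parent's limits when it is created, so running ulimit -n in a shell does not change the limits of a prepare_ml_data.py that is already running. The stdlib sketch below (both function names are hypothetical) demonstrates this by temporarily changing the current process's soft open-file limit and showing that a child Python process started afterwards sees the new value.

```python
import resource
import subprocess
import sys


def child_nofile_soft_limit():
    """Start a fresh Python process and return the soft open-file limit it sees."""
    code = "import resource; print(resource.getrlimit(resource.RLIMIT_NOFILE)[0])"
    result = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True, check=True
    )
    return int(result.stdout)


def limit_seen_by_new_child(new_soft):
    """Temporarily set our soft limit, start a child, then restore the limit.

    The child inherits whatever limit the parent has at the moment the child
    is created. `new_soft` must not exceed the current hard limit.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    try:
        return child_nofile_soft_limit()
    finally:
        # Put the original soft limit back for this process.
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))
```

For a process that is already running on Linux, the util-linux prlimit tool (for example prlimit --pid <PID> --nofile=512000:512000) can change its limits in place, subject to the usual permission rules, although restarting the script after setting ulimit -n, as planned here, is simpler.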

@JackKelly
Member Author

JackKelly commented Oct 7, 2021

This problem just happened again on leonardo, even though the ulimit for open files has been set to 512,000.

I think the best solution to this problem is to remove PyTorch from the code (#86). Here is the log from this crash:

2021-10-06 21:17:37,438 INFO /home/jack/dev/ocf/nowcasting_dataset/scripts/./prepare_ml_data.py 178 Got batch 4796
2021-10-06 21:17:41,551 INFO /home/jack/dev/ocf/nowcasting_dataset/scripts/./prepare_ml_data.py 178 Got batch 4797
Traceback (most recent call last):
  File "/home/jack/miniconda3/envs/nowcasting_dataset/lib/python3.9/multiprocessing/queues.py", line 245, in _feed
    obj = _ForkingPickler.dumps(obj)
  File "/home/jack/miniconda3/envs/nowcasting_dataset/lib/python3.9/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
  File "/home/jack/miniconda3/envs/nowcasting_dataset/lib/python3.9/site-packages/torch/multiprocessing/reductions.py", line 319, in reduce_storage
    metadata = storage._share_filename_()
RuntimeError: unable to open shared memory object </torch_4094450_3914785142> in read-write mode
2021-10-06 21:17:46,604 INFO /home/jack/dev/ocf/nowcasting_dataset/scripts/./prepare_ml_data.py 178 Got batch 4798
2021-10-06 21:17:51,093 INFO /home/jack/dev/ocf/nowcasting_dataset/scripts/./prepare_ml_data.py 178 Got batch 4799

@flowirtz flowirtz moved this to In Progress in Nowcasting Oct 15, 2021
@JackKelly
Member Author

PR #307 removed the PyTorch DataLoader, so this issue is no longer relevant.

Repository owner moved this from In Progress to Done in Nowcasting Nov 2, 2021