PCAM dataset download fails #5800

benbo · 2022-04-10T21:26:48Z

🐛 Describe the bug

PCAM download fails for both the train and test splits due to a gzip error. I deleted all files that were created and tried it again from scratch but the same error persists.

>>> dataset = datasets.PCAM(dpath, split = 'train', download = True)
2245it [00:00, 20206464.55it/s]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/user/miniconda3/envs/gpu24-2/lib/python3.9/site-packages/torchvision/datasets/pcam.py", line 92, in __init__
    self._download()
  File "/user/miniconda3/envs/gpu24-2/lib/python3.9/site-packages/torchvision/datasets/pcam.py", line 130, in _download
    _decompress(str(self._base_folder / archive_name))
  File "/user/miniconda3/envs/gpu24-2/lib/python3.9/site-packages/torchvision/datasets/utils.py", line 372, in _decompress
    wfh.write(rfh.read())
  File "/user/miniconda3/envs/gpu24-2/lib/python3.9/gzip.py", line 300, in read
    return self._buffer.read(size)
  File "/user/miniconda3/envs/gpu24-2/lib/python3.9/gzip.py", line 487, in read
    if not self._read_gzip_header():
  File "/user/miniconda3/envs/gpu24-2/lib/python3.9/gzip.py", line 435, in _read_gzip_header
    raise BadGzipFile('Not a gzipped file (%r)' % magic)
gzip.BadGzipFile: Not a gzipped file (b'<!')

Versions

Collecting environment information...
PyTorch version: 1.11.0
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A

OS: Springdale Linux release 8.5 (Modena) (x86_64)
GCC version: (GCC) 8.5.0 20210514 (Red Hat 8.5.0-4)
Clang version: Could not collect
CMake version: version 3.20.2
Libc version: glibc-2.28

Python version: 3.9.7 (default, Sep 16 2021, 13:09:58) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-4.18.0-348.2.1.el8_5.x86_64-x86_64-with-glibc2.28
Is CUDA available: True
CUDA runtime version: 11.5.119
GPU models and configuration:
GPU 0: NVIDIA RTX A6000

Nvidia driver version: 495.29.05
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.21.2
[pip3] pytorch-lightning==1.5.8
[pip3] torch==1.11.0
[pip3] torch-fidelity==0.3.0
[pip3] torchinfo==1.6.3
[pip3] torchmetrics==0.6.2
[pip3] torchvision==0.12.0
[conda] blas 1.0 mkl
[conda] cudatoolkit 11.3.1 h2bc3f7f_2
[conda] ffmpeg 4.3 hf484d3e_0 pytorch
[conda] mkl 2021.4.0 h06a4308_640
[conda] mkl-service 2.4.0 py39h7f8727e_0
[conda] mkl_fft 1.3.1 py39hd3c417c_0
[conda] mkl_random 1.2.2 py39h51133e4_0
[conda] mypy-extensions 0.4.3 pypi_0 pypi
[conda] mypy_extensions 0.4.3 py39h06a4308_1
[conda] numpy 1.19.5 pypi_0 pypi
[conda] numpy-base 1.21.2 py39h79a1101_0
[conda] pytorch 1.11.0 py3.9_cuda11.3_cudnn8.2.0_0 pytorch
[conda] pytorch-lightning 1.5.8 pyhd8ed1ab_0 conda-forge
[conda] pytorch-mutex 1.0 cuda pytorch
[conda] torch-fidelity 0.3.0 pypi_0 pypi
[conda] torchinfo 1.6.3 pyhd8ed1ab_0 conda-forge
[conda] torchmetrics 0.6.2 pyhd8ed1ab_0 conda-forge
[conda] torchvision 0.12.0 py39_cu113 pytorch

cc @pmeier @YosuaMichael


pmeier · 2022-04-11T06:26:30Z

GDrive added a new check, so you now have to manually confirm that you don't want a virus check, even if you use a direct download link. See #5615. This was fixed in #5645 and will be part of the next release.
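For anyone who wants to confirm that they are hitting this, here is a minimal sketch that inspects the first two bytes of a downloaded file. The `b'<!'` in the traceback above suggests the saved file is an HTML page rather than an archive; that it is specifically the GDrive confirmation page is an assumption based on the explanation above. `describe_download` is a hypothetical helper, not part of torchvision.

```python
GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes


def describe_download(path):
    """Classify a downloaded file by its first two bytes (hypothetical helper)."""
    with open(path, "rb") as fh:
        head = fh.read(2)
    if head == GZIP_MAGIC:
        return "gzip archive"
    if head == b"<!":
        # Same two bytes that gzip reports in the BadGzipFile message above.
        return "HTML page (possibly a GDrive confirmation page)"
    return "unknown"
```

If this reports an HTML page for one of the PCAM archives, deleting the file and retrying with the released version will most likely fetch the same page again.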

benbo · 2022-04-11T14:23:09Z

Thanks @pmeier, apologies for the duplicate issue.

pmeier · 2022-04-19T11:47:39Z

Install a nightly build or from source. The fix will be included and it should work out of the box.
Download the files manually and place them in the root folder. If the file is already downloaded, the dataset will pick up on it.
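For the manual route, the traceback above shows that `_download` decompresses each archive after fetching it, so manually downloaded `.gz` files may need the same treatment. Below is a rough sketch using only the standard library. `decompress_archives` is a hypothetical helper, and the assumption that the decompressed file belongs next to its archive is mine, so check the folder layout your torchvision version expects.

```python
import gzip
import shutil
from pathlib import Path


def decompress_archives(folder):
    """Decompress every *.gz file in `folder` next to its archive.

    Returns the list of newly created paths. Files that already have a
    decompressed copy are skipped.
    """
    created = []
    for archive in sorted(Path(folder).glob("*.gz")):
        target = archive.with_suffix("")  # "x.h5.gz" -> "x.h5"
        if target.exists():
            continue
        try:
            # Stream in chunks instead of reading the whole archive into memory.
            with gzip.open(archive, "rb") as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
        except Exception:
            # Do not leave a partial or empty file behind if the archive is bad.
            target.unlink(missing_ok=True)
            raise
        created.append(target)
    return created
```

Calling `decompress_archives(folder)` a second time returns an empty list, because existing files are left untouched.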

benbo changed the title ~~PCAM dataset download~~ PCAM dataset download fails Apr 10, 2022

pmeier closed this as completed Apr 11, 2022

pmeier added the bug, duplicate, and module: datasets labels Apr 11, 2022

pmeier mentioned this issue Apr 11, 2022

improve error handling for GDrive downloads #5704

Merged
