PCAM dataset download fails #5800

Closed
benbo opened this issue Apr 10, 2022 · 3 comments

benbo commented Apr 10, 2022

🐛 Describe the bug

PCAM download fails for both the train and test splits due to a gzip error. I deleted all files that were created and tried it again from scratch but the same error persists.

>>> dataset = datasets.PCAM(dpath, split = 'train', download = True)
2245it [00:00, 20206464.55it/s]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/user/miniconda3/envs/gpu24-2/lib/python3.9/site-packages/torchvision/datasets/pcam.py", line 92, in __init__
    self._download()
  File "/user/miniconda3/envs/gpu24-2/lib/python3.9/site-packages/torchvision/datasets/pcam.py", line 130, in _download
    _decompress(str(self._base_folder / archive_name))
  File "/user/miniconda3/envs/gpu24-2/lib/python3.9/site-packages/torchvision/datasets/utils.py", line 372, in _decompress
    wfh.write(rfh.read())
  File "/user/miniconda3/envs/gpu24-2/lib/python3.9/gzip.py", line 300, in read
    return self._buffer.read(size)
  File "/user/miniconda3/envs/gpu24-2/lib/python3.9/gzip.py", line 487, in read
    if not self._read_gzip_header():
  File "/user/miniconda3/envs/gpu24-2/lib/python3.9/gzip.py", line 435, in _read_gzip_header
    raise BadGzipFile('Not a gzipped file (%r)' % magic)
gzip.BadGzipFile: Not a gzipped file (b'<!')
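
The `b'<!'` in the error message is the first two bytes of the downloaded file. A real gzip stream starts with `0x1f 0x8b`, so `<!` suggests that an HTML page was saved in place of the archive. A minimal sketch for checking a downloaded file before decompressing it follows; `sniff_download` and the demo file names are hypothetical helpers for illustration, not part of torchvision.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def sniff_download(path):
    """Classify a downloaded file by its first two bytes."""
    with open(path, "rb") as fh:
        head = fh.read(2)
    if head == GZIP_MAGIC:
        return "gzip"
    if head == b"<!":
        # Same bytes as in the BadGzipFile message; typically "<!DOCTYPE html>".
        return "html"
    return "unknown"


# Demo on two throwaway files (stand-ins, not the real PCAM archives).
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.h5.gz")
    bad = os.path.join(tmp, "bad.h5.gz")
    with gzip.open(good, "wb") as fh:
        fh.write(b"payload")
    with open(bad, "wb") as fh:
        fh.write(b"<!DOCTYPE html><html>confirm download</html>")
    print(sniff_download(good))  # gzip
    print(sniff_download(bad))   # html
```

Running the same check on the files left in the dataset folder after a failed download should show whether they are archives or saved web pages.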

Versions

Collecting environment information...
PyTorch version: 1.11.0
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A

OS: Springdale Linux release 8.5 (Modena) (x86_64)
GCC version: (GCC) 8.5.0 20210514 (Red Hat 8.5.0-4)
Clang version: Could not collect
CMake version: version 3.20.2
Libc version: glibc-2.28

Python version: 3.9.7 (default, Sep 16 2021, 13:09:58) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-4.18.0-348.2.1.el8_5.x86_64-x86_64-with-glibc2.28
Is CUDA available: True
CUDA runtime version: 11.5.119
GPU models and configuration:
GPU 0: NVIDIA RTX A6000

Nvidia driver version: 495.29.05
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.21.2
[pip3] pytorch-lightning==1.5.8
[pip3] torch==1.11.0
[pip3] torch-fidelity==0.3.0
[pip3] torchinfo==1.6.3
[pip3] torchmetrics==0.6.2
[pip3] torchvision==0.12.0
[conda] blas 1.0 mkl
[conda] cudatoolkit 11.3.1 h2bc3f7f_2
[conda] ffmpeg 4.3 hf484d3e_0 pytorch
[conda] mkl 2021.4.0 h06a4308_640
[conda] mkl-service 2.4.0 py39h7f8727e_0
[conda] mkl_fft 1.3.1 py39hd3c417c_0
[conda] mkl_random 1.2.2 py39h51133e4_0
[conda] mypy-extensions 0.4.3 pypi_0 pypi
[conda] mypy_extensions 0.4.3 py39h06a4308_1
[conda] numpy 1.19.5 pypi_0 pypi
[conda] numpy-base 1.21.2 py39h79a1101_0
[conda] pytorch 1.11.0 py3.9_cuda11.3_cudnn8.2.0_0 pytorch
[conda] pytorch-lightning 1.5.8 pyhd8ed1ab_0 conda-forge
[conda] pytorch-mutex 1.0 cuda pytorch
[conda] torch-fidelity 0.3.0 pypi_0 pypi
[conda] torchinfo 1.6.3 pyhd8ed1ab_0 conda-forge
[conda] torchmetrics 0.6.2 pyhd8ed1ab_0 conda-forge
[conda] torchvision 0.12.0 py39_cu113 pytorch

cc @pmeier @YosuaMichael

benbo changed the title from "PCAM dataset download" to "PCAM dataset download fails" on Apr 10, 2022
pmeier (Collaborator) commented Apr 11, 2022

GDrive added a new check, so you now have to manually confirm that you don't want a virus check even if you use a direct download link. See #5615. This was fixed in #5645 and will be part of the next release.
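
To tell whether an installed build is newer than the affected 0.12.0 release reported in this issue, the version strings can be compared. This is only a rough sketch: `release_tuple` and `is_newer_than_affected` are hypothetical helpers, and the assumption that every build numbered above 0.12.0 contains the fix is mine, based on "will be part of the next release".

```python
import re


def release_tuple(version):
    """Extract the leading numeric release, e.g. '0.12.0+cu113' -> (0, 12, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def is_newer_than_affected(installed, affected="0.12.0"):
    """True if `installed` has a higher release number than `affected`.

    Assumption: a higher release number implies the fix is present. A nightly
    built before the fix was merged would still be reported as newer.
    """
    return release_tuple(installed) > release_tuple(affected)


print(is_newer_than_affected("0.12.0+cu113"))        # False
print(is_newer_than_affected("0.13.0.dev20220419"))  # True
```

In practice the installed version would come from `torchvision.__version__`.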

benbo (Author) commented Apr 11, 2022

Thanks @pmeier, apologies for the duplicate issue.

pmeier (Collaborator) commented Apr 19, 2022

  1. Install a nightly build or build from source. The fix is included there, so it should work out of the box.
  2. Download the files manually and place them in the root folder. If the files are already there, the dataset will pick them up.
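
For the manual route in option 2, the files may still need to be unpacked by hand. The following is a sketch that assumes they arrive as `.gz` archives, as the `_decompress` call in the traceback suggests. `decompress_gz` is a hypothetical helper, not a torchvision function. Unlike the `wfh.write(rfh.read())` line in the traceback, it copies in chunks rather than reading the whole file into memory.

```python
import gzip
import shutil
from pathlib import Path


def decompress_gz(archive, remove_archive=False):
    """Stream-decompress `archive` (e.g. foo.h5.gz) next to itself as foo.h5."""
    archive = Path(archive)
    if archive.suffix != ".gz":
        raise ValueError(f"expected a .gz file, got {archive.name}")
    target = archive.with_suffix("")  # strips only the trailing ".gz"
    try:
        with gzip.open(archive, "rb") as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)  # copies in chunks, not all at once
    except gzip.BadGzipFile:
        # Not a gzip file (for example a saved HTML page): remove the
        # empty or partial output so it is not mistaken for real data.
        target.unlink(missing_ok=True)
        raise
    if remove_archive:
        archive.unlink()
    return target
```

Calling `decompress_gz` on each manually downloaded archive leaves the unpacked file beside it. Which exact file names and subfolder the dataset expects depends on the installed torchvision version, so check `pcam.py` in your installation.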
