Dataset MovingMNIST: `split=None` returns test dataset #7439

Shu-Wan · 2023-03-22T00:51:12Z

🐛 Describe the bug

I've found a bug in the code for torchvision's MovingMNIST dataset, which causes only the test dataset to be returned when split=None. According to the documentation, when split is set to None, the entire dataset should be returned. However, this is not currently happening.

vision/torchvision/datasets/moving_mnist.py

Lines 13 to 19 in b403bfc

    
    Args:
        root (string): Root directory of dataset where ``MovingMNIST/mnist_test_seq.npy`` exists.
        split (string, optional): The dataset split, supports ``None`` (default), ``"train"`` and ``"test"``.
            If ``split=None``, the full data is returned.
        split_ratio (int, optional): The split ratio of number of frames. If ``split="train"``, the first split
            frames ``data[:, :split_ratio]`` is returned. If ``split="test"``, the last split frames ``data[:, split_ratio:]``
            is returned. If ``split=None``, this parameter is ignored and the all frames data is returned.

I've tested this with the following code:

from torchvision import datasets
import torch

dataset = datasets.MovingMNIST(root="data", download=True)
dataset[0].size() # returns torch.Size([10, 1, 64, 64]), but I expected torch.Size([20, 1, 64, 64])

I believe the bug is caused by lines 58-62 in the code, which handle None and test splits together:

vision/torchvision/datasets/moving_mnist.py

Lines 42 to 62 in b403bfc

    
    if split is not None:
        verify_str_arg(split, "split", ("train", "test"))
    self.split = split
    if not isinstance(split_ratio, int):
        raise TypeError(f"`split_ratio` should be an integer, but got {type(split_ratio)}")
    elif not (1 <= split_ratio <= 19):
        raise ValueError(f"`split_ratio` should be `1 <= split_ratio <= 19`, but got {split_ratio} instead.")
    self.split_ratio = split_ratio
    if download:
        self.download()
    if not self._check_exists():
        raise RuntimeError("Dataset not found. You can use download=True to download it.")
    data = torch.from_numpy(np.load(os.path.join(self._base_folder, self._filename)))
    if self.split == "train":
        data = data[: self.split_ratio]
    else:
        data = data[self.split_ratio :]
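
The effect of that final `if`/`else` can be reproduced without downloading anything. The sketch below is a minimal stand-in, not torchvision code: `select_frames_buggy` is a hypothetical helper, a plain list of 20 frame indices plays the role of the frame axis of the loaded array, and the default `split_ratio` is assumed to be 10, which matches the 10 frames seen in the output above.

```python
from typing import List, Optional


def select_frames_buggy(frames: List[int], split: Optional[str], split_ratio: int = 10) -> List[int]:
    """Mirror the if/else quoted above; `frames` stands in for the frame axis."""
    if split == "train":
        return frames[:split_ratio]
    else:
        # split=None lands here too, so it is sliced exactly like the "test" split.
        return frames[split_ratio:]


all_frames = list(range(20))  # 20 frames per sequence, indexed 0..19

print(len(select_frames_buggy(all_frames, "train")))  # 10
print(len(select_frames_buggy(all_frames, "test")))   # 10
print(len(select_frames_buggy(all_frames, None)))     # 10, although 20 were expected
```

With `split=None` the helper returns frames 10 to 19, which is the same slice as `split="test"`.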

To fix this, I propose the following two changes:

1. Separate the handling of None and test splits in the code.
2. Only process lines 46-50 when split is not None.
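
As a rough illustration of these two changes taken together, here is a sketch on plain Python lists. `select_frames` is a hypothetical helper, not the patch itself; it assumes that lines 46-50 are the `split_ratio` validation and that the default `split_ratio` is 10.

```python
from typing import List, Optional


def select_frames(frames: List[int], split: Optional[str], split_ratio: int = 10) -> List[int]:
    """Hypothetical helper: frames for a split, or every frame when split is None."""
    if split is None:
        # Change 2: split_ratio is documented as ignored here, so it is not validated.
        return frames
    if split not in ("train", "test"):
        raise ValueError(f"Unknown split {split!r}; expected 'train', 'test' or None.")
    if not isinstance(split_ratio, int):
        raise TypeError(f"`split_ratio` should be an integer, but got {type(split_ratio)}")
    if not (1 <= split_ratio <= 19):
        raise ValueError(f"`split_ratio` should be `1 <= split_ratio <= 19`, but got {split_ratio} instead.")
    # Change 1: "train" and "test" each get an explicit branch; None never reaches this point.
    if split == "train":
        return frames[:split_ratio]
    return frames[split_ratio:]


all_frames = list(range(20))

print(len(select_frames(all_frames, None)))                 # 20
print(len(select_frames(all_frames, None, split_ratio=0)))  # 20, split_ratio is ignored
print(len(select_frames(all_frames, "train")))              # 10
print(len(select_frames(all_frames, "test")))               # 10
```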

Reference issue: #6981

I'm happy to help on this issue, please assign to me on this one.

Versions

PyTorch version: 2.0.0
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 13.2.1 (arm64)
GCC version: Could not collect
Clang version: 14.0.0 (clang-1400.0.29.202)
CMake version: Could not collect
Libc version: N/A

Python version: 3.10.9 | packaged by conda-forge | (main, Feb 2 2023, 20:26:08) [Clang 14.0.6 ] (64-bit runtime)
Python platform: macOS-13.2.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Apple M1 Pro

Versions of relevant libraries:
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.24.2
[pip3] torch==2.0.0
[pip3] torch-tb-profiler==0.4.1
[pip3] torchvision==0.15.1
[conda] numpy 1.24.2 py310h3d2048e_0 conda-forge
[conda] pytorch 2.0.0 py3.10_0 pytorch
[conda] torch 2.0.0 pypi_0 pypi
[conda] torch-tb-profiler 0.4.1 pypi_0 pypi
[conda] torchvision 0.15.1 pypi_0 pypi

cc @pmeier

pmeier · 2023-03-22T08:52:59Z

Thanks for the detailed report @Shu-Wan! This indeed seems wrong. IIUC, it should be enough to replace the else branch with an elif:

diff --git a/torchvision/datasets/moving_mnist.py b/torchvision/datasets/moving_mnist.py
index afff0bfa3b..ac5a2b1503 100644
--- a/torchvision/datasets/moving_mnist.py
+++ b/torchvision/datasets/moving_mnist.py
@@ -58,7 +58,7 @@ class MovingMNIST(VisionDataset):
         data = torch.from_numpy(np.load(os.path.join(self._base_folder, self._filename)))
         if self.split == "train":
             data = data[: self.split_ratio]
-        else:
+        elif self.split == "test":
             data = data[self.split_ratio :]
         self.data = data.transpose(0, 1).unsqueeze(2).contiguous()

Wondering why our tests missed this.

> I'm happy to help on this issue, please assign to me on this one.

Go for it!

pmeier · 2023-03-22T08:55:46Z

Welp, that was not smart:

vision/test/test_datasets.py

Lines 1520 to 1529 in f0a1df3

    
    @datasets_utils.test_all_configs
    def test_split(self, config):
        if config["split"] is None:
            return
        with self.create_dataset(config) as (dataset, info):
            if config["split"] == "train":
                assert (dataset.data == 0).all()
            else:
                assert (dataset.data == 1).all()

We should assert there that the second dimension has 20 elements for split=None.
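
A sketch of what such a check could look like, using nested lists instead of the real test harness. Everything here is a stand-in: `make_fake_data` and `check_split` are hypothetical names, the data is laid out as (samples, frames), and the 0/1 fill is an assumption inferred from the assertions in the test above.

```python
from typing import List, Optional

NUM_FRAMES = 20  # frames per sequence


def make_fake_data(num_samples: int, split: Optional[str], split_ratio: int = 10) -> List[List[int]]:
    """Hypothetical stand-in for `dataset.data`, laid out as (samples, frames)."""
    # Assumed fill: frames before split_ratio hold 0, the remaining frames hold 1.
    frame_values = [0] * split_ratio + [1] * (NUM_FRAMES - split_ratio)
    if split == "train":
        frame_values = frame_values[:split_ratio]
    elif split == "test":
        frame_values = frame_values[split_ratio:]
    return [list(frame_values) for _ in range(num_samples)]


def check_split(split: Optional[str], num_samples: int = 5) -> None:
    data = make_fake_data(num_samples, split)
    if split is None:
        # The case the quoted test returns early on: every sample must keep all 20 frames.
        assert all(len(sample) == NUM_FRAMES for sample in data)
    elif split == "train":
        assert all(value == 0 for sample in data for value in sample)
    else:
        assert all(value == 1 for sample in data for value in sample)


for split in (None, "train", "test"):
    check_split(split)
print("all split checks passed")
```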

pmeier · 2023-03-22T09:02:25Z

Let's also change

vision/test/test_datasets.py

Line 1510 in f0a1df3

num_samples = 20

to a different number like 5 or whatever to avoid confusing the number of samples with the number of frames.
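
To see why equal sizes are a problem, consider a shape check on the frame dimension. In this sketch `data_shape` and `frames_check_passes` are hypothetical helpers, and `axes_swapped` simulates an accidental mix-up of the sample and frame axes.

```python
from typing import Tuple

NUM_FRAMES = 20  # fixed by the dataset file


def data_shape(num_samples: int, axes_swapped: bool = False) -> Tuple[int, int]:
    """Hypothetical (samples, frames) shape; `axes_swapped` simulates an axis mix-up."""
    shape = (num_samples, NUM_FRAMES)
    return shape[::-1] if axes_swapped else shape


def frames_check_passes(shape: Tuple[int, int]) -> bool:
    # The check suggested for split=None: the second dimension holds all frames.
    return shape[1] == NUM_FRAMES


print(frames_check_passes(data_shape(20, axes_swapped=True)))  # True: the mix-up goes unnoticed
print(frames_check_passes(data_shape(5, axes_swapped=True)))   # False: the mix-up is caught
print(frames_check_passes(data_shape(5)))                      # True: the correct layout passes
```

With 20 samples and 20 frames the check cannot tell the two axes apart; with 5 samples it can.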

pmeier · 2023-03-22T11:44:09Z

@Shu-Wan we are preparing for the 0.15.2 bug fix release. Since MovingMNIST was released with 0.15, it would be good to get this fix in. Do you happen to have time to send a patch soon (within one week)? If not, are you ok with me taking over so we can get it released?

Shu-Wan · 2023-03-22T16:16:12Z

Hi Philip, I will take care of this today. It should be an easy fix.

pmeier assigned Shu-Wan Mar 22, 2023

pmeier added bug module: datasets labels Mar 22, 2023

This was referenced Mar 22, 2023

[v2.0.1] Release Tracker pytorch/pytorch#97272

Closed

Release tracker for 0.15.2 #7443

Closed

Shu-Wan added a commit to Shu-Wan/vision that referenced this issue Mar 22, 2023

Dataset MovingMNIST: split=None returns test dataset

1d89d75

Fixes pytorch#7439

Shu-Wan mentioned this issue Mar 22, 2023

MovingMNIST split fix #7449

Merged

2 tasks

pmeier closed this as completed in #7449 Mar 23, 2023
