Skip to content

Dataset MovingMNIST: split=None returns test dataset #7439

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Shu-Wan opened this issue Mar 22, 2023 · 5 comments · Fixed by #7449
Closed

Dataset MovingMNIST: split=None returns test dataset #7439

Shu-Wan opened this issue Mar 22, 2023 · 5 comments · Fixed by #7449

Comments

@Shu-Wan
Copy link
Contributor

Shu-Wan commented Mar 22, 2023

🐛 Describe the bug

I've found a bug in the code for torchvision's MovingMNIST dataset, which causes only the test dataset to be returned when split=None. According to the documentation, when split is set to None, the entire dataset should be returned. However, this is not currently happening.

Args:
root (string): Root directory of dataset where ``MovingMNIST/mnist_test_seq.npy`` exists.
split (string, optional): The dataset split, supports ``None`` (default), ``"train"`` and ``"test"``.
If ``split=None``, the full data is returned.
split_ratio (int, optional): The split ratio of number of frames. If ``split="train"``, the first split
frames ``data[:, :split_ratio]`` is returned. If ``split="test"``, the last split frames ``data[:, split_ratio:]``
is returned. If ``split=None``, this parameter is ignored and the all frames data is returned.

I've tested this with the following code:

from torchvision import datasets
import torch

dataset = datasets.MovingMNIST(root="data", download=True)
dataset[0].size() # returns torch.Size([10, 1, 64, 64]), but I expected torch.Size([20, 1, 64, 64])

I believe the bug is caused by lines 58-62 in the code, which handle None and test splits together:

if split is not None:
verify_str_arg(split, "split", ("train", "test"))
self.split = split
if not isinstance(split_ratio, int):
raise TypeError(f"`split_ratio` should be an integer, but got {type(split_ratio)}")
elif not (1 <= split_ratio <= 19):
raise ValueError(f"`split_ratio` should be `1 <= split_ratio <= 19`, but got {split_ratio} instead.")
self.split_ratio = split_ratio
if download:
self.download()
if not self._check_exists():
raise RuntimeError("Dataset not found. You can use download=True to download it.")
data = torch.from_numpy(np.load(os.path.join(self._base_folder, self._filename)))
if self.split == "train":
data = data[: self.split_ratio]
else:
data = data[self.split_ratio :]

To fix this, I propose the following two changes:

  • Separate the handling of None and test splits in the code.
  • Only process lines 46-50 when split is not None.

Reference issue: #6981

I'm happy to help on this issue, please assign to me on this one.

Versions

PyTorch version: 2.0.0
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 13.2.1 (arm64)
GCC version: Could not collect
Clang version: 14.0.0 (clang-1400.0.29.202)
CMake version: Could not collect
Libc version: N/A

Python version: 3.10.9 | packaged by conda-forge | (main, Feb 2 2023, 20:26:08) [Clang 14.0.6 ] (64-bit runtime)
Python platform: macOS-13.2.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Apple M1 Pro

Versions of relevant libraries:
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.24.2
[pip3] torch==2.0.0
[pip3] torch-tb-profiler==0.4.1
[pip3] torchvision==0.15.1
[conda] numpy 1.24.2 py310h3d2048e_0 conda-forge
[conda] pytorch 2.0.0 py3.10_0 pytorch
[conda] torch 2.0.0 pypi_0 pypi
[conda] torch-tb-profiler 0.4.1 pypi_0 pypi
[conda] torchvision 0.15.1 pypi_0 pypi

cc @pmeier

@pmeier
Copy link
Collaborator

pmeier commented Mar 22, 2023

Thanks for the detailed report @Shu-Wan! This indeed seems wrong. IIUC, it should be enough to replace the else branch with an elif:

diff --git a/torchvision/datasets/moving_mnist.py b/torchvision/datasets/moving_mnist.py
index afff0bfa3b..ac5a2b1503 100644
--- a/torchvision/datasets/moving_mnist.py
+++ b/torchvision/datasets/moving_mnist.py
@@ -58,7 +58,7 @@ class MovingMNIST(VisionDataset):
         data = torch.from_numpy(np.load(os.path.join(self._base_folder, self._filename)))
         if self.split == "train":
             data = data[: self.split_ratio]
-        else:
+        elif self.split == "test":
             data = data[self.split_ratio :]
         self.data = data.transpose(0, 1).unsqueeze(2).contiguous()

Wondering why our tests missed this.

I'm happy to help on this issue, please assign to me on this one.

Go for it!

@pmeier
Copy link
Collaborator

pmeier commented Mar 22, 2023

Welp, that was not smart:

vision/test/test_datasets.py

Lines 1520 to 1529 in f0a1df3

@datasets_utils.test_all_configs
def test_split(self, config):
if config["split"] is None:
return
with self.create_dataset(config) as (dataset, info):
if config["split"] == "train":
assert (dataset.data == 0).all()
else:
assert (dataset.data == 1).all()

We should assert there that the second dimension has 20 elements for split=None.

@pmeier
Copy link
Collaborator

pmeier commented Mar 22, 2023

Let's also change

num_samples = 20

to a different number like 5 or whatever to avoid confusing the number of samples with the number of frames.

@pmeier
Copy link
Collaborator

pmeier commented Mar 22, 2023

@Shu-Wan we are preparing for the 0.15.2 bug fix release. Since MovingMNIST was released with 0.15, it would be good to get this fix in. Do you happen to have time to send patch soon (within one week)? If not, are you ok with me taking over so we can get it released?

@Shu-Wan
Copy link
Contributor Author

Shu-Wan commented Mar 22, 2023 via email

Shu-Wan added a commit to Shu-Wan/vision that referenced this issue Mar 22, 2023
@Shu-Wan Shu-Wan mentioned this issue Mar 22, 2023
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants