Add MovingMNIST #2690
Conversation
Thanks for the draft! I left a few comments.
@@ -435,6 +435,74 @@ def get_int(b: bytes) -> int:
    return int(codecs.encode(b, 'hex'), 16)


class MovingMNIST(VisionDataset):
    """MovingMNIST"""
Could you please add a link (http://www.cs.toronto.edu/~nitish/unsupervised_video/) to the dataset, like
`MNIST <http://yann.lecun.com/exdb/mnist/>`_ Dataset.
and define the docstring Args etc. as is done for other datasets.
Can't we inherit it from MNIST?
If we use this as a video dataset, we shouldn't. We probably need to use MNIST
to generate the training split though.
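Generating training sequences on-the-fly from MNIST digits could look roughly like this. This is only a hedged sketch: the 64x64 canvas, 20-frame sequences, and bouncing motion follow the dataset authors' description, but `generate_sequence` and the dummy digit array are illustrative names, not part of this PR.

```python
import numpy as np

def generate_sequence(digit, num_frames=20, canvas=64, seed=0):
    """Move a single digit patch across a canvas, bouncing off the edges."""
    rng = np.random.default_rng(seed)
    h, w = digit.shape
    # random start position and velocity
    y, x = rng.integers(0, canvas - h), rng.integers(0, canvas - w)
    dy, dx = rng.choice([-3, -2, 2, 3], size=2)
    frames = np.zeros((num_frames, canvas, canvas), dtype=np.uint8)
    for t in range(num_frames):
        frames[t, y:y + h, x:x + w] = digit
        # reverse velocity when the next step would leave the canvas
        if not 0 <= y + dy <= canvas - h:
            dy = -dy
        if not 0 <= x + dx <= canvas - w:
            dx = -dx
        y, x = y + dy, x + dx
    return frames

# dummy 28x28 "digit" standing in for a real MNIST image
seq = generate_sequence(np.full((28, 28), 255, dtype=np.uint8))
print(seq.shape)  # (20, 64, 64)
```

The real dataset overlays two digits per sequence and lets them occlude each other; extending the sketch to two patches is straightforward.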
Hey @vballoli,
thanks a lot for this PR. We discussed this internally and have some comments on this:
1. MovingMNIST is a video dataset, i.e. the individual frames have a strong temporal relation. Currently you are treating it as an image dataset. We should return a 4D tensor representing a single video (channels x depth x height x width) rather than an image with multiple channels (channels x height x width). This also means that we shouldn't treat the frames individually for the transforms.
2. You only implemented the test split, since this is the only generated data provided by the authors. Since the authors also provide the code to generate the sequences (see for example the implementation by tensorflow), it might be beneficial to also provide a train split. If we do this, we need to decide whether we want to generate the sequences ahead-of-time or on-the-fly. Depending on how large the training split will be, the former might be too much of a burden on memory (even the test split is ~1GB and resides completely in memory).
3. Although the authors split the sequences in two with 10 frames each, this should not be required, since MovingMNIST is marketed as an unsupervised dataset. This is especially important if we generate the training data with a deviating number of frames. IMO we should simply return a single video. Maybe we could provide a (static) method that implements the authors' split.
Let us know what you think and if you are willing to continue working on this.
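The interface proposed above could be sketched as follows. This is a hedged sketch, not the PR's implementation: NumPy arrays stand in for torch tensors, `MovingMNISTSketch` and `split` are illustrative names, and `data` is a dummy array in place of the real loaded sequences.

```python
import numpy as np

class MovingMNISTSketch:
    def __init__(self, data):
        # data: (num_videos, num_frames, height, width), uint8
        self.data = data

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        # return one whole video with a singleton channel dim:
        # (channels, depth, height, width)
        return self.data[index][None, ...]

    @staticmethod
    def split(video, n=10):
        # authors' convention: first n frames as input, remaining as target
        return video[:, :n], video[:, n:]

ds = MovingMNISTSketch(np.zeros((5, 20, 64, 64), dtype=np.uint8))
video = ds[0]                              # (1, 20, 64, 64)
inp, target = MovingMNISTSketch.split(video)
print(video.shape, inp.shape, target.shape)
```

Keeping the 10+10 split in a static method means the default `__getitem__` stays a single unsupervised video, as suggested in point 3.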
                self.file)))

    def __getitem__(self, index: int) -> Tuple[Any, Any]:
Without a length, the dataset is not iterable.
Suggested change:

    def __len__(self) -> int:
        return len(self.data)

    def __getitem__(self, index: int) -> Tuple[Any, Any]:
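A tiny demonstration of the point behind the suggestion: with both `__len__` and `__getitem__` defined, the dataset supports `len()` and plain iteration (the `ToyDataset` name here is purely illustrative).

```python
class ToyDataset:
    def __init__(self, data):
        self.data = data

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        return self.data[index]

ds = ToyDataset([10, 20, 30])
# iteration falls back to __getitem__ with increasing indices,
# stopping at IndexError; len() requires __len__
print(len(ds), list(ds))  # 3 [10, 20, 30]
```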
        with open(path, 'rb') as f:
            x = torch.tensor(np.load(f))
            assert(x.dtype == torch.uint8)
            return x
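For context on this loading step, a hedged sketch of handling the published test file: the authors' `mnist_test_seq.npy` reportedly stores frames first, shape `(num_frames, num_videos, height, width)` = `(20, 10000, 64, 64)`, so a transpose is needed to get one row per video. A small in-memory dummy array stands in for the real ~1GB file.

```python
import io
import numpy as np

# stand-in for the downloaded mnist_test_seq.npy (3 videos instead of 10000)
buf = io.BytesIO()
np.save(buf, np.zeros((20, 3, 64, 64), dtype=np.uint8))
buf.seek(0)

x = np.load(buf)
assert x.dtype == np.uint8
# move the video axis first -> (num_videos, num_frames, height, width)
videos = x.transpose(1, 0, 2, 3)
print(videos.shape)  # (3, 20, 64, 64)
```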
Nit:

            return x
To complement @pmeier's answer, if you could let the dataset return a single tensor of shape
@pmeier Is somebody working on closing this PR? If not, I can help.
Thanks @vballoli for this great contribution, and sorry for the long time to get a new reply! Unfortunately, there is a new datasets API in the works here, and new datasets should follow the new API design. That being said, rest assured, there are at least two options:
Hope one of the two options works for you @vballoli. 👌
PR in reference to #2676. Currently in draft since the dataset train-test split needs discussion. Inputs regarding how to better handle the train-test arguments are appreciated! @vfdev-5 @pmeier @fmassa