-
Notifications
You must be signed in to change notification settings - Fork 7.1k
Improve error handling in make_dataset #3496
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
with self.assertRaises(RuntimeError): | ||
with self.assertRaises(FileNotFoundError): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This BC breaking if someone relies on DatasetFolder
or ImageFolder
to raise a RuntimeError
if no samples are found. Since the error type was never documented anywhere, I think this is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say this is fine but might be worth checking if it breaks something in FBCODE.
if class_to_idx is None: | ||
_, class_to_idx = find_classes(directory) | ||
elif not class_to_idx: | ||
raise ValueError("'class_to_index' must have at least one entry to collect any samples.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is BC breaking, but in a "right" way. Before make_dataset
silently returned an empty list in case class_to_idx
was empty. IMO it is a reasonable assumption that no user calls make_dataset
without having at least a single class. If you disagree with this assumption, we can get BC back by return []
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this makes sense, especially for the folder dataset.
Is there any other dataset that inherits this other than videodatasets?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any other dataset that inherits this other than videodatasets?
Nothing built-in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me then
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pmeier DatasetFolder.make_dataset
is public and the docstring is:
class_to_idx (Optional[Dict[str, int]]): Dictionary mapping class name to class index. If omitted, is generated by :func:`find_classes`
The problem is that users can override DatasetFolder.find_classes
too, so there's a conflict with the find_classes()
function: DatasetFolder.find_classes
will rely on the function rather than on the method.
Should we make class_to_idx
a mandatory parameter in DatasetFolder.make_dataset
to avoid any potential issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To better explain myself: imagine a scenario where someone has a class MyCoolNewDataset(DatasetFolder)
and they override MyCoolNewDataset.find_classes
with a custom class_to_idx
logic.
If they call MyCoolNewDataset.make_dataset
while passing None
to the class_to_idx
parameter, what will be used is the class_to_idx
logic from the find_classes
function, instead of using the logic from the find_classes
method - which is different since they overrode it.
Does that make sense?
To avoid such issues I'm suggesting to force the user to pass class_to_idx
in DatasetFolder.make_dataset
, or more accurately to raise an error if None
is passed in DatasetFolder.make_dataset
Codecov Report
@@ Coverage Diff @@
## master #3496 +/- ##
==========================================
+ Coverage 78.83% 78.99% +0.16%
==========================================
Files 105 105
Lines 9816 9793 -23
Branches 1581 1573 -8
==========================================
- Hits 7738 7736 -2
+ Misses 1588 1573 -15
+ Partials 490 484 -6
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. Should we add a test for make-dataset with no classes as well just to be on the safe side?
Also, I've added a comment, but are there any other datasets that rely on folder dataset?
if class_to_idx is None: | ||
_, class_to_idx = find_classes(directory) | ||
elif not class_to_idx: | ||
raise ValueError("'class_to_index' must have at least one entry to collect any samples.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this makes sense, especially for the folder dataset.
Is there any other dataset that inherits this other than videodatasets?
with self.assertRaises(RuntimeError): | ||
with self.assertRaises(FileNotFoundError): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say this is fine but might be worth checking if it breaks something in FBCODE.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of my questions answered. LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
@staticmethod | ||
def _find_classes(dir: str) -> Tuple[List[str], Dict[str, int]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I wouldn't have made this a staticmethod
, as other instantiations of this dataset could rely on self
for generating a custom set of classes and class ids
But this is not really important, so let's move forward with this PR and get this merged
Summary: * factor out find_classes * use find_classes in video datasets * adapt old tests Reviewed By: fmassa Differential Revision: D27433918 fbshipit-source-id: 60d8da2f222a19e0757197f5d38b6a9cce7694a8
Closes #3495, fixes #2903.