Provide the meta.bin file of the ImageNet dataset together with torchvision? #1647
Comments
I'm not sure if we can distribute the But there might be a way of getting the same information from the ImageNet website without having to download the full dataset. For example, looking around a bit, I was able to find the synsets in http://www.image-net.org/api/text/imagenet.synset.obtain_synset_list Maybe a closer look at http://image-net.org/download-API would give some hints on what to do?
You mean due to licensing issues or something else? The content
https://github.com/raw/soumith/imagenetloader.torch/master/valprep.sh After creating the directories, each validation image is moved into the respective folder. If we for whatever reason cannot provide the same information here, we could simply parse this file.
That list has 21841 entries. Without investigating further, I think these are simply all available classes of WordNet. In the
@pmeier yes, this might be a problem due to licensing issues. Maybe there is a way of getting the
@fmassa @pmeier I'm assuming the goal here is simply to be able to instantiate an
The functionality you describe is already implemented in vision/torchvision/datasets/imagenet.py (lines 40 to 54 in 61763fa).
I think these problems arise because the users don't know that there is a I've looked around, but I can't find the information we need online. Maybe you missed that in my previous post, but is there a reason not to use the file https://github.com/raw/soumith/imagenetloader.torch/master/valprep.sh ? Licensing should not be a problem, since we are already hosting it and also using it as part of the official ImageNet example. This is easily parsed and contains enough information to use the

    from contextlib import contextmanager
    from os import path
    import shutil
    import tempfile
    import re

    from torchvision.datasets.utils import download_url

    # Matches lines such as "mv ILSVRC2012_val_00000001.JPEG n01751748/".
    # Raw string and escaped dot so the regex is interpreted literally.
    PATTERN = re.compile(r"mv ILSVRC2012_val_000(?P<idx>\d{5})\.JPEG (?P<wnid>n\d{8})/")
    URL = "https://github.com/raw/soumith/imagenetloader.torch/master/valprep.sh"


    @contextmanager
    def get_tmp_dir(**kwargs):
        # Create a temporary directory and remove it again on exit.
        tmp_dir = tempfile.mkdtemp(**kwargs)
        try:
            yield tmp_dir
        finally:
            shutil.rmtree(tmp_dir)


    with get_tmp_dir() as tmp_dir:
        download_url(URL, tmp_dir)
        with open(path.join(tmp_dir, path.basename(URL)), "r") as fh:
            lines = fh.readlines()

    # Collect (image index, WordNet ID) pairs from the "mv" lines.
    data = []
    for line in lines:
        match = PATTERN.match(line.strip())
        if match is None:
            continue
        idx = int(match.group("idx"))
        wnid = match.group("wnid")
        data.append((idx, wnid))

    # Sort by image index and keep only the WordNet IDs.
    _, val_wnids = zip(*sorted(data))

The other component of the
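As a rough follow-up to the snippet above, here is a minimal offline sketch of how the parsed WordNet IDs could be turned into integer targets. It assumes the `mv` line format from `PATTERN` above; `SAMPLE` is a made-up excerpt, not the real content of valprep.sh, and the helper names `parse_val_wnids` and `wnids_to_targets` are hypothetical. Assigning class indices by sorting the WordNet IDs alphabetically is an assumption that mirrors how folder-based datasets usually order their classes.

```python
import re

# Assumed line format, taken from the PATTERN in the snippet above.
PATTERN = re.compile(r"mv ILSVRC2012_val_000(?P<idx>\d{5})\.JPEG (?P<wnid>n\d{8})/")

# Hypothetical excerpt standing in for the downloaded valprep.sh.
SAMPLE = """\
mkdir -p n01440764
mkdir -p n01751748
mv ILSVRC2012_val_00000002.JPEG n01440764/
mv ILSVRC2012_val_00000001.JPEG n01751748/
mv ILSVRC2012_val_00000003.JPEG n01751748/
"""


def parse_val_wnids(text):
    """Return the WordNet IDs of the validation images, ordered by image index."""
    data = []
    for line in text.splitlines():
        match = PATTERN.match(line.strip())
        if match is None:
            continue  # skip "mkdir" lines and anything else that is not a move
        data.append((int(match.group("idx")), match.group("wnid")))
    return [wnid for _, wnid in sorted(data)]


def wnids_to_targets(val_wnids):
    """Map each WordNet ID to a class index (alphabetical order, an assumption)."""
    classes = sorted(set(val_wnids))
    wnid_to_idx = {wnid: i for i, wnid in enumerate(classes)}
    return wnid_to_idx, [wnid_to_idx[wnid] for wnid in val_wnids]


val_wnids = parse_val_wnids(SAMPLE)
wnid_to_idx, targets = wnids_to_targets(val_wnids)
print(val_wnids)
print(targets)
```

With the real file, the same two steps would give one target per validation image without needing the devkit, provided the line format assumed here actually holds.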
@fmassa In light of the recent problems with the meta.bin file of the ImageNet dataset (#1645, #1646), I think it is reasonable to ask whether we can provide it together with torchvision. Especially now, without official download links for the archives, I think it would be beneficial. With it, users who switch to torchvision and only have the image archives would not need to download the devkit.