C++ Caltech Dataset #1121

ShahriarRezghi · 2019-07-15T16:27:45Z

Description:

Datasets mainly need filesystem access to operate. C++17 has std::filesystem, but since we are using C++11 I had to write it with system calls, and since I don't have a working Windows environment I have only written the POSIX side (it doesn't work on Windows but works fine on other platforms).
For reading images I have used OpenCV. Python datasets take a transform function that operates on a PIL image. I have written a similar function that takes a cv::Mat and returns the matrix after operating on it. There are some default transform functions in the namespace cv_transforms that convert the image to RGB or grayscale and resize it to 224x224.
Right now I have only written the Caltech101 dataset. If we agree on the structure I can write the rest.
Right now there is no way to download the datasets directly in C++. We could use a library like Boost or others, write one from scratch with socket programming, or call Python functions (which I don't think is a good idea). I haven't decided yet.
The code is close to Python in the internals and API of the datasets. It does vary in the utilities.
Also, I have moved global.h up so that it can be included in datasets too.
For tests, we can create fake data in Python and call the C++ functions on it through bindings, or we can do the whole thing in C++ and use gtest (haven't decided yet).
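
For readers following along, here is a minimal sketch of the kind of POSIX-only directory listing that the filesystem note above describes. It is not the code from this PR: the helper name `list_entries` and its `dirs_only` flag are made up for illustration, and it assumes a POSIX system that provides `opendir`, `readdir` and `stat`.

```cpp
#include <dirent.h>
#include <sys/stat.h>

#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper (not part of the PR): returns the sorted names of the
// entries directly inside `dir`. With `dirs_only` set, only sub-directories
// are returned (e.g. one per class); otherwise only regular files.
std::vector<std::string> list_entries(const std::string& dir, bool dirs_only) {
  std::vector<std::string> names;
  DIR* handle = opendir(dir.c_str());
  if (handle == nullptr) return names;  // missing or unreadable directory

  while (dirent* entry = readdir(handle)) {
    const std::string name = entry->d_name;
    if (name == "." || name == "..") continue;

    // stat() follows symlinks, so a link to a directory counts as a directory.
    struct stat info;
    if (stat((dir + "/" + name).c_str(), &info) != 0) continue;

    const bool is_dir = S_ISDIR(info.st_mode);
    const bool is_file = S_ISREG(info.st_mode);
    if ((dirs_only && is_dir) || (!dirs_only && is_file)) names.push_back(name);
  }
  closedir(handle);

  // readdir() returns entries in no particular order; sort for stable indices.
  std::sort(names.begin(), names.end());
  return names;
}
```

A dataset laid out as one folder per class could call `list_entries(root, true)` to get the class names and `list_entries(root + "/" + cls, false)` to get the files of each class. A Windows port would need a separate implementation behind the same signature.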

codecov-io · 2019-07-15T16:43:31Z

Codecov Report

Merging #1121 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #1121   +/-   ##
=======================================
  Coverage   65.78%   65.78%           
=======================================
  Files          79       79           
  Lines        5834     5834           
  Branches      887      887           
=======================================
  Hits         3838     3838           
  Misses       1726     1726           
  Partials      270      270

Powered by Codecov. Last update 2287c8f...adc716d. Read the comment docs.

fmassa · 2019-07-26T09:18:04Z

Hi @ShahriarSS

Thanks for the PR!

Here are some initial thoughts:

Use of OpenCV:

The C++ models that we have are compatible with the Python models. OpenCV returns images in a different format from the one we generally use (BGR in 0-255, instead of RGB in 0-1).
I would like the C++ frontend to be equivalent to the Python frontend.
This might mean writing a few compatibility functions (maybe using OpenCV, maybe not) that return the images, and perform image operations on them, in the same format as in the Python frontend.

I'm currently reworking the transforms to also accept PyTorch tensors. This means that we could also do the same thing in C++, and use torch ops for rescaling / etc. The question now is to have a basic read-image function that is compatible with the rest of the codebase.

Also, can we make the API of the methods not require OpenCV? For example, we should avoid passing and returning cv::Mat objects, and instead use Tensors. This would make changing the backend much simpler.

What about the following: Could you split this PR into two components:

the data reading abstractions, with minimal functions for reading an image, and not exposing OpenCV in the API
the dataset itself, which leverages the data reading functionality.

Filesystem access

We have a large base of Windows users, so it is worth considering whether we want to support Windows right from the beginning. I'm cc'ing @soumith on this.

Download inside datasets

I don't think this is a requirement for now.

Tests

Ideally we would use the same Python functions that create fake datasets, save them to disk and read them with the C++ frontend to verify that everything works as expected.
However, I think it might be better to just write the tests straight in C++, to avoid having to expose a number of things in Python (and having to write the binding code just to test it).

yassineAlouini · 2022-05-02T15:59:13Z

Thanks @ShahriarSS for this contribution and sorry for the late reply.

The datasets API is being rewritten into a new API as described here.

Not yet sure if it includes C++ datasets (or if it is planned) as well but I suppose it does for now (or will shortly). @pmeier is there a README that details the migration process for C++ datasets? Or maybe this only concerns Python ones?

I would suggest the following:

take into account @fmassa's comments as mentioned here: C++ Caltech Dataset #1121 (comment)
once this is done, someone can help you @ShahriarSS migrate to the new API design or you can do it yourself if you would like to of course.

What do you think @ShahriarSS about that? Feel free to ask additional questions, thanks!

pmeier · 2022-05-03T06:42:36Z

Not yet sure if it includes C++ datasets (or if it is planned) as well but I suppose it does for now (or will shortly)

There are no plans to have C++ wrappers for the datasets. Looking at this PR, the data reading, i.e. image decoding part in C++, is already implemented in torchvision.io.

Given that this PR has been stale for ~3 years, I'll close it for now. @ShahriarSS, if there is still a need for the C++ dataset wrappers, would you mind opening an issue?

ShahriarSS added 3 commits July 15, 2019 02:15

Added files and changed CMakeLists.txt

fca3dcc

Wrote some stuff

225edb8

Completed caltech101

c53dd97

Changed filesystem to posix

b47430f

ShahriarSS added 3 commits July 29, 2019 16:34

Removed Caltech and cv transform

aee40af

Merge branch 'master' into c++-caltech-dataset

24192da

Fixed a few stuff

adc716d

ShahriarRezghi mentioned this pull request Jul 29, 2019

Implementing image reading functions #1179

Closed

pmeier self-assigned this Apr 8, 2022

yassineAlouini mentioned this pull request May 2, 2022

Possible new contribution? pmeier/pmeier#5

Open

pmeier closed this May 3, 2022
