Eliminate runtime cyclic dependencies #6476

datumbox · 2022-08-23T16:29:47Z

A previous approach, assigning _F in the constructor, ran into problems with the data loader test: the unit test hung. This PR tries an alternative: make _F a property that imports the module lazily and caches the module reference at the class level. This seems to avoid the earlier test issue.

datumbox · 2022-08-23T17:20:38Z

torchvision/prototype/features/_feature.py

+    @property
+    def _F(self) -> ModuleType:
+        if _Feature.__F is None:
+            from ..transforms import functional
+
+            _Feature.__F = functional
+        return _Feature.__F


Lazy import the module the first time we need it. It's shared across all _Feature subclasses.
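
For readers skimming the thread, here is a minimal, self-contained sketch of the pattern described above. It is not the torchvision code: the class names `Feature`, `Image`, and `Label` and the attribute `_functional` are hypothetical, and the standard-library `json` module stands in for the real `..transforms.functional` import so the snippet runs anywhere.

```python
from types import ModuleType
from typing import Optional


class Feature:
    # Class-level cache (hypothetical name): one slot shared by every subclass.
    _functional: Optional[ModuleType] = None

    @property
    def _F(self) -> ModuleType:
        # The import runs on first access rather than at module import time,
        # so importing this module does not pull in the other one.
        if Feature._functional is None:
            import json  # stand-in for the real lazy import

            Feature._functional = json
        return Feature._functional


class Image(Feature):
    pass


class Label(Feature):
    pass


if __name__ == "__main__":
    # Both subclasses resolve to the same cached module object.
    print(Image()._F is Label()._F)
```

Writing to `Feature._functional` rather than `self._functional` is what makes the cache class-wide: the import happens at most once per process, no matter how many instances or subclasses exist.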

pmeier · 2022-08-24T07:15:55Z

Given that there were problems with the DataLoader before, here are the tests that we use for the prototype datasets:

vision/test/test_prototype_builtin_datasets.py

Lines 104 to 135 in b4b246a

    
    @pytest.mark.parametrize("only_datapipe", [False, True])
    @parametrize_dataset_mocks(DATASET_MOCKS)
    def test_traversable(self, dataset_mock, config, only_datapipe):
        dataset, _ = dataset_mock.load(config)

        traverse(dataset, only_datapipe=only_datapipe)

    @parametrize_dataset_mocks(DATASET_MOCKS)
    def test_serializable(self, dataset_mock, config):
        dataset, _ = dataset_mock.load(config)

        pickle.dumps(dataset)

    # This has to be a proper function, since lambda's or local functions
    # cannot be pickled, but this is a requirement for the DataLoader with
    # multiprocessing, i.e. num_workers > 0
    def _collate_fn(self, batch):
        return batch

    @pytest.mark.parametrize("num_workers", [0, 1])
    @parametrize_dataset_mocks(DATASET_MOCKS)
    def test_data_loader(self, dataset_mock, config, num_workers):
        dataset, _ = dataset_mock.load(config)

        dl = DataLoader(
            dataset,
            batch_size=2,
            num_workers=num_workers,
            collate_fn=self._collate_fn,
        )

        next(iter(dl))
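
The comment in the quoted test about lambdas and local functions can be checked with the standard library alone. The sketch below is only an illustration; `can_pickle`, `collate_fn`, and `make_local_collate_fn` are hypothetical helper names and are not part of the test suite.

```python
import pickle


def collate_fn(batch):
    """Module-level function: pickle stores it by its importable name."""
    return batch


def make_local_collate_fn():
    """Return a function defined inside another function (a local function)."""

    def local_collate_fn(batch):
        return batch

    return local_collate_fn


def can_pickle(obj) -> bool:
    """Return True if pickle.dumps(obj) succeeds, False if pickling fails."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


if __name__ == "__main__":
    print("module-level function:", can_pickle(collate_fn))
    print("lambda:", can_pickle(lambda batch: batch))
    print("local function:", can_pickle(make_local_collate_fn()))
```

The lambda and the local function fail because pickle serializes functions by qualified name and cannot look either of them up again. That is why the collate function has to be a proper function or method when the DataLoader uses worker processes.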

Here is a version to run the same tests for different transforms:

import pickle

import pytest
import torch

from torch.utils.data import DataLoader
from torch.utils.data.graph import traverse
from torchdata.datapipes.iter import IterableWrapper
from torchvision.prototype import transforms, features

# TODO: fill me up for more comprehensive tests!
TRANSFORMS = [
    transforms.Resize((8, 8)),
]

transforms = pytest.mark.parametrize("transform", TRANSFORMS, ids=lambda transform: type(transform).__name__)


@pytest.mark.parametrize("only_datapipe", [False, True])
@transforms
def test_traversable(transform, only_datapipe):
    dp = IterableWrapper([]).map(transform)

    traverse(dp, only_datapipe=only_datapipe)


@transforms
def test_serializable(transform):
    dp = IterableWrapper([]).map(transform)

    pickle.dumps(dp)


def _collate_fn(batch):
    return batch


@pytest.mark.parametrize("num_workers", [0, 1])
@transforms
def test_data_loader(transform, num_workers):
    dp = IterableWrapper([features.Image(torch.rand(3, 16, 16)) for _ in range(5)], deepcopy=False).map(transform)

    dl = DataLoader(
        dp,
        batch_size=2,
        num_workers=num_workers,
        collate_fn=_collate_fn,
    )

    list(dl)

So far the PR seems good.

datumbox · 2022-08-24T08:30:12Z

@pmeier Thanks for the suggestion for additional tests.

I know that these tests take a while due to the large number of datasets. Shall I add just one transform for now to enhance them? For example:

    def test_data_loader(self, dataset_mock, config, num_workers):
        dataset = dataset_mock.load(config)[0].map(transforms.Resize((8, 8)))

        dl = DataLoader(
            dataset,
            batch_size=2,
            num_workers=num_workers,
            collate_fn=self._collate_fn,
        )

        next(iter(dl))

I was thinking of doing this for all 3 tests you quoted above.

pmeier · 2022-08-24T08:45:56Z

You need at least two transforms:

transforms.Compose(
    [
        transforms.DecodeImage(),
        transforms.Resize((8, 8)),
    ]
)

Otherwise, you will have no images inside the sample.

I'm ok with adding these there given that our current transforms compatibility tests are quite "minimal":

vision/test/test_prototype_builtin_datasets.py

Lines 98 to 102 in b4b246a

    
    @parametrize_dataset_mocks(DATASET_MOCKS)
    def test_transformable(self, dataset_mock, config):
        dataset, _ = dataset_mock.load(config)

        next(iter(dataset.map(transforms.Identity())))

@NicolasHug do you have an opinion here?

pmeier

I'm fine with the changes to the datasets tests minus some small nits. There is test_transformable, which is outside of the scope that GH lets me comment on. It will be obsolete after your changes and you can delete it.

Given that only test_data_loader actually executes the transform, I don't think we should see a significant slowdown. Still, let's wait for CI to complete to compare the timings. @NicolasHug any objections?

test/test_prototype_builtin_datasets.py

datumbox · 2022-08-24T09:24:43Z

@pmeier I'll remove the extra tests to give time to you and @NicolasHug to finalize the strategy. They are not strictly needed for this work.

BTW, I already see some failures in the resize on bboxes. They are unrelated to this PR but might be worth looking into.

torchvision/prototype/features/_feature.py

pmeier

Apart from my paranoia in #6476 (comment), I think this is GTG.

datumbox · 2022-08-24T10:59:40Z

Merging. The failing tests are unrelated and are caused by dependency issues from core.

Summary:
* Move imports on constructor.
* Turn `_F` to a property.
* fix linter
* Fix mypy
* Make it class-wide attribute.
* Add tests based on code review
* Making changes from code-reviews.
* Remove the new tests.
* Clean up.
* Adding comments.
* Update the comment link.

Reviewed By: datumbox
Differential Revision: D39013658
fbshipit-source-id: 9814c11c44127268b11a55880b9b0c087561deec

datumbox added 2 commits August 23, 2022 17:17

Move imports on constructor.

692d6b9

Turn _F to a property.

9f4655d

datumbox added module: transforms code quality prototype labels Aug 23, 2022

datumbox requested review from vfdev-5 and pmeier August 23, 2022 16:29

facebook-github-bot added the cla signed label Aug 23, 2022

datumbox added 2 commits August 23, 2022 17:36

fix linter

9c2ea66

Fix mypy

7c4e293

datumbox force-pushed the prototype/cyclic_imports branch from a33f1bc to f50d0f4 Compare August 23, 2022 16:54

Make it class-wide attribute.

b8ed25b

datumbox force-pushed the prototype/cyclic_imports branch from f50d0f4 to b8ed25b Compare August 23, 2022 16:55

datumbox commented Aug 23, 2022

Add tests based on code review

db43b6b

pmeier reviewed Aug 24, 2022

Making changes from code-reviews.

223af74

datumbox added 3 commits August 24, 2022 10:26

Remove the new tests.

554ac94

Clean up.

68a0e18

Adding comments.

df056c1

datumbox requested a review from pmeier August 24, 2022 09:46

datumbox mentioned this pull request Aug 24, 2022

[proto] Added mid-level ops and feature-based ops #6219

Merged

pmeier reviewed Aug 24, 2022

pmeier approved these changes Aug 24, 2022

Update the comment link.

355ffb6

Merge branch 'main' into prototype/cyclic_imports

78c8a8e

datumbox merged commit 39fe34a into pytorch:main Aug 24, 2022

datumbox deleted the prototype/cyclic_imports branch August 24, 2022 11:00

pmeier mentioned this pull request Sep 15, 2022

extract kernels into separate module to avoid cyclic imports #6398

Closed

pmeier mentioned this pull request Oct 20, 2022

Unwrap features before passing them into a kernel #6807

Merged
