Skip to content

Eliminate runtime cyclic dependencies #6476

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Aug 24, 2022

Conversation

datumbox
Copy link
Contributor

@datumbox datumbox commented Aug 23, 2022

A previous approach of assigning _F on constructor faced issues on the data loader test as the the unit-test hang. This PR tries an alternative approach of making _F a property, doing a lazy import and caching the the module reference on the class level. This seems to avoid the previous test issue.

@datumbox datumbox force-pushed the prototype/cyclic_imports branch from a33f1bc to f50d0f4 Compare August 23, 2022 16:54
@datumbox datumbox force-pushed the prototype/cyclic_imports branch from f50d0f4 to b8ed25b Compare August 23, 2022 16:55
Comment on lines 98 to 104
@property
def _F(self) -> ModuleType:
if _Feature.__F is None:
from ..transforms import functional

_Feature.__F = functional
return _Feature.__F
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lazy import the module the first time we need it. It's shared across all _Feature subclasses.

@pmeier
Copy link
Collaborator

pmeier commented Aug 24, 2022

Given that there were problems with the DataLoader before, here are the tests that we use for the prototype datasets:

@pytest.mark.parametrize("only_datapipe", [False, True])
@parametrize_dataset_mocks(DATASET_MOCKS)
def test_traversable(self, dataset_mock, config, only_datapipe):
dataset, _ = dataset_mock.load(config)
traverse(dataset, only_datapipe=only_datapipe)
@parametrize_dataset_mocks(DATASET_MOCKS)
def test_serializable(self, dataset_mock, config):
dataset, _ = dataset_mock.load(config)
pickle.dumps(dataset)
# This has to be a proper function, since lambda's or local functions
# cannot be pickled, but this is a requirement for the DataLoader with
# multiprocessing, i.e. num_workers > 0
def _collate_fn(self, batch):
return batch
@pytest.mark.parametrize("num_workers", [0, 1])
@parametrize_dataset_mocks(DATASET_MOCKS)
def test_data_loader(self, dataset_mock, config, num_workers):
dataset, _ = dataset_mock.load(config)
dl = DataLoader(
dataset,
batch_size=2,
num_workers=num_workers,
collate_fn=self._collate_fn,
)
next(iter(dl))

Here is a version to run the same tests for different transforms:

import pickle

import pytest
import torch

from torch.utils.data import DataLoader
from torch.utils.data.graph import traverse
from torchdata.datapipes.iter import IterableWrapper
from torchvision.prototype import transforms, features

# TODO: fill me up for more comprehensive tests!
TRANSFORMS = [
    transforms.Resize((8, 8)),
]

transforms = pytest.mark.parametrize("transform", TRANSFORMS, ids=lambda transform: type(transform).__name__)


@pytest.mark.parametrize("only_datapipe", [False, True])
@transforms
def test_traversable(transform, only_datapipe):
    dp = IterableWrapper([]).map(transform)

    traverse(dp, only_datapipe=only_datapipe)


@transforms
def test_serializable(transform):
    dp = IterableWrapper([]).map(transform)

    pickle.dumps(dp)


def _collate_fn(batch):
    return batch


@pytest.mark.parametrize("num_workers", [0, 1])
@transforms
def test_data_loader(transform, num_workers):
    dp = IterableWrapper([features.Image(torch.rand(3, 16, 16)) for _ in range(5)], deepcopy=False).map(transform)

    dl = DataLoader(
        dp,
        batch_size=2,
        num_workers=num_workers,
        collate_fn=_collate_fn,
    )

    list(dl)

So far the PR seems good.

@datumbox
Copy link
Contributor Author

datumbox commented Aug 24, 2022

@pmeier Thanks for the suggestion for additional tests.

I know that these tests take a while due to the large number of datasets. Shall I add just one transform for now to enhance them? aka

    def test_data_loader(self, dataset_mock, config, num_workers):
        dataset = dataset_mock.load(config)[0].map(transforms.Resize((8, 8)))

        dl = DataLoader(
            dataset,
            batch_size=2,
            num_workers=num_workers,
            collate_fn=self._collate_fn,
        )

        next(iter(dl))

I was thinking of doing this for all 3 tests you quoted above.

@pmeier
Copy link
Collaborator

pmeier commented Aug 24, 2022

You need at least two transforms

transforms.Compose(
    [
        transforms.DecodeImage(),
        transforms.Resize(8, 8),
    ]
)

Otherwise, you will have no images inside the sample.

I'm ok with adding these there given that our current transforms compatibility tests are quite "minimal":

@parametrize_dataset_mocks(DATASET_MOCKS)
def test_transformable(self, dataset_mock, config):
dataset, _ = dataset_mock.load(config)
next(iter(dataset.map(transforms.Identity())))

@NicolasHug do you have an opinion here?

Copy link
Collaborator

@pmeier pmeier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with the changes to the datasets tests minus some small nits. There is test_transformable which is outside of the scope that GH let's me comment. It will be obsolete after your changes and you can delete it.

Given that only the test_data_loader actually executes the transform, I don't think we should see a significant slow down. Still, let's wait for CI to complete to compare the timings. @NicolasHug any objections?

@datumbox
Copy link
Contributor Author

datumbox commented Aug 24, 2022

@pmeier I'll remove the extra tests to give time to you and @NicolasHug to finalize the strategy. They are not strictly needed for this work.

BTW I already see some unrelated failures related to the resize on bboxes that might be worth looking into.

Copy link
Collaborator

@pmeier pmeier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from my paranoia in #6476 (comment), I think this is GTG.

@datumbox
Copy link
Contributor Author

Merging, the failing tests are unrelated and are caused by dependencies issues from core.

@datumbox datumbox merged commit 39fe34a into pytorch:main Aug 24, 2022
@datumbox datumbox deleted the prototype/cyclic_imports branch August 24, 2022 11:00
facebook-github-bot pushed a commit that referenced this pull request Aug 25, 2022
Summary:
* Move imports on constructor.

* Turn `_F` to a property.

* fix linter

* Fix mypy

* Make it class-wide attribute.

* Add tests based on code review

* Making changes from code-reviews.

* Remove the new tests.

* Clean up.

* Adding comments.

* Update the comment link.

Reviewed By: datumbox

Differential Revision: D39013658

fbshipit-source-id: 9814c11c44127268b11a55880b9b0c087561deec
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants