
4569 consistent rng for data loader #5067


Merged
merged 3 commits on Sep 2, 2022
Conversation

wyli
Contributor

@wyli wyli commented Sep 1, 2022

Signed-off-by: Wenqi Li [email protected]

Fixes #4569

Description

With different num_workers settings, the batch sampler's RNG should produce a consistent ordering; this PR keeps the initial random seed to achieve that.
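The idea behind the fix can be sketched with stdlib Python (epoch_indices is a hypothetical helper, not MONAI API; MONAI's actual change keeps a seed for its torch.Generator):

```python
import random

def epoch_indices(initial_seed, epoch, n):
    # Derive the epoch's shuffle order from the initial seed alone,
    # never from worker-local state, so the order cannot depend on
    # how many workers later consume the batches.
    rng = random.Random(initial_seed + epoch)
    idx = list(range(n))
    rng.shuffle(idx)
    return idx

# The order is identical regardless of num_workers, because no
# worker-dependent state feeds into the RNG.
order_a = epoch_indices(0, epoch=1, n=8)
order_b = epoch_indices(0, epoch=1, n=8)
assert order_a == order_b
```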

Status

Ready

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • Breaking change (fix or new feature that would cause existing functionality to change). # the previous results for num_workers==0 might change
  • New tests added to cover the changes.
  • Integration tests passed locally by running ./runtests.sh -f -u --net --coverage.
  • Quick tests passed locally by running ./runtests.sh --quick --unittests --disttests.
  • In-line docstrings updated.
  • Documentation updated, tested make html command in the docs/ folder.

wyli added 2 commits September 2, 2022 00:23
Signed-off-by: Wenqi Li <[email protected]>
Signed-off-by: Wenqi Li <[email protected]>
@wyli wyli marked this pull request as ready for review September 2, 2022 07:17
Contributor

@Nic-Ma Nic-Ma left a comment


Thanks for the update, looks good to me.

@wyli wyli enabled auto-merge (squash) September 2, 2022 08:54
@wyli
Contributor Author

wyli commented Sep 2, 2022

/build

@wyli wyli merged commit 52aed2b into Project-MONAI:dev Sep 2, 2022
@wyli wyli deleted the 4569-loading-rng branch September 7, 2022 12:14
@atbenmurray
Contributor

atbenmurray commented Mar 8, 2023

@wyli, this breaks any scenarios where the user has set up their RNG on their pipeline ahead of time. I understand the need for such a mechanism but I suggest reassessing it and looking at ways of making it opt-in

@wyli
Contributor Author

wyli commented Mar 8, 2023

@wyli, this breaks any scenarios where the user has set up their RNG on their pipeline ahead of time. I understand the need for such a mechanism but I suggest reassessing it and looking at ways of making it opt-in

it's possible to use a predefined RNG via the keyword argument generator=_g:

import torch
import monai
from monai.utils import set_determinism

_g = torch.Generator()
_g.manual_seed(0)
set_determinism(0)
train_loader = monai.data.DataLoader(
    train_ds, batch_size=1, shuffle=True, num_workers=num_workers, generator=_g, persistent_workers=num_workers > 0
)
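The decoupling this gives you can be illustrated with a stdlib sketch (shuffled is a hypothetical stand-in for the loader's sampling; the real code passes a torch.Generator): a caller-supplied RNG is used as-is and the global random state is never touched.

```python
import random

def shuffled(items, rng=None):
    # If the caller supplies an RNG, use it and leave all other random
    # state alone; otherwise fall back to a locally seeded generator.
    rng = rng if rng is not None else random.Random(0)
    out = list(items)
    rng.shuffle(out)
    return out

my_rng = random.Random(123)        # user-controlled RNG, set up ahead of time
state_before = random.getstate()   # snapshot of the global RNG state
result = shuffled(range(5), rng=my_rng)
assert sorted(result) == list(range(5))
assert random.getstate() == state_before  # global state untouched
```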

@atbenmurray
Contributor

atbenmurray commented Mar 8, 2023

@wyli, this breaks any scenarios where the user has set up their RNG on their pipeline ahead of time. I understand the need for such a mechanism but I suggest reassessing it and looking at ways of making it opt-in

it's possible to use a predefined RNG via the keyword argument generator=_g:

_g = torch.Generator()
_g.manual_seed(0)
set_determinism(0)
train_loader = monai.data.DataLoader(
    train_ds, batch_size=1, shuffle=True, num_workers=num_workers, generator=_g, persistent_workers=num_workers > 0
)

That only works in scenarios where the RNG hasn't already been set up. I definitely think there needs to be an option where it simply doesn't alter the pipeline's random state. In my case, I have added a suppress_rng flag just to get around the problem for the LR paper; it might be a good way to go if we make it semantically clearer what this flag is for, but is there a reason why the whole mechanism isn't opt-in?
Looking at the original comment on the PR, it seems like this could be more of a point fix for the batch sampler itself, no? It is the batch sampler RNG that must be consistent across workers, yes?
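An opt-in variant along these lines could look like the following stdlib sketch (make_loader_rng and suppress_rng are hypothetical names mirroring the flag described above, not MONAI API):

```python
import random

def make_loader_rng(base_seed, suppress_rng=False, user_rng=None):
    # Hypothetical opt-in behaviour: when suppress_rng is set, or the
    # user has already supplied their own RNG, do not install a new
    # seeded generator; leave the caller's random state alone.
    if suppress_rng or user_rng is not None:
        return user_rng
    return random.Random(base_seed)

# Opted out: nothing is installed, the pipeline's RNG is untouched.
assert make_loader_rng(0, suppress_rng=True) is None

# User-supplied RNG is passed through unchanged.
pre = random.Random(7)
assert make_loader_rng(0, user_rng=pre) is pre
```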

@wyli wyli mentioned this pull request Aug 24, 2023
Development

Successfully merging this pull request may close these issues.

ThreadDataLoader dataloading order is related to numworker, which makes it difficult to reproduce
3 participants