Replace np.random by random #354


Merged
1 commit merged into pytorch:master on Mar 2, 2018

Conversation

vfdev-5 (Collaborator) commented Dec 2, 2017

To be able to fix the random state with a single command:

random.seed(12345)

instead of

np.random.seed(12345)
random.seed(12345)

alykhantejani (Contributor)

This would now cause breaking changes in users' code if they were previously setting np.random.seed to fix the seed in the transforms...

Perhaps a better solution would be to pass a random state into the functions; if it is None, they would use the current random state (obtained via np.random.get_state).

Thoughts, @fmassa?
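
For concreteness, a minimal sketch of that idea (the transform name and flip logic here are illustrative, not torchvision's actual API):

import numpy as np
from PIL import Image

def random_horizontal_flip(img, p=0.5, random_state=None):
    # hypothetical transform illustrating the proposal
    if random_state is None:
        # fall back to the current global NumPy RNG state
        random_state = np.random.RandomState()
        random_state.set_state(np.random.get_state())
    if random_state.rand() < p:
        img = img.transpose(Image.FLIP_LEFT_RIGHT)
    return img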

ducha-aiki (Contributor)

What about actually linking the seed to torch.manual_seed?

vfdev-5 (Collaborator, Author) commented Feb 14, 2018

@ducha-aiki In that case all of torchvision's random calls would need to be rewritten on top of torch's RNG, which is a much bigger breaking change than just dropping np.random.

fmassa (Member) commented Feb 26, 2018

I think we need some uniformity here, so we might need to go through a breaking change (which is very small, though).
I think we might want to pass a seed argument to the random transforms. But just passing a seed is not enough, because we don't want to always get the same transforms, nor to always have to change the seed manually.

Maybe we will need to pass an initial random state, which will be updated by the random transforms.
Something like

import torch

def random_flip(img, random_state=None):
    if random_state is not None:
        # temporarily install the caller's RNG state
        torch.set_rng_state(random_state)
    # perform the random operations using torch
    ...
    # now update the caller's state in place
    if random_state is not None:
        random_state.copy_(torch.get_rng_state())
    # and maybe set the torch rng state back to its original value

But that's a bit annoying, might bring some additional overhead, and could cause issues with different threads behaving the same way.
What do you think?
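
As a hedged, self-contained illustration of that pattern (flip_coin is a made-up stand-in for a random transform), including the optional restore step:

import torch

def flip_coin(random_state=None):
    # made-up example following the random_flip sketch above
    if random_state is not None:
        saved = torch.get_rng_state()
        torch.set_rng_state(random_state)
    heads = torch.rand(1).item() < 0.5  # the "random operation"
    if random_state is not None:
        # hand the advanced state back to the caller in place
        random_state.copy_(torch.get_rng_state())
        # restore the global torch RNG to its original value
        torch.set_rng_state(saved)
    return heads

state = torch.get_rng_state().clone()  # per-stream state owned by the caller
print(flip_coin(state), flip_coin(state))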

fmassa merged commit a7b66f0 into pytorch:master on Mar 2, 2018
fmassa (Member) commented Mar 2, 2018

I think this should be unified. Thanks!

vfdev-5 deleted the remove_np_random branch on March 2, 2018 at 14:03
laoreja commented Jul 6, 2018

I use np.random in my dataset class and noticed the issue of different workers behaving the same.
My fix is adding worker_init_fn=lambda x: np.random.seed((torch.initial_seed() + x) % (2 ** 32)) to the DataLoader initialization.

If I also use your transform functions, which use the standard random library, should I add a line for it to the worker_init_fn as well? Also, I don't really understand the problem behind the multithreading; could you give some explanation?
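
For reference, a minimal sketch of wiring that worker_init_fn into a DataLoader (my_dataset and the batch size are placeholders):

import numpy as np
import torch
from torch.utils.data import DataLoader

loader = DataLoader(
    my_dataset,  # placeholder: any torch.utils.data.Dataset
    batch_size=32,
    num_workers=4,
    # reseed NumPy in each worker from the torch seed plus the worker id
    worker_init_fn=lambda worker_id: np.random.seed(
        (torch.initial_seed() + worker_id) % (2 ** 32)
    ),
)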

vfdev-5 (Collaborator, Author) commented Jul 6, 2018

@laoreja random and np.random behave differently when the multiprocessing start method is "fork".
Take a look at this snippet:

import multiprocessing as mp

import random
import numpy as np

def task():
    # each worker draws twice from both RNGs
    print("-- Task --")
    print(random.random(), np.random.rand())
    print(random.random(), np.random.rand())
    print("-- End Task --")

random.seed(12345)
np.random.seed(12345)

# forked workers inherit the parent's RNG states
workers = [mp.Process(target=task, args=()) for _ in range(4)]

print("Run")
for w in workers:
    w.start()

and the output:

Run
-- Task --
0.6062518296570525 0.9296160928171479
0.7142844140096419 0.3163755545817859
-- End Task --
-- Task --
0.6959326394013083 0.9296160928171479
0.31228532647566887 0.3163755545817859
-- End Task --
-- Task --
0.4936887984487045 0.9296160928171479
0.7509582156538067 0.3163755545817859
-- Task --
-- End Task --
0.6168844031764428 0.9296160928171479
0.5782864572237567 0.3163755545817859
-- End Task --

So you do not need to add a new seed for the workers if you use random.

HTH

laoreja commented Jul 7, 2018

Thanks!
