multi gpu training with different subprocesses #13

glample opened this issue Nov 22, 2016 · 1 comment

@glample

glample commented Nov 22, 2016

Hello, I was wondering whether it would be possible to have a small code example where the same network is cloned onto different GPUs, with all clones sharing the same parameters.

For instance, I would like something where different subprocesses can train the model separately (e.g. 8 subprocesses, each responsible for training the model on one GPU). The updates could then be accumulated into a common network, and all GPU clones could periodically synchronize their parameters with those of the common network, or something along these lines.
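Roughly, I am imagining something like the sketch below (not working code, just to illustrate the idea; the Net model, the dummy data, and the sync scheme are placeholders, and it assumes a Hogwild-style shared CPU copy of the parameters via torch.multiprocessing):

import torch
import torch.nn as nn
import torch.multiprocessing as mp

# Placeholder network; any nn.Module would work the same way.
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc = nn.Linear(10, 2)

    def forward(self, x):
        return self.fc(x)

def train(rank, shared_model, steps=100, sync_every=10):
    # Each worker keeps its own clone of the network on its own GPU.
    device = torch.device("cuda:%d" % rank)
    local_model = Net().to(device)
    local_model.load_state_dict(shared_model.state_dict())
    optimizer = torch.optim.SGD(local_model.parameters(), lr=0.01)

    for step in range(steps):
        # Dummy batch; a real worker would read its own shard of the data.
        x = torch.randn(32, 10, device=device)
        y = torch.randint(0, 2, (32,), device=device)
        loss = nn.functional.cross_entropy(local_model(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if (step + 1) % sync_every == 0:
            # Push local weights into the shared CPU model and pull them back,
            # so all clones periodically converge to common parameters.
            # (A real scheme would accumulate or average the updates instead
            # of simply overwriting the shared copy.)
            with torch.no_grad():
                for shared_p, local_p in zip(shared_model.parameters(),
                                             local_model.parameters()):
                    shared_p.copy_(local_p.detach().cpu())
            local_model.load_state_dict(shared_model.state_dict())

if __name__ == "__main__":
    mp.set_start_method("spawn")       # required when workers use CUDA
    shared_model = Net()
    shared_model.share_memory()        # parameters live in shared CPU memory
    workers = [mp.Process(target=train, args=(rank, shared_model))
               for rank in range(torch.cuda.device_count())]
    for w in workers:
        w.start()
    for w in workers:
        w.join()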

@catalystfrank

I guess you mean "data parallelism".

As on line 72 of examples/imagenet/main.py, explicitly use:

model = torch.nn.DataParallel(model).cuda()

Once you wrap your model with DataParallel(model), you can run your command as

CUDA_VISIBLE_DEVICES=4,5,6,7 python main.py [options]

to use the last 4 GPUs out of the 8 cards in total.
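For example, a minimal sketch (the toy model and the random input are just illustrative; the only essential line is the DataParallel wrapping):

import torch
import torch.nn as nn

model = nn.Linear(10, 2)                      # any nn.Module works here
model = torch.nn.DataParallel(model).cuda()   # replicate across all visible GPUs

x = torch.randn(64, 10).cuda()                # the batch is split across the replicas
out = model(x)                                # outputs are gathered back on the first GPU
print(out.size())                             # (64, 2)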
