[C++ API] Functional DataParallel #9234

Closed
wants to merge 3 commits

Conversation

@goldsborough commented Jul 7, 2018

This PR adds the functional version of DataParallel (i.e. data_parallel) to the C++ frontend.

For this, I had to:

  1. Add "differentiable" versions of scatter and gather, which perform their inverse operation in the backward pass, to C++ (see the Gather sketch after this list). I've added them under torch/csrc/autograd/functions/comm.{h,cpp}. I also had to move some utilities from VariableType.cpp into torch/csrc/autograd/functions/utils.h and change them a bit to fix the const_casts for which there were TODOs.
  2. Implement the replicate, parallel_apply and the combining data_parallel functions in C++.
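
As a rough sketch of how the new Gather node from (1) is used, based on the test code in this PR (the constructor arguments shown here, destination device and concatenation dimension, as well as the include path, are assumptions):

```cpp
#include <torch/csrc/autograd/functions/comm.h>
#include <torch/torch.h>

void gather_example() {
  // Assumed constructor: destination device plus the dimension to concatenate along.
  torch::autograd::Gather gather(torch::Device(torch::kCUDA, 1), /*dim=*/0);

  // Two leaf variables living on different GPUs.
  auto a = torch::ones(5, torch::requires_grad(true).device({torch::kCUDA, 0}));
  auto b = torch::ones(5, torch::requires_grad(true).device({torch::kCUDA, 1}));

  // Forward pass: concatenate a and b onto CUDA:1.
  auto outputs = gather.apply({a, b});

  // Backward pass (Scatter): backpropagating through outputs.front() routes
  // each slice's gradient back to the device its input came from, so a.grad()
  // ends up on CUDA:0 and b.grad() on CUDA:1.
}
```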

replicate is implemented based on our existing clone() interface, along with the ability to set the current device via at::OptionsGuard (so nice).
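
Usage looks roughly like the following sketch, based on the tests added in this PR (the torch::nn::parallel namespace and the single torch/torch.h include are assumptions):

```cpp
#include <torch/torch.h>

void replicate_example() {
  torch::nn::Linear linear(3, 4);

  // Clone the module once per device; each replica's parameters are moved to
  // its target device, while the original module is left untouched.
  auto replicas = torch::nn::parallel::replicate(
      linear,
      {torch::Device(torch::kCUDA, 0), torch::Device(torch::kCUDA, 1)});

  // replicas[0] lives on CUDA:0 and replicas[1] on CUDA:1.
}
```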

parallel_apply is implemented using at::parallel_for (CC @cpuhrsch) and follows the code from PyTorch.
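
Assuming at::parallel_for's (begin, end, grain_size, body) interface, the shape of the approach is roughly the simplified sketch below; it is not the exact code in this PR, which also deals with per-replica devices/streams and error propagation:

```cpp
#include <ATen/Parallel.h>
#include <torch/torch.h>

#include <vector>

// Run replica i on input chunk i, parallelizing over the replica index.
// Each replica and its input chunk are assumed to live on the same device.
std::vector<torch::Tensor> parallel_apply_sketch(
    std::vector<torch::nn::Linear>& replicas,
    const std::vector<torch::Tensor>& inputs) {
  std::vector<torch::Tensor> outputs(replicas.size());
  at::parallel_for(
      /*begin=*/0,
      /*end=*/static_cast<int64_t>(replicas.size()),
      /*grain_size=*/1,
      [&](int64_t begin, int64_t end) {
        for (int64_t i = begin; i < end; ++i) {
          outputs[i] = replicas[i]->forward(inputs[i]);
        }
      });
  return outputs;
}
```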

Added lots of tests for these things.
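
End to end, the tests call the functional entry point along these lines (a sketch based on the test code; the module size and input used here are arbitrary):

```cpp
#include <torch/torch.h>

torch::Tensor data_parallel_example() {
  torch::nn::Linear linear(3, 4);
  auto input = torch::ones({10, 3});

  // Scatter `input` across all available CUDA devices (devices = at::nullopt),
  // replicate `linear` onto them, run each replica on its chunk in parallel,
  // and gather the per-device outputs onto CUDA:1.
  return torch::nn::parallel::data_parallel(
      linear,
      input,
      /*devices=*/at::nullopt,
      /*output_device=*/torch::Device(torch::kCUDA, 1));
}
```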

@apaszke @ezyang @ebetica @colesbury

@ezyang commented Jul 9, 2018

Build failure looks legit.

@goldsborough force-pushed the data-parallel branch 5 times, most recently from 4579c1d to dc894fb on July 9, 2018
@goldsborough

I'm trying to use the new CUDAStream interface we just got in ATen, but this is causing segfaults. Investigating.

@apaszke left a comment

Mostly LGTM. I have some comments that might help clean up the code. Would be good to fix the std::terminate in case of an exception before merging.
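
For context: the std::terminate concern is about an exception escaping one of the worker tasks inside parallel_apply, which terminates the process; the usual fix, which the later "Rethrow exception in parallel_apply" commit appears to address, is to capture the exception as a std::exception_ptr on the worker and rethrow it on the calling thread. A generic sketch of that pattern (not the code that landed in this PR):

```cpp
#include <exception>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Generic "capture and rethrow" pattern: an exception must not escape a
// worker (that would call std::terminate); instead it is stashed as a
// std::exception_ptr and rethrown on the caller after all workers join.
void run_all_or_rethrow(const std::vector<std::function<void()>>& tasks) {
  std::exception_ptr first_error;
  std::mutex error_mutex;
  std::vector<std::thread> workers;
  workers.reserve(tasks.size());

  for (size_t i = 0; i < tasks.size(); ++i) {
    workers.emplace_back([i, &tasks, &first_error, &error_mutex] {
      try {
        tasks[i]();
      } catch (...) {
        std::lock_guard<std::mutex> lock(error_mutex);
        if (!first_error) {
          first_error = std::current_exception();
        }
      }
    });
  }

  for (auto& worker : workers) {
    worker.join();
  }
  if (first_error) {
    std::rethrow_exception(first_error);
  }
}
```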

@goldsborough force-pushed the data-parallel branch 2 times, most recently from 7a255ed to 1f6e95d on July 13, 2018
@goldsborough

@apaszke I think I addressed all your comments now. Thanks for the review!

@facebook-github-bot left a comment

@goldsborough has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Throw runtime error in non-CUDA environments
Remove THCStream forward declarations
Add retain() before converting from THCStream to CUDAStream
Conditionally compile with OpenMP in libtorch
Improve move-efficiency of comm.cpp and add multi-gpu guard
Fix single-device case of data_parallel
Include functional.h in python_comm.cpp
Rethrow exception in parallel_apply
Clarify data-parallel documentation

@goldsborough

@pytorchbot retest this please

@goldsborough

@pytorchbot retest this please

@goldsborough

@pytorchbot retest this please

@facebook-github-bot left a comment

@goldsborough is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

jramseyer pushed a commit to jramseyer/pytorch that referenced this pull request Jul 30, 2018
Summary:
This PR adds the functional version of `DataParallel` (i.e. `data_parallel`) to the C++ frontend.

For this, I had to:
1. Add "differentiable" versions of scatter and gather, which perform their inverse operation in the backward pass, to C++. I've added them under `torch/csrc/autograd/functions/comm.{h,cpp}`. I had to move some utilities from `VariableType.cpp` into `torch/csrc/autograd/functions/utils.h`, and changed them a bit to fix the `const_cast`s for which there were `TODO`s,
2. Implement the `replicate`, `parallel_apply` and the combining `data_parallel` functions in C++.

`replicate` is implemented based on our existing `clone()` interface, along with the ability to set the current device via `at::OptionsGuard` (so nice).

`parallel_apply` is implemented using `at::parallel_for` (CC cpuhrsch) and [follows the code from PyTorch](https://github.com/pytorch/pytorch/blob/master/torch/nn/parallel/parallel_apply.py).

Added lots of tests for these things.

apaszke ezyang ebetica colesbury
Pull Request resolved: pytorch#9234

Differential Revision: D8865182

Pulled By: goldsborough

fbshipit-source-id: 4f1fecf2b3f3bc1540c071dfb2d23dd45de433e4
goodlux pushed a commit to goodlux/pytorch that referenced this pull request Aug 15, 2018
@ezyang added the merged label Jun 26, 2019