[C++ API] Functional DataParallel #9234
Conversation
Build failure looks legit.
Branch updated: 4579c1d to dc894fb
I'm trying to use the new …
Mostly LGTM. I have some comments that might help clean up the code. Would be good to fix the `std::terminate` in case of an exception before merging.
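The `std::terminate` concern refers to an exception escaping a worker thread in `parallel_apply`; the later commit "Rethrow exception in parallel_apply" addresses it. As a rough sketch of the general pattern rather than the PR's actual code (`run_all` and its task list are hypothetical), the first exception can be captured in a `std::exception_ptr` and rethrown on the calling thread after all workers have joined:

```cpp
#include <exception>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: run every task on its own thread, but surface the
// first exception on the caller's thread instead of letting it escape a
// worker and call std::terminate.
void run_all(std::vector<std::function<void()>> tasks) {
  std::exception_ptr first_error;
  std::mutex mutex;
  std::vector<std::thread> workers;
  for (size_t i = 0; i < tasks.size(); ++i) {
    workers.emplace_back([&tasks, &first_error, &mutex, i] {
      try {
        tasks[i]();
      } catch (...) {
        std::lock_guard<std::mutex> lock(mutex);
        if (!first_error) {
          first_error = std::current_exception();  // remember the first failure
        }
      }
    });
  }
  for (auto& worker : workers) {
    worker.join();
  }
  if (first_error) {
    std::rethrow_exception(first_error);  // rethrow where the caller can catch it
  }
}
```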
```cpp
  for (auto& variable : variables) {
    set_history(variable, grad_fn);
  }
}
```
"and return a vector."); | ||
} | ||
|
||
std::vector<at::Tensor> tensors; |
```cpp
  }
  grad_fn = std::make_shared<Scatter>(
      source_devices,
      input_sizes,
```
```cpp
  return {variable};
#else
  AT_ERROR("Gather is only supported in CUDA environments");
#endif
```
```cpp
  });
  std::vector<at::Tensor> tensors;
  tensors =
      torch::cuda::scatter(input, device_indices, chunk_sizes_, dim_, streams_);
```
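For context on the call under review: `torch::cuda::scatter` splits `input` into chunks along `dim` (optionally with explicit chunk sizes and per-device streams) and places one chunk on each target device. The sketch below is only a naive illustration of that behaviour using plain tensor ops; `naive_scatter` is a hypothetical name, not an API added by this PR:

```cpp
#include <torch/torch.h>
#include <vector>

// Illustration only: split `input` along `dim` into one chunk per device and
// copy each chunk onto its device. The real torch::cuda::scatter additionally
// handles custom chunk sizes and CUDA streams.
std::vector<torch::Tensor> naive_scatter(
    const torch::Tensor& input,
    const std::vector<torch::Device>& devices,
    int64_t dim = 0) {
  auto chunks = input.chunk(static_cast<int64_t>(devices.size()), dim);
  std::vector<torch::Tensor> scattered;
  for (size_t i = 0; i < chunks.size(); ++i) {
    scattered.push_back(chunks[i].to(devices[i]));  // copy chunk i to device i
  }
  return scattered;
}
```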
test/cpp/api/parallel.cpp (outdated)
```cpp
auto a = torch::ones(5, torch::requires_grad(true).device({torch::kCUDA, 0}));
auto b = torch::ones(5, torch::requires_grad(true).device({torch::kCUDA, 1}));
auto output = gather.apply({a, b});
```
```cpp
REQUIRE(b.grad().defined());
REQUIRE(b.grad().device() == torch::Device(torch::kCUDA, 1));
REQUIRE(b.grad().sum().toCInt() == 5);
```
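The two test snippets above exercise the differentiable `Gather` node: the forward pass moves every input to the output device and concatenates them, and the backward pass routes each gradient slice back to the device the corresponding input came from, which is what the `REQUIRE`s check for `b`. A naive illustration of the forward semantics only (`naive_gather` is a hypothetical name, not the autograd node itself):

```cpp
#include <torch/torch.h>
#include <vector>

// Illustration only: copy every input onto `output_device` and concatenate
// along `dim`. The real Gather node additionally records the source devices
// so the backward pass can scatter the gradient back to them.
torch::Tensor naive_gather(
    const std::vector<torch::Tensor>& inputs,
    torch::Device output_device,
    int64_t dim = 0) {
  std::vector<torch::Tensor> moved;
  for (const auto& tensor : inputs) {
    moved.push_back(tensor.to(output_device));
  }
  return torch::cat(moved, dim);
}
```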
test/cpp/api/parallel.cpp (outdated)
```cpp
TEST_CASE("Parallel/Replicate", "[cuda]") {
  Linear linear(3, 4);
  auto replicas = parallel::replicate(
      linear, {torch::Device(torch::kCUDA, 0), torch::Device(torch::kCUDA, 1)});
```
```cpp
    replica2_parameters[i]->data().data<float>() !=
        original_parameters[i]->data().data<float>());
  }
}
```
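The assertions above verify that `replicate` deep-copies parameter storage rather than aliasing the original module's tensors. Below is a hedged sketch of the same idea that checks device placement instead of raw data pointers; it assumes the module-holder overload of `torch::nn::parallel::replicate` and the header layout of current libtorch, which may differ slightly from the code at the time of this PR:

```cpp
#include <torch/torch.h>
#include <torch/nn/parallel/data_parallel.h>
#include <vector>

void check_replica_devices() {
  torch::nn::Linear linear(3, 4);
  std::vector<torch::Device> devices = {
      torch::Device(torch::kCUDA, 0), torch::Device(torch::kCUDA, 1)};
  // One independent copy of the module per device.
  auto replicas = torch::nn::parallel::replicate(linear, devices);
  // Each replica's parameters live on its own device and do not share
  // storage with the original module's parameters.
  AT_ASSERT(replicas[0]->weight.device() == devices[0]);
  AT_ASSERT(replicas[1]->weight.device() == devices[1]);
}
```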
test/cpp/api/parallel.cpp (outdated)
```cpp
    linear,
    input,
    /*devices=*/at::nullopt,
    /*output_device=*/torch::Device(torch::kCUDA, 1));
```
Branch updated: 7a255ed to 1f6e95d
@apaszke I think I addressed all your comments now. Thanks for the review!
@goldsborough has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Branch updated: fb26da6 to ecdc3cd
@goldsborough has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
- Throw runtime error in non-CUDA environments
- Remove THCStream forward declarations
- Add retain() before converting from THCStream to CUDAStream
- Conditionally compile with OpenMP in libtorch
- Improve move-efficiency of comm.cpp and add multi-gpu guard
- Fix single-device case of data_parallel
- Include functional.h in python_comm.cpp
- Rethrow exception in parallel_apply
- Clarify data-parallel documentation
Branch updated: ecdc3cd to 64527df
@pytorchbot retest this please
@pytorchbot retest this please
@pytorchbot retest this please
@goldsborough is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: This PR adds the functional version of `DataParallel` (i.e. `data_parallel`) to the C++ frontend. For this, I had to: 1. Add "differentiable" versions of scatter and gather, which perform their inverse operation in the backward pass, to C++. I've added them under `torch/csrc/autograd/functions/comm.{h,cpp}`. I had to move some utilities from `VariableType.cpp` into `torch/csrc/autograd/functions/utils.h`, and changed them a bit to fix the `const_cast`s for which there were `TODO`s, 2. Implement the `replicate`, `parallel_apply` and the combining `data_parallel` functions in C++. `replicate` is implemented based on our existing `clone()` interface, along with the ability to set the current device via `at::OptionsGuard` (so nice). `parallel_apply` is implemented using `at::parallel_for` (CC cpuhrsch) and [follows the code from PyTorch](https://github.com/pytorch/pytorch/blob/master/torch/nn/parallel/parallel_apply.py). Added lots of tests for these things. apaszke ezyang ebetica colesbury Pull Request resolved: pytorch#9234 Differential Revision: D8865182 Pulled By: goldsborough fbshipit-source-id: 4f1fecf2b3f3bc1540c071dfb2d23dd45de433e4
This PR adds the functional version of `DataParallel` (i.e. `data_parallel`) to the C++ frontend. For this, I had to:

1. Add "differentiable" versions of scatter and gather, which perform their inverse operation in the backward pass, to C++. I've added them under `torch/csrc/autograd/functions/comm.{h,cpp}`. I had to move some utilities from `VariableType.cpp` into `torch/csrc/autograd/functions/utils.h`, and changed them a bit to fix the `const_cast`s for which there were `TODO`s.
2. Implement the `replicate`, `parallel_apply` and the combining `data_parallel` functions in C++. `replicate` is implemented based on our existing `clone()` interface, along with the ability to set the current device via `at::OptionsGuard` (so nice). `parallel_apply` is implemented using `at::parallel_for` (CC @cpuhrsch) and [follows the code from PyTorch](https://github.com/pytorch/pytorch/blob/master/torch/nn/parallel/parallel_apply.py).

Added lots of tests for these things.

@apaszke @ezyang @ebetica @colesbury
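To tie the pieces together, here is a rough end-to-end usage sketch modelled on this PR's tests in `test/cpp/api/parallel.cpp`. Namespaces are written out in full, and the include path and tensor-options spellings follow current libtorch, so treat the exact names as assumptions rather than the PR's verbatim API:

```cpp
#include <torch/torch.h>
#include <torch/nn/parallel/data_parallel.h>

int main() {
  torch::nn::Linear linear(3, 4);
  auto input =
      torch::ones({8, 3}, torch::device(torch::Device(torch::kCUDA, 0)));

  // Scatter `input` across the visible GPUs, run one replica of `linear` on
  // each chunk in parallel, and gather the outputs on CUDA device 1.
  auto output = torch::nn::parallel::data_parallel(
      linear,
      input,
      /*devices=*/at::nullopt,  // default: all visible CUDA devices
      /*output_device=*/torch::Device(torch::kCUDA, 1));

  AT_ASSERT(output.device() == torch::Device(torch::kCUDA, 1));
  return 0;
}
```

The same result can be assembled manually from `replicate`, `parallel_apply`, and the differentiable `Scatter`/`Gather` nodes, which is essentially what `data_parallel` does internally according to the description above.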