Add support for rand_like op in fusion compiler #9795


Closed
wants to merge 8 commits

Conversation

runtian-zhou

Enabled support for generating random numbers in the fusion compiler. Currently a Philox RNG implementation from TensorFlow is used, since NVRTC couldn't resolve the curand.h header correctly. The two implementations have exactly the same behavior according to our tests.
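For readers unfamiliar with counter-based RNGs, the following is a hedged sketch of Philox4x32-10, the scheme the PR inlines as a string literal. The constants are the standard Philox round/key-schedule constants (also used by TensorFlow and cuRAND); the PR's actual code layout may differ.

```cpp
#include <array>
#include <cstdint>

// Sketch of Philox4x32-10: a stateless, counter-based generator.
// Given a 128-bit counter and a 64-bit key (the seed), ten rounds of
// cheap 32x32->64 multiplies produce four pseudo-random 32-bit words.
using Counter = std::array<uint32_t, 4>;
using Key = std::array<uint32_t, 2>;

static void mulhilo(uint32_t a, uint32_t b, uint32_t& hi, uint32_t& lo) {
  uint64_t p = static_cast<uint64_t>(a) * b;  // full 64-bit product
  hi = static_cast<uint32_t>(p >> 32);
  lo = static_cast<uint32_t>(p);
}

Counter philox4x32_10(Counter ctr, Key key) {
  const uint32_t M0 = 0xD2511F53, M1 = 0xCD9E8D57;   // round multipliers
  const uint32_t W0 = 0x9E3779B9, W1 = 0xBB67AE85;   // key-schedule bumps
  for (int round = 0; round < 10; ++round) {
    uint32_t hi0, lo0, hi1, lo1;
    mulhilo(M0, ctr[0], hi0, lo0);
    mulhilo(M1, ctr[2], hi1, lo1);
    ctr = {hi1 ^ ctr[1] ^ key[0], lo1, hi0 ^ ctr[3] ^ key[1], lo0};
    key[0] += W0;
    key[1] += W1;
  }
  return ctr;  // four pseudo-random 32-bit words per invocation
}
```

Because the generator is a pure function of (counter, key), the same seed and offset always reproduce the same words, which is what makes bit-exact agreement with another Philox implementation possible in the first place.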

@ezyang
Contributor

ezyang commented Jul 24, 2018

@pytorchbot retest this please

@ezyang
Contributor

ezyang commented Jul 25, 2018

needs rebasing, sorry

@@ -80,12 +84,297 @@ struct TensorInfo {
IndexType strides[N];
};
)");
constexpr auto rand_support_literal = R"(

auto cuda_compilation_unit_template = CodeTemplate(R"(
${type_declarations}

extern "C" __global__
void ${kernelName}(IndexType totalElements, ${formals}) {
void ${kernelName}(IndexType totalElements, ${formals} ${RandParam}) {

// well.
if(has_random) {
auto gen_ = THCRandom_getGenerator(at::globalContext().getTHCState());
uint64_t offset = gen_->state.philox_seed_offset.fetch_add(20);


compilation_unit = cu.str();
nvrtcProgram program;
TORCH_NVRTC_CHECK(nvrtcCreateProgram(&program, compilation_unit.c_str(), NULL, 0, nullptr, nullptr));

std::string compute = "--gpu-architecture=compute_" + std::to_string(prop.major) + std::to_string(prop.minor);
std::vector<const char *> args = {"--std=c++11", compute.c_str()};
std::vector<const char *> args = {"--std=c++11", compute.c_str(), "-default-device"};


@@ -310,9 +600,11 @@ std::string encodeRHS(Node * n) {
}

std::vector<ConcatDesc> emitCompilationUnit(std::ostream & out,
bool& has_random,

Contributor

@ezyang ezyang left a comment


This is a really tight PR, I like it a lot! Good to go after the nits are fixed.

Contributor

@facebook-github-bot facebook-github-bot left a comment


@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@runtian-zhou
Author

@pytorchbot retest this please

Contributor

@zdevito zdevito left a comment


Structure looks good. I have a few questions about how random number seeds are handled. For instance, if we have two calls to rand inside a kernel, it doesn't look like we change the amount we increment the seed accordingly. Can you explain how this works?

if(i == 0) buf = philox();
uint32 ret = buf[i];
i = (i + 1) % 4;
static uint32 FLOAT_MASK = (1 << 24) - 1;
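The buffering pattern quoted above can be illustrated on the host (this is an illustration, not the PR's device code): one philox() invocation yields four 32-bit words, which are handed out one at a time, and the generator is only re-invoked every fourth draw.

```cpp
#include <array>
#include <cstdint>

// Illustration of the review snippet's buffering: philox() is called
// once per four draws, with `i` cycling through the 4-word buffer.
struct BufferedRng {
  int i = 0;                        // index into the 4-word buffer
  int philox_calls = 0;             // bookkeeping for this illustration
  std::array<uint32_t, 4> buf{};

  std::array<uint32_t, 4> philox() {  // stand-in for the real generator
    ++philox_calls;
    return {1u, 2u, 3u, 4u};
  }

  uint32_t next() {
    if (i == 0) buf = philox();     // refill every fourth draw
    uint32_t ret = buf[i];
    i = (i + 1) % 4;
    return ret;
  }
};
```

Eight draws trigger exactly two philox() invocations; this draws-per-call ratio is what ties the per-launch offset increment to the number of rand calls in the kernel.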

// well.
if(has_random && this->backend() == at::kCUDA) {
auto gen_ = THCRandom_getGenerator(at::globalContext().getTHCState());
uint64_t offset = gen_->state.philox_seed_offset.fetch_add(20);
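The offset bookkeeping above can be sketched in isolation: each kernel launch atomically reserves a block of the Philox subsequence by bumping a shared counter, so successive launches never reuse random numbers. The increment of 20 mirrors the snippet; whether it should scale with the number of rand calls in the kernel is exactly the question raised in this review.

```cpp
#include <atomic>
#include <cstdint>

// Sketch of per-launch subsequence reservation via an atomic offset.
// fetch_add returns the previous value: this launch starts there, and
// the next launch starts `draws` further along, so ranges never overlap.
std::atomic<uint64_t> philox_seed_offset{0};

uint64_t reserve_offset(uint64_t draws) {
  return philox_seed_offset.fetch_add(draws);
}
```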


@@ -321,9 +619,12 @@ std::string encodeRHS(Node * n) {
}

std::vector<ConcatDesc> emitCompilationUnit(std::ostream & out,
bool* has_random,

}
PHILOX_DEVICE_INLINE float operator()() {
if(i == 0) buf = philox();
uint32 ret = buf[i];

if(i == 0) buf = philox();
uint32 ret = buf[i];
i = (i + 1) % 4;
const uint32 FLOAT_MASK = (1 << 24) - 1;
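The FLOAT_MASK step in the snippet above converts a raw 32-bit word to a uniform float: keep the low 24 bits (a float's significand width) and scale by 2^-24, which lands exactly in [0, 1) with no rounding surprises. A minimal host-side sketch:

```cpp
#include <cstdint>

// uint32 -> uniform float in [0, 1): mask to 24 bits, scale by 2^-24.
// 24 bits fit a float's significand exactly, so every result is exact
// and the maximum value is 1 - 2^-24, strictly below 1.
float uint32_to_float(uint32_t ret) {
  const uint32_t FLOAT_MASK = (1u << 24) - 1;     // 0x00FFFFFF
  const float FLOAT_DIVISOR = 1.0f / (1u << 24);  // 2^-24
  return (ret & FLOAT_MASK) * FLOAT_DIVISOR;
}
```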


@runtian-zhou
Author

@pytorchbot retest this please

@@ -88,11 +94,116 @@ struct TensorInfo {
};
)");

// The reason why we used TensorFlow's philox implementation is that currently

};

// Constants are picked from https://www.doornik.com/research/randomdouble.pdf
#define M_RAN_INVM32 2.32830643653869628906e-010

counter.x += nlo;
if (counter.x < nlo)
nhi++;
counter.y += nhi;
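The carry handling in the snippet above can be shown in a self-contained sketch: the 128-bit Philox counter is stored as four 32-bit limbs, and adding a 64-bit skip value must propagate a carry from the low limb into the next one. The struct and function names here are illustrative, not the PR's.

```cpp
#include <cstdint>

// 128-bit counter as four 32-bit limbs, as in the Philox state above.
struct Ctr { uint32_t x, y, z, w; };

// Advance the counter by n draws, mirroring the quoted carry logic.
void skip_ahead(Ctr& counter, uint64_t n) {
  uint32_t nlo = static_cast<uint32_t>(n);
  uint32_t nhi = static_cast<uint32_t>(n >> 32);
  counter.x += nlo;
  if (counter.x < nlo)  // unsigned wraparound means a carry occurred
    nhi++;
  counter.y += nhi;
  // (a full implementation would keep carrying into z and w as well)
}
```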

@ngimel
Collaborator

ngimel commented Jul 31, 2018

This looks better now. I have a more general question: do we want the philox generator as a string literal, or as a header that's redistributed with binaries and can thus be supplied to NVRTC as a header rather than as part of a source file? The advantage of the header is that when THC generation moves to philox too (to match the jit), it can reuse this header; otherwise THC will either have to use curand or carry a copy-paste of what's in this PR, and neither option is great IMO.

@runtian-zhou
Author

I wasn't quite sure how we should redistribute the header. I think NVRTC needs either the path to the header file or the actual string literal, rather than just an include directive.

@ngimel
Collaborator

ngimel commented Jul 31, 2018

I think it does need the path to the header file, but pytorch should be able to provide that at runtime if it's part of the pytorch binary install?

@ezyang
Contributor

ezyang commented Aug 1, 2018

I agree it's "better" for the header to be in an actual file, but to do this we have to solve some redistribution problems, where the JIT code doesn't "know" where we installed ATen/TH headers, so what file should it pass to the compiler? (You can hardcode the filepath into the binary, but congratulations, your binary is no longer relocatable). It's just generally easier to make things work if you have the string in the binary.

This isn't a fatal problem; for example, you can get the header location from Python and pass it in (like how torch/utils/cpp_extension.py does it). It's just more work. @zrt95, it's up to you; if you want to merge the PR as is to start with I'm OK with that.

@runtian-zhou
Author

I think I'd prefer to merge it as is.

@ezyang
Contributor

ezyang commented Aug 1, 2018

you have my blessing

@ssnl
Collaborator

ssnl commented Aug 1, 2018

@pytorchbot retest this please

@mcarilli
Collaborator

mcarilli commented Aug 1, 2018

LGTM. I copied the new Philox implementation and benchmarked it on my Titan V (roughly equivalent to V100). Performance is decent and I'm not observing any pesky local memory use. The values produced match cuRAND's for the same seed, sequence numbers, and offsets.

Contributor

@facebook-github-bot facebook-github-bot left a comment


SsnL is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

goodlux pushed a commit to goodlux/pytorch that referenced this pull request Aug 15, 2018
Summary:
Enabled support for generating random numbers in the fusion compiler. Currently a Philox RNG implementation from TensorFlow is used, since NVRTC couldn't resolve the curand.h header correctly. The two implementations have exactly the same behavior according to our tests.
Pull Request resolved: pytorch#9795

Differential Revision: D8999029

Pulled By: SsnL

fbshipit-source-id: f0d2616a699a942e2f370bdb02ac77b9c463d7b8