Implement c10 ops needed for benchmark #9360


Closed
wants to merge 6 commits

Conversation

smessmer
Contributor

Summary:
This implements a first set of c10 operators, namely the ones needed for the multithread predictor benchmark.

All implementations are CPU-only and experimental. They're not meant to be used in production.

They can be used, however, to test calling simple c10 MLPs from Caffe2 or PyTorch when working on these integration paths.

Differential Revision: D8811698

@@ -39,6 +39,7 @@ class ThreadsafeOperatorTable_ final {
auto result = map_.emplace(std::forward<Key>(key), value);
if (!result.second) {
std::ostringstream msg;
using ::operator<<;

This comment was marked as off-topic.

@@ -0,0 +1,968 @@
//===- llvm/ADT/SmallVector.h - 'Normally small' vectors --------*- C++ -*-===//

This comment was marked as off-topic.

@@ -0,0 +1,201 @@
//===--- ArrayRef.h - Array Reference Wrapper -------------------*- C++ -*-===//

This comment was marked as off-topic.

@@ -0,0 +1,145 @@
//===--- AlignOf.h - Portable calculation of type alignment -----*- C++ -*-===//

This comment was marked as off-topic.


namespace caffe2 {
namespace {
inline float sigmoid_partition(float lgt) {

This comment was marked as off-topic.
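The truncated hunk above introduces a sigmoid helper. As a hedged sketch (the actual caffe2 implementation may differ), the usual numerically stable formulation branches on the sign of the logit, written here in Python for brevity:

```python
import math

def stable_sigmoid(lgt):
    # Branch on the sign of the logit so exp() is only ever called on a
    # non-positive value, which avoids overflow for large |lgt|.
    if lgt >= 0.0:
        return 1.0 / (1.0 + math.exp(-lgt))
    e = math.exp(lgt)
    return e / (1.0 + e)

print(stable_sigmoid(0.0))  # 0.5
```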

@ezyang (Contributor) commented Jul 14, 2018

Is this PR intended for merging, or just to put it somewhere where people can get to it?

I see a lot of copy-pasting, and that makes me very nervous. Clearly, in the terminal state we don't want copy-pasted duplicates of this code. What is the justification for copy-pasting to this degree?

I am also wondering if there is too much boilerplate to be nimble here. For example, if I understand correctly, C10OperatorWrapper is feeding arguments through the dispatcher in unboxed form. It will eventually need to be boxed. Won't this require revising every single operator when this change is made? In this case it is better to have less code so that you can try different design iterations more quickly.
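A rough Python analogy of the unboxed-vs-boxed distinction raised here (the names are illustrative, not the real C10 dispatcher API): an unboxed call passes typed arguments straight through, so the wrapper is specific to each operator's signature, while a boxed call packs arguments into a generic stack that one dispatcher entry point can serve.

```python
def unboxed_call(kernel, x, scale):
    # Typed arguments flow straight through; this wrapper must know the
    # operator's exact signature, so changing conventions touches every op.
    return kernel(x, scale)

def boxed_call(kernel, stack):
    # Arguments travel in a generic list ("stack"); one wrapper serves
    # every operator, so switching to boxing need not revise each kernel.
    return kernel(*stack)

def scale_op(x, scale):
    return [v * scale for v in x]

print(unboxed_call(scale_op, [1.0, 2.0], 3.0))   # [3.0, 6.0]
print(boxed_call(scale_op, [[1.0, 2.0], 3.0]))   # [3.0, 6.0]
```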

@smessmer (Contributor, Author)

I'll remove the copy-pasted code.
This is meant for merging. It is a good set of operators covering common use cases: inputs vs. arguments, stateless vs. stateful operators, a variable number of inputs, and so on.
All designs we want to experiment with in the future need to support these cases, so this is basically a test that they do. It also allows future integration work (say, C10 into ATen) to be tested against actual operators. There will likely be many changes to C10OperatorWrapper when we switch to boxed calling, but probably fewer to the kernels themselves.
These operators are, by the way, not meant to be used in any production code; that's why they live in the experimental directory and have such odd operator names. For example, there are currently only CPU kernels, and even on CPU they might not support all argument combinations that caffe2 supports.
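An illustrative Python sketch (not the real C10 kernel interface) of the use-case coverage described above: stateless vs. stateful operators, constructor-time arguments vs. per-call inputs, and a variable number of inputs.

```python
class StatelessAdd:
    # Stateless: each call depends only on its inputs.
    def __call__(self, *inputs):      # variable number of inputs
        return sum(inputs)

class ScaleWithState:
    # Stateful: a constructor-time argument is held across calls.
    def __init__(self, scale):        # "argument", fixed at creation
        self.scale = scale
    def __call__(self, x):            # "input", varies per call
        return [v * self.scale for v in x]

print(StatelessAdd()(1, 2, 3))           # 6
print(ScaleWithState(2.0)([1.0, 4.0]))   # [2.0, 8.0]
```

Any candidate dispatcher design would need to express all of these shapes, which is what makes this operator set a useful test bed.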

smessmer and others added 6 commits August 1, 2018 08:37
Differential Revision: D9095209

fbshipit-source-id: 3bd23462c1b7baf86bc5d88643aefc9b4c21b6c8
Differential Revision: D8926785

fbshipit-source-id: f1f656e3ef20d102c163b083d6d378379e338490
Differential Revision: D8961693

fbshipit-source-id: 9f678e3fed8ff5942a097d64433eb909f6a07b33
Summary: This diff takes ArrayRef.h and moves it to ATen/core, including all its dependencies.

Differential Revision: D9074144

fbshipit-source-id: 1069ad11b076296c26dd0b7b0f758fff9f2018b1
Summary:
This adds the capability for caffe2 to call c10 operators and adds a dummy c10 sigmoid op as a proof of concept.

I used this test script to make sure it works:

    from caffe2.python import workspace, model_helper
    import numpy as np

    # Build a net that runs the same input through the experimental c10
    # sigmoid and the regular Caffe2 sigmoid.
    data1 = np.random.rand(16, 100).astype(np.float32)
    workspace.FeedBlob("data1", data1)
    m = model_helper.ModelHelper(name="my net")
    sigmoid1 = m.net.C10Sigmoid_DontUseThisOpYet("data1", "sigmoid1")
    sigmoid2 = m.net.Sigmoid("data1", "sigmoid2")

    workspace.RunNetOnce(m.param_init_net)
    workspace.CreateNet(m.net)
    # Feed fresh input and run the created net once.
    data1 = np.random.rand(16, 100).astype(np.float32)
    workspace.FeedBlob("data1", data1)
    workspace.RunNet(m.name, 1)

    print(workspace.FetchBlob("data1"))
    print(workspace.FetchBlob("sigmoid1"))
    print(workspace.FetchBlob("sigmoid2"))

(and check that both sigmoid outputs are the same)
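That final check could be automated, e.g. with np.testing.assert_allclose on the two fetched blobs. A dependency-free sketch of the same tolerance comparison (the sigmoid outputs below are stand-ins computed directly, not fetched blobs):

```python
import math

def allclose(a, b, rtol=1e-5, atol=1e-8):
    # Elementwise tolerance check in the spirit of numpy.allclose, so the
    # two kernels need only agree up to floating-point rounding.
    return all(abs(x - y) <= atol + rtol * abs(y) for x, y in zip(a, b))

x = [0.0, 1.0, -3.5]
out1 = [1.0 / (1.0 + math.exp(-v)) for v in x]  # stand-in for the c10 sigmoid
out2 = [1.0 / (1.0 + math.exp(-v)) for v in x]  # stand-in for Caffe2's Sigmoid
assert allclose(out1, out2)
print("outputs match")
```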

Differential Revision: D8814669

fbshipit-source-id: 708de08e2d5d5d8920af2fc8448ab90b0e45b567
Summary:
Pull Request resolved: pytorch#9360


Differential Revision: D8811698

fbshipit-source-id: 5074df43a08e84a120550dfad57ed8bcc8f660b5
goodlux pushed a commit to goodlux/pytorch that referenced this pull request Aug 15, 2018
Summary:
Pull Request resolved: pytorch#9360


Reviewed By: dzhulgakov

Differential Revision: D8811698

fbshipit-source-id: 826789c38b2bfdb125a5c0d03c5aebf627785482
@ezyang added the merged label on Jun 26, 2019
4 participants