This repository was archived by the owner on Jul 1, 2025. It is now read-only.

Conversation

nadavrot
Contributor

@nadavrot nadavrot commented Oct 4, 2017

This is the first step in implementing a high-level graph that will bridge the Caffe2 network and the low-level IR. This sequence of commits implements a basic Graph container and some of the basic functionality for the nodes in the graph.

@jspark1105
Contributor

Just wanted to check if I understood the Graph structure correctly. One way to represent a neural network as a graph is to have operators as nodes and tensors as edges (actually hyper-edges, because a tensor can be consumed by multiple operators), as in TensorFlow. Another way is to represent it as a bipartite graph where a node can be an operator or a tensor, and an edge only connects an operator and a tensor. Here, the edges don't need to be hyper-edges because there's only one connection between a given operator and a tensor. I think your graph is similar to the latter because an operator is represented as a Node, and a Variable (which represents tensors and other data) is also a Node. Is my understanding correct?
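To make the two options concrete, here is a minimal sketch of both representations. The types `OpNode` and `BiNode` and the helpers `countUsers` and `isBipartite` are hypothetical names made up for illustration; they are not the `Node`, `Variable`, or `Graph` classes in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical types for illustration only; not the classes from this PR.

// Representation A: operators are nodes and tensors are edges.
// Each entry in `inputs` stands for a tensor flowing from its producer.
struct OpNode {
  std::string name;
  std::vector<OpNode *> inputs;
};

// Representation B: a bipartite graph where operators and tensors are both
// nodes, and every edge connects an operator to a tensor.
struct BiNode {
  enum class Kind { Operator, Tensor };
  Kind kind;
  std::string name;
  std::vector<BiNode *> inputs;
};

// Counts how many operators consume the output of `producer`. A count above
// one is what makes the tensor a hyper-edge in representation A.
std::size_t countUsers(const std::vector<OpNode *> &nodes,
                       const OpNode *producer) {
  std::size_t users = 0;
  for (const OpNode *n : nodes)
    for (const OpNode *in : n->inputs)
      if (in == producer)
        ++users;
  return users;
}

// Returns true if no edge connects two nodes of the same kind.
bool isBipartite(const std::vector<BiNode *> &nodes) {
  for (const BiNode *n : nodes)
    for (const BiNode *in : n->inputs)
      if (in->kind == n->kind)
        return false;
  return true;
}
```

In representation A, a convolution feeding both a relu and a pool has two users on one output tensor; in representation B the same tensor is its own node with two outgoing edges, so each edge stays a simple pair.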

/// A list of variables.
std::vector<Variable *> vars_;

Module &M_;

This comment was marked as off-topic.

/// Represents the compute graph.
class Graph final {
/// A list of nodes that the graph owns.
std::vector<Node *> nodes_;

This comment was marked as off-topic.

/// A list of nodes that the graph owns.
std::vector<Node *> nodes_;
/// A list of variables.
std::vector<Variable *> vars_;

This comment was marked as off-topic.

for (auto in : inputs) {
(void)in;
assert(in->dims() == inDim && "Invalid input shape");
}

This comment was marked as off-topic.


for (auto n : nodes_) {
std::cout << n->getDebugDesc() << "\n";
}

This comment was marked as off-topic.

: Kinded(k), Typed(Ty) {
setName(name);
}

/// \returns a textual description of the node.
virtual std::string getDebugDesc() const;

This comment was marked as off-topic.

db.addParam("output", *getType());
db.addParam("init", WeightVar::getInitKindStr(initKind_));
db.addParam("val", val_);
return db;

This comment was marked as off-topic.

@nadavrot
Contributor Author

nadavrot commented Oct 4, 2017

@jspark1105 I'd like to go in a direction where the high-level graph is similar to TensorFlow's representation, where edges represent tensors that flow between operators. Obviously we need variables as placeholders for the information that flows into the graph. This graph will allow transformations like re-ordering relu and pool, and canceling a double transpose.

I want to go in the direction of the graph in the picture below. The disadvantage of this representation is that we don't control the memory layout of tensors. So, our low-level IR uses the second representation that you mentioned, a bipartite graph. This allows us to perform low-level optimizations such as overlapping memory regions.

So, having two representations, a high-level one for the high-level transformations and a low-level one for the memory and low-level transformations, will allow us to optimize everything. So, I think your comment is excellent, and we should go in the direction of having both representations (high-level and low-level).
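As a rough illustration of the kind of high-level transformation mentioned above, here is a sketch of canceling a double transpose on a graph where edges are tensors. `SketchNode`, `isIdentityComposition`, and `skipDoubleTranspose` are hypothetical names for this example, not the API in this PR, and the shuffle convention (output dimension `i` takes input dimension `shuffle[i]`) is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical node type for illustration; not the Node class from this PR.
struct SketchNode {
  enum class Kind { Input, Transpose, Relu };
  Kind kind;
  SketchNode *input;                // Single producer (the tensor edge).
  std::vector<std::size_t> shuffle; // Permutation, used by Transpose only.
};

// Assumed convention: a transpose with shuffle p produces an output whose
// dimension i is the input's dimension p[i]. Applying `first` and then
// `second` restores the original order when first[second[i]] == i for all i.
bool isIdentityComposition(const std::vector<std::size_t> &first,
                           const std::vector<std::size_t> &second) {
  if (first.size() != second.size())
    return false;
  for (std::size_t i = 0; i < second.size(); ++i) {
    if (second[i] >= first.size() || first[second[i]] != i)
      return false;
  }
  return true;
}

// If `n` is a transpose of a transpose and the two shuffles cancel, returns
// the original producer; otherwise returns `n` unchanged.
SketchNode *skipDoubleTranspose(SketchNode *n) {
  if (n->kind != SketchNode::Kind::Transpose)
    return n;
  SketchNode *inner = n->input;
  if (inner == nullptr || inner->kind != SketchNode::Kind::Transpose)
    return n;
  if (!isIdentityComposition(inner->shuffle, n->shuffle))
    return n;
  return inner->input;
}
```

A caller would rewire the users of the outer transpose to the returned node. Because the pattern match only follows producer edges, it needs no knowledge of memory layout, which is the point of doing it at the high level.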

[Image: screen shot 2017-08-28 at 4 50 34 pm]

@jspark1105
Contributor

Thanks for the explanation. Makes perfect sense. Learning a lot by looking at your code!

@nadavrot nadavrot merged commit 81e3b66 into pytorch:master Oct 4, 2017
@nadavrot nadavrot deleted the start_graph_work branch October 12, 2017 22:30
facebook-github-bot pushed a commit that referenced this pull request Jun 15, 2019
Summary:
**Description**
This commit fixes two bugs in the OpenCL implementation of
`BatchedReduceAddInst` and adds a few comments for clarity.

The first is a segmentation fault caused by
incorporating feedback on #2958. A suggestion was made to make the loop
variable `i` in the loop that computes `batchSliceSizes` count down instead of
count up, but this suggestion was taken without changing the type (which was `size_t`,
an unsigned type), so the loop never terminates and eventually leads to a
segmentation fault.
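For illustration, a minimal sketch of the countdown pitfall described above and one common way to avoid it. The function `indicesDescending` is a hypothetical example and is not the code from the OpenCL backend.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the indices n-1, n-2, ..., 0 in the order a countdown loop visits
// them.
std::vector<std::size_t> indicesDescending(std::size_t n) {
  std::vector<std::size_t> order;
  // A loop written as `for (size_t i = n - 1; i >= 0; --i)` never ends:
  // `i >= 0` is always true for an unsigned type, and decrementing 0 wraps
  // around to the largest representable value.
  // Testing before decrementing keeps the index unsigned and still stops.
  for (std::size_t i = n; i-- > 0;) {
    order.push_back(i);
  }
  return order;
}
```

Switching the loop variable to a signed type is the other usual fix; the form above avoids the signed/unsigned conversion when indexing containers.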

The second bug is an incorrect computation of `destSliceSizes`. Instead of
multiplying the slice size at a dimension with the number of elements in
that same dimension, the code was multiplying the former with the number
of elements in the *adjacent* dimension. This was surfaced by the unit
test added in #2958 for `axis = 2`.
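A sketch of the slice-size recurrence as described above, assuming a slice size at a dimension means the product of all later dimensions. The function `sliceSizes` is a hypothetical helper written for this example; the actual kernel setup code is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// For each dimension i, computes the number of elements in one slice at that
// dimension, assumed here to be the product of dims[i + 1 .. end].
std::vector<std::size_t> sliceSizes(const std::vector<std::size_t> &dims) {
  std::vector<std::size_t> sizes(dims.size(), 1);
  // Walk from the last dimension down. Inside the body, i is the dimension
  // whose slice size is already known.
  for (std::size_t i = dims.size(); i-- > 1;) {
    // Correct: multiply the slice size at dimension i by the number of
    // elements in that same dimension i.
    sizes[i - 1] = sizes[i] * dims[i];
    // The bug described above corresponds to using the adjacent dimension,
    // e.g. sizes[i] * dims[i - 1], which is wrong whenever the two differ.
  }
  return sizes;
}
```

For dimensions {2, 3, 4} the recurrence yields slice sizes {12, 4, 1}. A shape with equal adjacent dimensions would hide the adjacent-dimension mistake, which may be why a test with a different axis was needed to surface it.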

**Test Plan**
1) `ninja check` with OpenCL enabled, DEBUG mode

```
      Start  1: BackendCorrectnessTest
 1/34 Test  #1: BackendCorrectnessTest ..............   Passed   21.28 sec
      Start  2: BackendTest
 2/34 Test  #2: BackendTest .........................   Passed    1.97 sec
      Start  3: BasicIRTest
 3/34 Test  #3: BasicIRTest .........................   Passed    0.05 sec
      Start  4: Caffe2ImporterTest
 4/34 Test  #4: Caffe2ImporterTest ..................   Passed    3.00 sec
      Start  5: DeviceManagerTest
 5/34 Test  #5: DeviceManagerTest ...................   Passed    0.76 sec
      Start  6: ThreadPoolExecutorTest
 6/34 Test  #6: ThreadPoolExecutorTest ..............   Passed    1.48 sec
      Start  7: Float16Test
 7/34 Test  #7: Float16Test .........................   Passed    0.01 sec
      Start  8: GemmTest
 8/34 Test  #8: GemmTest ............................   Passed    0.05 sec
      Start  9: GlowOnnxifiManagerTest
 9/34 Test  #9: GlowOnnxifiManagerTest ..............   Passed    0.06 sec
      Start 10: GradCheckTest
10/34 Test #10: GradCheckTest .......................   Passed    4.72 sec
      Start 11: GraphGradTest
11/34 Test #11: GraphGradTest .......................   Passed    0.06 sec
      Start 12: GraphOptzTest
12/34 Test #12: GraphOptzTest .......................   Passed    0.03 sec
      Start 13: GraphSchedulerTest
13/34 Test #13: GraphSchedulerTest ..................   Passed    0.01 sec
      Start 14: GraphTest
14/34 Test #14: GraphTest ...........................   Passed    1.03 sec
      Start 15: HostManagerTest
15/34 Test #15: HostManagerTest .....................   Passed    7.49 sec
      Start 16: HyphenTest
16/34 Test #16: HyphenTest ..........................   Passed    1.17 sec
      Start 17: IROptTest
17/34 Test #17: IROptTest ...........................   Passed    0.01 sec
      Start 18: ImageTest
18/34 Test #18: ImageTest ...........................   Passed    0.31 sec
      Start 19: LLVMIRGenTest
19/34 Test #19: LLVMIRGenTest .......................   Passed    0.01 sec
      Start 20: MLTest
20/34 Test #20: MLTest ..............................   Passed   46.30 sec
      Start 21: MemoryAllocatorTest
21/34 Test #21: MemoryAllocatorTest .................   Passed    0.03 sec
      Start 22: OCLTest
22/34 Test #22: OCLTest .............................   Passed    0.24 sec
      Start 23: OnnxImporterTest
23/34 Test #23: OnnxImporterTest ....................   Passed    0.12 sec
      Start 24: OperatorGradTest
24/34 Test #24: OperatorGradTest ....................   Passed    0.05 sec
      Start 25: OperatorTest
25/34 Test #25: OperatorTest ........................   Passed   14.47 sec
      Start 26: PartitionerTest
26/34 Test #26: PartitionerTest .....................   Passed    0.05 sec
      Start 28: ProvisionerTest
27/34 Test #28: ProvisionerTest .....................   Passed    1.00 sec
      Start 29: QuantizationTest
28/34 Test #29: QuantizationTest ....................   Passed    7.46 sec
      Start 30: TensorsTest
29/34 Test #30: TensorsTest .........................   Passed    0.36 sec
      Start 31: TensorPoolTest
30/34 Test #31: TensorPoolTest ......................   Passed    0.01 sec
      Start 32: ThreadPoolTest
31/34 Test #32: ThreadPoolTest ......................   Passed    0.01 sec
      Start 33: TraceEventsTest
32/34 Test #33: TraceEventsTest .....................   Passed   10.62 sec
      Start 34: TypeAToTypeBFunctionConverterTest
33/34 Test #34: TypeAToTypeBFunctionConverterTest ...   Passed    0.06 sec
      Start 35: UtilsTest
34/34 Test #35: UtilsTest ...........................   Passed    0.02 sec

100% tests passed, 0 tests failed out of 34

Total Test time (real) = 124.33 sec
```

2) `ninja check` with OpenCL enabled, RELEASE mode
```
      Start  1: BackendCorrectnessTest
 1/34 Test  #1: BackendCorrectnessTest ..............   Passed   11.51 sec
      Start  2: BackendTest
 2/34 Test  #2: BackendTest .........................   Passed    1.53 sec
      Start  3: BasicIRTest
 3/34 Test  #3: BasicIRTest .........................   Passed    0.02 sec
      Start  4: Caffe2ImporterTest
 4/34 Test  #4: Caffe2ImporterTest ..................   Passed    0.62 sec
      Start  5: DeviceManagerTest
 5/34 Test  #5: DeviceManagerTest ...................   Passed    0.83 sec
      Start  6: ThreadPoolExecutorTest
 6/34 Test  #6: ThreadPoolExecutorTest ..............   Passed    0.71 sec
      Start  7: Float16Test
 7/34 Test  #7: Float16Test .........................   Passed    0.01 sec
      Start  8: GemmTest
 8/34 Test  #8: GemmTest ............................   Passed    0.31 sec
      Start  9: GlowOnnxifiManagerTest
 9/34 Test  #9: GlowOnnxifiManagerTest ..............   Passed    0.33 sec
      Start 10: GradCheckTest
10/34 Test #10: GradCheckTest .......................   Passed    1.90 sec
      Start 11: GraphGradTest
11/34 Test #11: GraphGradTest .......................   Passed    0.32 sec
      Start 12: GraphOptzTest
12/34 Test #12: GraphOptzTest .......................   Passed    0.03 sec
      Start 13: GraphSchedulerTest
13/34 Test #13: GraphSchedulerTest ..................   Passed    0.02 sec
      Start 14: GraphTest
14/34 Test #14: GraphTest ...........................   Passed    0.59 sec
      Start 15: HostManagerTest
15/34 Test #15: HostManagerTest .....................   Passed   10.61 sec
      Start 16: HyphenTest
16/34 Test #16: HyphenTest ..........................   Passed    4.18 sec
      Start 17: IROptTest
17/34 Test #17: IROptTest ...........................   Passed    0.04 sec
      Start 18: ImageTest
18/34 Test #18: ImageTest ...........................   Passed    0.10 sec
      Start 19: LLVMIRGenTest
19/34 Test #19: LLVMIRGenTest .......................   Passed    0.71 sec
      Start 20: MLTest
20/34 Test #20: MLTest ..............................   Passed   52.44 sec
      Start 21: MemoryAllocatorTest
21/34 Test #21: MemoryAllocatorTest .................   Passed    0.03 sec
      Start 22: OCLTest
22/34 Test #22: OCLTest .............................   Passed    0.96 sec
      Start 23: OnnxImporterTest
23/34 Test #23: OnnxImporterTest ....................   Passed    0.89 sec
      Start 24: OperatorGradTest
24/34 Test #24: OperatorGradTest ....................   Passed    0.76 sec
      Start 25: OperatorTest
25/34 Test #25: OperatorTest ........................   Passed   33.00 sec
      Start 26: PartitionerTest
26/34 Test #26: PartitionerTest .....................   Passed    0.79 sec
      Start 28: ProvisionerTest
27/34 Test #28: ProvisionerTest .....................   Passed    3.00 sec
      Start 29: QuantizationTest
28/34 Test #29: QuantizationTest ....................   Passed   19.64 sec
      Start 30: TensorsTest
29/34 Test #30: TensorsTest .........................   Passed    0.09 sec
      Start 31: TensorPoolTest
30/34 Test #31: TensorPoolTest ......................   Passed    0.04 sec
      Start 32: ThreadPoolTest
31/34 Test #32: ThreadPoolTest ......................   Passed    0.04 sec
      Start 33: TraceEventsTest
32/34 Test #33: TraceEventsTest .....................   Passed   13.18 sec
      Start 34: TypeAToTypeBFunctionConverterTest
33/34 Test #34: TypeAToTypeBFunctionConverterTest ...   Passed    0.87 sec
      Start 35: UtilsTest
34/34 Test #35: UtilsTest ...........................   Passed    0.04 sec

100% tests passed, 0 tests failed out of 34

Total Test time (real) = 160.15 sec
```
3) `ninja check` with OpenCL enabled, ASAN+UBSAN mode
```
      Start  1: BackendCorrectnessTest
 1/34 Test  #1: BackendCorrectnessTest ..............   Passed   65.05 sec
      Start  2: BackendTest
 2/34 Test  #2: BackendTest .........................   Passed    5.42 sec
      Start  3: BasicIRTest
 3/34 Test  #3: BasicIRTest .........................   Passed    0.09 sec
      Start  4: Caffe2ImporterTest
 4/34 Test  #4: Caffe2ImporterTest ..................   Passed   11.51 sec
      Start  5: DeviceManagerTest
 5/34 Test  #5: DeviceManagerTest ...................   Passed    1.93 sec
      Start  6: ThreadPoolExecutorTest
 6/34 Test  #6: ThreadPoolExecutorTest ..............   Passed    5.08 sec
      Start  7: Float16Test
 7/34 Test  #7: Float16Test .........................   Passed    0.03 sec
      Start  8: GemmTest
 8/34 Test  #8: GemmTest ............................   Passed    0.22 sec
      Start  9: GlowOnnxifiManagerTest
 9/34 Test  #9: GlowOnnxifiManagerTest ..............   Passed    0.18 sec
      Start 10: GradCheckTest
10/34 Test #10: GradCheckTest .......................   Passed   15.40 sec
      Start 11: GraphGradTest
11/34 Test #11: GraphGradTest .......................   Passed    0.22 sec
      Start 12: GraphOptzTest
12/34 Test #12: GraphOptzTest .......................   Passed    0.12 sec
      Start 13: GraphSchedulerTest
13/34 Test #13: GraphSchedulerTest ..................   Passed    0.03 sec
      Start 14: GraphTest
14/34 Test #14: GraphTest ...........................   Passed    3.00 sec
      Start 15: HostManagerTest
15/34 Test #15: HostManagerTest .....................   Passed   13.79 sec
      Start 16: HyphenTest
16/34 Test #16: HyphenTest ..........................   Passed    3.47 sec
      Start 17: IROptTest
17/34 Test #17: IROptTest ...........................   Passed    0.05 sec
      Start 18: ImageTest
18/34 Test #18: ImageTest ...........................   Passed    1.08 sec
      Start 19: LLVMIRGenTest
19/34 Test #19: LLVMIRGenTest .......................   Passed    0.05 sec
      Start 20: MLTest
20/34 Test #20: MLTest ..............................   Passed  141.01 sec
      Start 21: MemoryAllocatorTest
21/34 Test #21: MemoryAllocatorTest .................   Passed    0.08 sec
      Start 22: OCLTest
22/34 Test #22: OCLTest .............................   Passed    0.64 sec
      Start 23: OnnxImporterTest
23/34 Test #23: OnnxImporterTest ....................   Passed    0.51 sec
      Start 24: OperatorGradTest
24/34 Test #24: OperatorGradTest ....................   Passed    0.14 sec
      Start 25: OperatorTest
25/34 Test #25: OperatorTest ........................   Passed   35.78 sec
      Start 26: PartitionerTest
26/34 Test #26: PartitionerTest .....................   Passed    0.20 sec
      Start 28: ProvisionerTest
27/34 Test #28: ProvisionerTest .....................   Passed    2.25 sec
      Start 29: QuantizationTest
28/34 Test #29: QuantizationTest ....................   Passed   17.17 sec
      Start 30: TensorsTest
29/34 Test #30: TensorsTest .........................   Passed    1.28 sec
      Start 31: TensorPoolTest
30/34 Test #31: TensorPoolTest ......................   Passed    0.03 sec
      Start 32: ThreadPoolTest
31/34 Test #32: ThreadPoolTest ......................   Passed    0.05 sec
      Start 33: TraceEventsTest
32/34 Test #33: TraceEventsTest .....................   Passed   32.11 sec
      Start 34: TypeAToTypeBFunctionConverterTest
33/34 Test #34: TypeAToTypeBFunctionConverterTest ...   Passed    0.15 sec
      Start 35: UtilsTest
34/34 Test #35: UtilsTest ...........................   Passed    0.07 sec

100% tests passed, 0 tests failed out of 34

Total Test time (real) = 358.24 sec
```
Pull Request resolved: #3118

Differential Revision: D15836207

Pulled By: SplitInfinity

fbshipit-source-id: 7bfa3c6ed5583d6a8f42b1f712f359e8e1d10b47
khabinov added a commit to khabinov/glow that referenced this pull request May 27, 2021
…038cad

Summary:
Previous import was bd6feb6d0d3fc903df42b4feb82a602a5fcb1fd5

Included changes:
- **[c278588](houseroad/foxi@c278588)**: [foxi] Add a new function for getting current batch size (pytorch#23) <Oleg Khabinov>

Differential Revision: D28731737

fbshipit-source-id: 42dc7b251862cf0b6d5d2bfdb4bd3b4b272348a5
khabinov added a commit to khabinov/glow that referenced this pull request May 27, 2021
…038cad

Summary:
Previous import was bd6feb6d0d3fc903df42b4feb82a602a5fcb1fd5

Included changes:
- **[c278588](houseroad/foxi@c278588)**: [foxi] Add a new function for getting current batch size (pytorch#23) <Oleg Khabinov>

Differential Revision: D28731737

fbshipit-source-id: 051e4ec988228c786ade3b15b9bb2d1a9d549c25
4 participants