Skip to content

Placeholder: add Placeholder as a new kind of input/output to the graph. #1409

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Sep 6, 2018

Conversation

nadavrot
Copy link
Contributor

@nadavrot nadavrot commented Aug 6, 2018

This PR is the first few steps in the implementation of the Placeholder node. The plan is discussed in #1334. Placeholder are unbound variables where the content of the tensor is provided by the execution engine or the AOT compiler at runtime.

This PR introduces Storage as a superclass for Variable (bound) and Placeholder (unbound). This PR also adds a simple unit test that checks that we can create new Placeholder nodes. This PR does not implement any of the infrastructure that's required to do anything useful with Placeholder variables, such as loading content into them, deleting them, pretty dotty-printer, etc.

Potentially controversial decisions:

  1. Placeholder and Variable inherit from Storage.
  2. The name Placeholder.
  3. The API for getVar and getPlaceholder instead of getStorage.

Next steps (not necessarily in any specific order):

  1. Teach the execution engine to bind tensors to the Placeholder nodes.
  2. Verify that dotty printing, dump() and debugging work well.
  3. Cleanup the APIs that are related to Variable and Placeholder and make them consistent.
  4. Change (some of) the unit tests to use the new Placeholder API.
  5. Make sure that our optimizations are correct when placeholder are used.

@nadavrot
Copy link
Contributor Author

nadavrot commented Aug 6, 2018

This is related to issue: #1334

@@ -117,9 +118,19 @@ class Module final {

const VariablesList &getVars() const { return vars_; }

/// \returns the list of variables that the Module owns.

This comment was marked as off-topic.

@opti-mix
Copy link
Contributor

opti-mix commented Aug 6, 2018

@nadavrot We have eraseVariable APIs. Do we need to add erasePlaceholder APIs? And oureraseNode API explicitly special-cases Variable. We may need to special-case the Placeholder there as well.

Copy link
Contributor

@jfix71 jfix71 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess a follow-up PR will replace all applicable uses of createVariable() with createPlaceholder()?

Curious what color Placeholder nodes will be in the dot file, hope they're sufficiently different from Variables.

public:
Storage(Kinded::Kind k, llvm::StringRef name) : Node(k, name) {}

NodeValue getOutput() { return getNthResult(0); }

This comment was marked as off-topic.

@stoklund
Copy link
Contributor

stoklund commented Aug 6, 2018

Could you please provide more text that No description provided? It's not clear what direction you are going with this.

@qcolombet
Copy link
Contributor

Maybe it is already covered by 3 or 4, but I would call out:
6. Teach the differentiation mechanism how to deal with PlaceHolder.

@nadavrot nadavrot force-pushed the add_place_holder_node branch 2 times, most recently from 783d0af to f91d1d3 Compare August 8, 2018 16:45
@nadavrot
Copy link
Contributor Author

nadavrot commented Aug 8, 2018

@jfix71 At the moment placeholder variables are rendered as regular nodes. I like your suggestion to add a unique color.

screen shot 2018-08-08 at 9 45 25 am

@stoklund
Copy link
Contributor

stoklund commented Aug 8, 2018

In my opinion, this direction is not a good solution to the problems raised in #1334.

TL;DR:

  • The muddy runtime semantics of Variable remain.
  • We need a clear concept of constants.
  • The testing strategy informing this direction is dangerous.

Runtime semantics

It is important to have a compiler IR with clear concepts like runtime data versus compile time data. The idea of a mutable bound variable that lives in the compiler IR and that can be mutated by compiled code is very confusing. Handwaving this away as "only used for testing" is not a good idea.

Constants

Constant optimizations are essential to getting good performance, and the design presented here does not make it clear what data is constant and safe to rearrange by an optimizer. The getUniquelyUsedVariable() function seems to imply that it is okay to modify the layout of a public variable, which seems wrong to me. Also, it can be okay to modify the layout of a constant with multiple users if you know what you're doing.

We need a clear concept of a compile time constant that the compiler owns.

Testing strategy

Our tests should be testing the compiler running the way it is intended to be run in production. The design presented here creates a dichotomy:

  • PlaceHolder: This is the code we run in production.
  • Variable: This is the code we test.

Any place in the compiler that behaves differently when handling a PlaceHolder or a Variable creates this split between code that we test, and code that we run in production. This is a very dangerous test design, and I strongly advise against it. We can't have a "test passing mode" and a "production mode" that do two different things.

If it is too hard to write unit tests that exercise the compiler the way it is intended to be used in production, that is a serious problem that we need to address.

We have a lot of good tests that are exercising the compiler in a way it will never see in production. These tests are not providing much value at the moment. Why not use these tests to verify the production mode of the compiler?

@nadavrot
Copy link
Contributor Author

nadavrot commented Aug 8, 2018

@stoklund I agree with some of what you wrote, but I don't understand your conclusion. I think that your objection is that #1334 is not going far enough. Right? This pull request is about creating the new abstraction: placeholder, which is a new kind of unbound variable. We agree that the inputs/outputs should not be backed by compiler-owned tensors. The first step that this PR takes is to split Variable into bound and unbound entities. Constants are obviously bound, because they need to be backed by tensors to allow things like constant propagation and compile-time quantization. Inputs are unbound. This PR is the first step of introducing this split. This PR is just the beginning, and it's not even functional, because the execution parts and the other stuff I mentioned in the PR. I agree with many of the things that you wrote, but not all of the comments are about this PR. The goal is not to have different code paths for debug and production. The goal is the opposite, to change the tests to use the new Placeholder instead of the graph-owned variables. The idea is to minimize the use of graph-owned variables and hopefully to eliminate it completely. Regarding the constant transformations, I think that you either found a bug in the optimizer or that something is not documented properly because the optimizations should honor the public/private annotations when transforming the layout of tensors. In the next few iterations we should eliminate the private/public attributes because all placeholders should be public and all bound variables should be private.

I don't agree that the tests that we have don't bring value. Our tests verify things like graph transformations and the numeric implementations of our operators and are extremely valuable. Our tests stress a wide range of functionality across the compilation pipeline. You mentioned running different code paths in testing and in production. The problem is not that there are different code paths for debug/release, but that at the moment the backends has the responsibility to manage the state manually (for example, AOT compilation that has to pickle variables).

We finished our last design discussion with the problem of the IRBuilder (of LSTMs with 100s of vars) and variable initialization and ownership. We did not have a good solution for the problem of test verbosity. We agreed to meet again and come up with a plan of how to actually make this work. I think that one way to help push this forward and eliminate graph-owned state would be to propose designs for the remaining problems. What do you think?

@stoklund
Copy link
Contributor

stoklund commented Aug 8, 2018

If you have a plan, you should lay it out over in #1334. Where are we going, and how are we going to get there?

@stoklund
Copy link
Contributor

stoklund commented Aug 9, 2018

To be clear, this is fine as a first step towards fixing the design.

Copy link
Contributor

@qcolombet qcolombet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for pushing on fixing this part of Glow!!

@nadavrot nadavrot force-pushed the add_place_holder_node branch from f91d1d3 to 88ec3ba Compare August 30, 2018 23:36
…s that we can construct the graph.

This commit is the first few steps in the implementation of the
Placeholder node. The plan is discussed in pytorch#1334. Placeholder are
unbound variables where the content of the tensor is provided by the
execution engine or the AOT compiler at runtime.

This commit introduces Storage as a superclass for Variable (bound) and
Placeholder (unbound). This PR also adds a simple unit test that checks
that we can create new Placeholder nodes. This PR does not implement any
of the infrastructure that's required to do anything useful with
Placeholder variables, such as loading content into them, deleting them,
pretty dotty-printer, etc.

Next steps (not necessarily in any specific order):

1. Teach the execution engine to bind tensors to the Placeholder nodes.

2. Verify that dotty printing, dump() and debugging work well.

3. Cleanup the APIs that are related to Variable and Placeholder and
make them consistent.

4. Change (some of) the unit tests to use the new Placeholder API.

5. Make sure that our optimizations are correct when placeholder are
used.
@nadavrot nadavrot force-pushed the add_place_holder_node branch from 88ec3ba to db57a85 Compare September 5, 2018 23:24
@nadavrot nadavrot merged commit 28e11aa into pytorch:master Sep 6, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants