
Framework: Move Ops parameter to call method where possible #202


Open
rnett opened this issue Jan 31, 2021 · 25 comments

@rnett
Contributor

rnett commented Jan 31, 2021

I'd like to move the Ops parameters of framework classes to the call method, where possible. This is primarily for Kotlin interop, but has a few other benefits as well. It won't be possible for stateful classes (Metrics, Optimizers), but should be possible for most, as far as I can tell (Initializers, Activations, Losses). I'm going to use Losses as a stand-in for all three in my examples.

  1. Kotlin interop w/ @FunctionalInterface. If we do this, we can then define new losses like Loss{ tf, x -> tf.stuff(x) } or pass lambdas to methods that take losses (a rough sketch is at the end of this comment). This is very nice for layers, where we might have something Keras-like, Layer(activation=ReLU()), but want to replace it with something custom.
  2. Re-use of objects. Currently, losses create the ops for any subsequent calls in the same scope as their first call. That means if a loss is called inside a sub-scope, including device scopes, it ignores that scope. This is somewhat expected, but not ideal. It also causes further issues if we use Ops for (eager) tensor lifetime management, which has been suggested and is something I'd like to do (it's easy enough to make a long-lived copy of the initial scope, but then the tensors created in the call methods live forever, and the framework classes need to be closable).
  3. Passing configs, like Keras. In Keras, if you have an activation or loss that requires some parameters, you can pass it to a layer like Layer(activation=LeakyReLU(alpha=0.3)). I expect this will be common with our API as well. Currently, this runs into the above issue with scoping, and prevents you from passing activations (or losses) from scopes that don't have an Ops instance available.

I'd look at having the stateful classes take Ops in call as well, and only using the constructor Ops for initializing state. This works better with scoping and lifetimes, as mentioned above.
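
To sketch point 1 concretely (simplified types, assuming the usual org.tensorflow imports; this is not the actual framework classes):

@FunctionalInterface
interface Loss {
  // With Ops in call, a loss is just a function of (tf, labels, predictions).
  Operand<TFloat32> call(Ops tf, Operand<TFloat32> labels, Operand<TFloat32> predictions);
}

// A custom loss can then be passed anywhere a Loss is expected, as a lambda.
Loss mse = (tf, labels, predictions) ->
    tf.math.mean(tf.math.squaredDifference(labels, predictions), tf.constant(0));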

@karllessard
Collaborator

@JimClarke5, Ryan and I have been discussing this and would like to revive this proposal, whenever you're ready.

@JimClarke5
Contributor

Actually, this may help with some of the stuff I am doing for Model. The Model creates the graph, so it is difficult to create an Optimizer before the Model has done so. Right now I have Model using lambda functions to create Loss, Metrics, and Optimizers after the Model creates its Graph/Ops.

I am open to a way to defer setting the Ops (or whatever) until later in the object's life cycle.

BTW: I think we should revisit Optimizers to be consistent with the rest of the framework packages.

@rnett
Contributor Author

rnett commented Apr 23, 2021

Optimizers will probably want to (only?) support the new variables, so you may want to wait for that. It would be nice if there was an option to pass in a list of variables to optimize rather than checking the graph, for eventual eager support.

@JimClarke5
Contributor

Some of the classes create internal Variables and such before the call method. I would vote to use setTF(Ops tf) or init(Ops tf) to initialize the classes and take Ops tf out of the ctors. We could optionally pass tf to call methods, which would call setTF if tf is not already set. One issue is that if the class creates Variables, then a session.run() needs to be run to initialize each variable before it is used the first time. We might be able to use control dependencies to do this, but the control dependency can only exist when the Variable is first used, and not after.
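
A rough sketch of what I mean (the names setTF and call just follow the suggestion above; this is not existing framework code):

abstract class FrameworkComponent {
  protected Ops tf;

  // Late-binds the Ops; does nothing if an Ops has already been set.
  public void setTF(Ops tf) {
    if (this.tf == null) {
      this.tf = tf;
    }
  }

  // Optional Ops-taking overload: calls setTF if tf is not already set,
  // then delegates to the existing overload that uses the stored Ops.
  public Operand<TFloat32> call(Ops tf, Operand<TFloat32> input) {
    setTF(tf);
    return call(input);
  }

  public abstract Operand<TFloat32> call(Operand<TFloat32> input);
}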

@rnett
Contributor Author

rnett commented May 6, 2021

I agree w/ init; I was doing something similar in Kotlin w/ delegates, so supporting that is nice. It might be nice to have an overloadable, class-scoped init but also an onInit that takes a lambda to call on initialization, for things like initializing variables.

Something like:

Variable<TFloat32> w = newVariable("w", Shape.of(10, 10));
onInit(() -> w.initWith(...));

Most of the variable stuff should be handled by the new API (using init scopes), once the gradient makes it into TensorFlow core, but if you want to apply an initializer to a variable or something, it's still good to have that step.

@rnett
Contributor Author

rnett commented May 6, 2021

This applies to Layer, as well.

@JimClarke5
Contributor

Should we create an interface, Initable?

public interface Initable {
    public void init(Ops tf);
    public void onInit(Consumer<Ops> onInit);
}

@JimClarke5
Contributor

Also, I think we would have to do this for all the ctors in framework that take an Ops.

@rnett
Contributor Author

rnett commented May 11, 2021

It would be nice to not have to expose it to the user, i.e. something like:

abstract class Activation {
  private List<Consumer<Ops>> initers = new ArrayList<>();
  private boolean inited = false;
  protected final void onInit(Consumer<Ops> onInit){
    initers.add(onInit);
  }

  protected void init(Ops tf){}

  public final Operand<TFloat32> call(Ops tf, Operand<TFloat32> x){
    if(!inited){
      init(tf);
      initers.forEach(initer -> initer.accept(tf));
      inited = true;
    }
    return doCall(tf, x);
  }

  abstract Operand<TFloat32> doCall(Ops tf, Operand<TFloat32> x);
}

where the init stuff gets called on the first call. We could probably put the actual init stuff in an abstract class though.

@JimClarke5
Contributor

So with the call proposal, what is the expected behavior the 2nd time call is invoked?

@rnett
Contributor Author

rnett commented May 11, 2021

Right, duh, there should be a flag; I've edited it. The init logic should probably be moved to a synchronized method.
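
For example, the guarded block in the Activation sketch above could be pulled into something like this (illustrative only):

private synchronized void ensureInited(Ops tf) {
  // Runs init and any registered onInit consumers exactly once, even if
  // call is invoked from multiple threads.
  if (!inited) {
    init(tf);
    initers.forEach(initer -> initer.accept(tf));
    inited = true;
  }
}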

@JimClarke5
Contributor

I do not favor passing Ops in the call or similar method, but rather in an init(Ops) method that is invoked one time. This is primarily due to:

  1. The number of times a call or similar method might be invoked, especially during training.
  2. The off chance that a different Ops is passed after another Ops was passed the first time, representing distinct graphs.
  3. Handling of initializers that need to be invoked on the initialization pass (e.g. Metric variables).

My preference is Op init(Ops), so that the code can call session.run(instance.init(tf)) to do the initialization before anything else.

@rnett
Contributor Author

rnett commented May 12, 2021

I don't see the issue with 1; it's still re-creating all the operations, just in the same scope or a new one. And since you're passing in operands, you almost always have the Ops available.

2 is exactly why I wanted Ops in the call method, since with Keras you can do something like create a ReLU(0.3) object and then re-use it in different layers and even models. Same for Layers (which share weights when used like this). This doesn't work if we always use the same Ops instance in call. The different-graphs case is a bit complex, too: oftentimes, when using functions, you will be able to, say, create an initialized variable in the parent graph and use it in the function. That is supported and rather essential. Using things from eager sessions in graphs is (somewhat) supported, too. Using an incompatible Ops in a call would produce an exception, the same as if you used an Operand from said Ops, and doing so would be left up to the ExecutionEnvironment implementation (since some will support it). Additionally, some framework classes work fine regardless of the init Ops, since they don't store anything in initialization (e.g. most Activations).

For 3, I envisioned having Ops in call and init, so that you would still be able to do initializers and whatnot. You could use Ops.addInit or eventually initScope to run things on Session init, or not (and users of the class wouldn't have to register it themselves).

Also, I would like to separate the "Ops in call" from "init" a little bit. It's much nicer for the Kotlin API if we can make some of the commonly customized Framework classes (Activation, Loss) functional interfaces, and for the most part they don't need initialization, so you could have something like Activation with just the call method and StatefulActivation with the init handling (and perhaps an Initable base class, but that's an implementation detail). That's not possible for things like Optimizers (or Metrics?), but you won't be commonly passing custom ones there anyways.
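
As an illustration of that split (a sketch only, not the framework's actual types), the stateless case could reduce to a functional interface, with a parameterized activation such as a leaky ReLU built from a small factory method:

@FunctionalInterface
interface Activation {
  Operand<TFloat32> call(Ops tf, Operand<TFloat32> x);
}

// A parameterized activation, analogous to Keras' LeakyReLU(alpha=0.3).
static Activation leakyRelu(float alpha) {
  return (tf, x) -> tf.math.maximum(x, tf.math.mul(tf.constant(alpha), x));
}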

@JimClarke5
Contributor

FYI, it is not just call(); in Metric there are also updateStateList, result, and resetStates.

@rnett
Contributor Author

rnett commented May 12, 2021

Hmm, yeah, it seems like there's a definite dichotomy between stateful/graph-linked classes like Optimizer and Metric and stateless ones like Activation, Loss, Regularizer, etc. I would be fine with just doing the Ops in call for the stateless ones, since you're not going to be able to pass lambdas for the stateful ones anyways, and they aren't customized as much. You could parameterize the state, but that seems like too much trouble to be worth doing.

@rnett
Contributor Author

rnett commented May 12, 2021

The other option would be to always use eager variables for the state, as is done in functions. But again, I don't think that's worth doing for the few stateful classes.

@JimClarke5
Contributor

The more I think about it, passing Ops to the call method for stateless classes sounds appealing. I will look at it later today.

@JimClarke5
Contributor

So these packages are stateless with respect to the TensorFlow graph: activations, constraints, initializers, losses, and regularizers. Therefore, changing the call method signature to call(Ops tf, ...) works just fine. There is no need for init(Ops) or onInit(lambda), unless there is another good reason.

The stateful packages are metrics, layers, and optimizers, and they would benefit from init(Ops), especially from within Model, where the initialization needs to be deferred until the Model creates its own Ops.

One thing I am seeing is that sometimes the code is more concise if you also have an optional Ops in the ctor.
For example,

Input<TFloat64> i3 = new Input<>( "l3",  TFloat64.class, TFloat64.class,
                                 Layer.Options.create().inputShape(Shape.of(4, 5)));
i3.init(tf);

versus

Input<TFloat64> i3 = new Input<>(tf, "l3",  TFloat64.class, TFloat64.class,
                                 Layer.Options.create().inputShape(Shape.of(4, 5)));

The second case could call init(tf) from within the ctor. Where this pattern fails is if init(tf) returns an initializer Op that cannot be handled by session.run(init(tf)) within the ctor. Input works because all init is doing is creating a Placeholder if the actual input Operand is not provided. A Placeholder does not need to be initialized like Variables do. This pattern would not work with Dense, which creates Bias and Kernel variables.

@rnett
Contributor Author

rnett commented May 14, 2021

Why would you need to return an op? Couldn't you just use Ops.addInit like the Variable-with-value op does internally? It should use initScope eventually anyways.

It does prevent pre-creating layers, i.e. creating them outside of a Model and then passing them in, but that only matters if the Graph is tied to the Model (I haven't looked at the API yet).

For the stateful packages (mostly Layers), did you plan on adding an Ops to the call method and just using the constructor one for initialization? That would be my preference, I think, since you can still call those classes in multiple places (i.e. different Ops). You can even create the Layer (and thus the variables) in Eager mode but call it in a Graph, which I think is how it's usually done in Python (well, with a function). If that's the case, doing the init check and call in the superclass call method like my above code would also avoid the verbosity.

I haven't looked at the Layer API details yet either, but how do you plan on handling weight sharing? Keras does it by re-using the Layer objects, and something similar would be nice here, which works well with taking the init Ops in the constructor/init or the first call and a separate Ops for call.
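
For reference, the Keras-style reuse I have in mind looks roughly like this (Dense and its constructor here are hypothetical, just to show the shape of the API):

Dense shared = new Dense(tf, 64);                // variables created once, against the init Ops
Operand<TFloat32> a = shared.call(tf, inputA);   // both calls reuse the same
Operand<TFloat32> b = shared.call(tf, inputB);   // kernel and bias variables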

@JimClarke5
Contributor

JimClarke5 commented May 14, 2021

I don't see Ops.addInit in my source; I do see Ops.initAdd. Is that something new? I have used tf.variable with an init parameter, but that assigns the initializer to the global init list accessed by tf.init(). The problem with that is that not all variables are created at startup and tf.init() does not clear the init cache, so if you run session.run(tf.init()) again, everything gets reinitialized, not just the new variables. I find that you have to control which init functions are called. Usually, I store the Assign Operand in the object, and classes like Metrics use that Operand when resetStates is called, e.g. session.run(m.resetStates()). This essentially resets all the Metric's internal variables to zero. Setting a control dependency on the variable would have to be temporary, but maybe there is a way to handle that.

Variable initialization is a pita.

I was not planning on passing Ops to the layer call method, because layers have to initialize components like optimizers and metrics before processing. It would be possible to invoke init within the call method, if it's the first time, but the variable initialization issue pops up again.

I haven't run across weight sharing yet, but I do note that a variable with a unique name is sharable throughout the Graph.
I did check: if you create a variable with a unique name using tf.withSubScope("foo").variable(initOperand) in a graph and later create a variable with the same name using tf.withSubScope("foo").variable(initOperand), you get back the exact same variable. This feature is used in metrics, so two metrics with the same name share the same variable, which is intended. I don't know yet how this might be leveraged in layers.

My current Model class creates its own Graph and Ops.
For Models, specifically Sequential, the layers can be passed in the ctor, or added later individually. This would have to be against the Graph created by the Model itself. If we are going to support the Sequential ctor that accepts a list of Layers, then these layers need to be lazily initialized. Also, the Optimizer is another parameter that would need to be initialized later on.

There are some things in TF Python's tf.Variable that we need, like a way to impose constraints. I do that in my own class for now using a lambda. I haven't even bothered to deal with distributed issues, like synchronization and aggregation, for now.

Action Item for me: I am going to research a way to add a control dependency to a variable for the initialization phase so that it only exists the first time it is used, and not subsequently.

@Craigacp
Collaborator

One other wrinkle is that the initialisations are only populated in the Graph, but they aren't persisted in the GraphDef. So we do tricks like this in Tribuo to initialise the graph, then add the optimizers, losses and outputs and then re-call init to initialize those. This is a fairly ugly hack, but it's important to remember what is persisted and what isn't, as the skew between things in the Graph object and things in the GraphDef caught me out several times while developing Tribuo's TF interface.

@rnett
Contributor Author

rnett commented May 14, 2021

initAdd is what I meant, I always get the name wrong.

I ran into the initialization stuff as well when working with functions and the new variables, see #238 and #237. I'm still waiting for the variable gradients in core to get released, but I can do the init scope sooner if you could use it.

The same-name trick will likely only work with the old variables, not resource variables, and only exists atm because of bugs in NameScope (it should be prevented, since any other name-clashing operations will error).

@JimClarke5
Contributor

I have converted the metrics to try to make sure initialization control dependencies are attached to operations when the variables need initialization. I have this working, but there are some issues.

  1. Accessing variables directly, like Variable<T> getTotal() in Reduce.java, does not have any control dependencies. I am thinking of changing this to something like getTotalValue() which returns an Operand<T>, based on tf.identity(). The control dependencies will be placed on the Operand<T> returned from tf.identity(), but this is not the actual Variable, just its current value.

  2. The other problem is that the first time you use an operation returned from a method like updateState() or result(), it will have the control dependencies, but after that it doesn't, unless resetStates is called. This may be problematic if the intent is to continually reuse the Operand, as it will always have the variable initialization control dependencies and would always re-init the Variables each time it is run. I could use the tf.identity hack mentioned in point 1, but tf.identity returns a copy of the Variable contents, not the Variable itself. I tried using tf.select, but the system still wants to execute both the then and else parts eagerly, independent of the condition, as in:

   tf.select(tf.isVariableInitialized(variable),
            tf.assignAdd(variable, value),
            tf.assign(variable, value));

Any suggestions would be appreciated.

I have changed the updateState, updateStateList, and result methods in metrics to optionally take an Ops tf parameter, e.g. Op op = instance.updateState(tf, labels, predictions, sampleWeight);.
This merely calls init(tf), and then passes control to the corresponding method that does not take the parameter.
Right now, init(tf) ignores the call if the class attribute tf is already set. What should the behavior be if a new Ops tf is passed?
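
Concretely, the overload is roughly this shape (a sketch of the delegation described above, not the exact framework code; T here is whatever element type the metric uses):

public Op updateState(Ops tf, Operand<T> labels, Operand<T> predictions, Operand<T> sampleWeight) {
  init(tf);  // currently a no-op if the class attribute tf is already set
  return updateState(labels, predictions, sampleWeight);
}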

@rnett
Contributor Author

rnett commented May 19, 2021

I don't like using control dependencies for init, for the reasons you mentioned. It doesn't play nicely with sessions either, even if you don't re-use the operand (it can re-init the variable each time the session runs if you aren't careful). Honestly, I'd wait for the tf.Variable and init scope APIs to do the stateful classes; I'm not sure how you would do it otherwise.

Also, the op you probably want instead of select is tf.cond, but it's not generated yet since it relies on functions to work.

@rnett
Contributor Author

rnett commented May 28, 2021

@JimClarke5 here's what I was looking at for Kotlin: https://github.com/rnett/weaver/blob/main/src/main/kotlin/com/rnett/weaver/Module.kt#L95

You would need to specify keys for remembering stuff in Java, and I'm not sure how you would ensure init is called.
