
Init Scope #338


Merged
merged 48 commits, Aug 23, 2021

Conversation

@rnett (Contributor) commented Jun 16, 2021

Reworks how initialization is handled (somewhat; mostly the API). I was originally planning on doing name-based initialization for interop with TensorFlow, but they don't do name-based either, so I kept the current method.

The API has changed: instead of adding init ops yourself, they are automatically added when created within an init scope, which you get via Ops.initScope() or Scope.initScope(). This allows for better error checking (e.g. init ops can't depend on non-init ops) and elides control dependencies on them (since they run at init time), but it required making Scopes part of OpBuilders (which was mostly a good thing, and prevents future "forgot to call apply" bugs). I also added currently unused methods to do initialization in a different execution environment, which will be used by functions.

So, for example, creating a variable with an initial value becomes tf.initScope().variable(tf.initScope().constant(4f)), which automatically registers it for initialization by sessions.
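For illustration, a minimal sketch of graph construction under the new API, assuming only what is described in this thread (the init-scope calls come from the example above; the Session-side init helpers are mentioned below but not named here, so they stay a comment):

import org.tensorflow.Graph;
import org.tensorflow.op.Ops;
import org.tensorflow.op.core.Variable;
import org.tensorflow.types.TFloat32;

public class InitScopeExample {
  public static void main(String[] args) {
    try (Graph g = new Graph()) {
      Ops tf = Ops.create(g);
      // Ops built through the init scope are registered as init ops
      // automatically, and can't depend on non-init ops.
      Variable<TFloat32> v = tf.initScope().variable(tf.initScope().constant(4f));
      // Any Session over this graph must run initialization before
      // anything else (the PR adds Session helpers for this).
    }
  }
}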

I also added helpers to Session to create and initialize, plus a requirement that if the graph has init ops, initialization must be run before running anything else. I don't really like having separate initialized factory methods; thoughts on having it be initialized on construction by default, with Session(Graph, boolean) constructors where you can control it?

I have yet to update framework to use this; does it work for what you were working on with variables, @JimClarke5?

The newly generated op changes also aren't committed yet, for size reasons.

@Craigacp (Collaborator) left a comment

This change is going to break Tribuo in ways that are hard to fix, because it removes the init op we use to find initializers in serialized GraphDefs, and there doesn't seem to be a way to get the initialization op name back out. I think you should consider how this interacts with serialized GraphDefs, which are the only way we currently have to persist a graph structure. The current init mechanism mirrors what TF Python does in terms of what gets persisted into a GraphDef.

You can see how Tribuo currently uses the init ops here - https://github.com/oracle/tribuo/blob/main/Interop/Tensorflow/src/main/java/org/tribuo/interop/tensorflow/TensorFlowTrainer.java#L488 through line 514.

@rnett (Contributor, Author) commented Jun 17, 2021

Exporting and importing are a problem; can you link to where Tribuo does the exporting? The intent was that when importing Python models you would find the Restore op and use Graph.registerRestoreOp. I just need to add something similar for Java import and export.

@rnett (Contributor, Author) commented Jun 17, 2021

Looking more into SaverDef, I'm not going to be able to use that. I'm planning to create essentially a tf.init() op with all the init ops as control dependencies and a set name pattern on export, and on import to find that op and use Graph.registerRestoreOp. This would mean that importGraphDef would load all of the graph's initializers if the graph was exported from Java, which I think would solve your issue, @Craigacp?

I also made Session.restore count as initialization for the checks.
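As a rough sketch of the export/import pattern described above (not the PR's actual implementation: the "init" op name, the helper names, and the surrounding class are assumptions for illustration):

import java.util.List;
import org.tensorflow.Graph;
import org.tensorflow.GraphOperation;
import org.tensorflow.op.Op;
import org.tensorflow.op.Ops;

class InitExportSketch {
  // Export side: bundle every init op under a single no-op with a
  // well-known name so they survive the round trip through a GraphDef.
  static void addInitMarker(Graph graph, List<Op> initOps) {
    Ops tf = Ops.create(graph);
    tf.withControlDependencies(initOps).withName("init").noOp();
  }

  // Import side: after importGraphDef, recover the marker op by its
  // well-known name; it can then be registered (the thread mentions
  // Graph.registerRestoreOp) so the init-before-run checks see it.
  static GraphOperation findInitMarker(Graph importedGraph) {
    return importedGraph.operation("init");
  }
}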

@Craigacp (Collaborator) commented

Tribuo uses GraphDef as its representation of a graph before it has been trained, as its trainers are designed to be thread-safe, so there can be multiple concurrent calls to train in flight. That means we need to be able to recreate the graph independently for each call to train. The Graph is only valid inside a call to train; then it's destroyed and recreated inside the Model object. All the logic is in the train call here - https://github.com/oracle/tribuo/blob/main/Interop/Tensorflow/src/main/java/org/tribuo/interop/tensorflow/TensorFlowTrainer.java#L466. It starts with a GraphDef, instantiates a Graph containing that GraphDef, adds the appropriate output and gradient update operations to that graph, runs init for the graph, runs init for the Tribuo-added operations (which basically just inits the gradient slots), then starts training the model and stepping the gradient optimiser. The two-phase init is because I can't easily aggregate all the necessary init variables from the supplied GraphDef with the ones Tribuo adds, because I can't find them.

@rnett (Contributor, Author) commented Jun 18, 2021

@JimClarke5 I've only done enough updates to framework to get the tests to pass; you might want to do a full pass once this is merged.

@karllessard (Collaborator) commented

@rnett , can you rebase this PR or fix the conflicts please?

rnett added 21 commits July 7, 2021 (all signed off by Ryan Nett <[email protected]>)
@rnett (Contributor, Author) commented Jul 7, 2021

Ok, done.

rnett added 2 commits July 25, 2021 (all signed off by Ryan Nett <[email protected]>)
@rnett (Contributor, Author) commented Jul 26, 2021

Snapshot builds appear to be broken: the mac artifact is missing from https://oss.sonatype.org/content/repositories/snapshots/org/tensorflow/tensorflow-core-api/0.4.0-SNAPSHOT/, and the CI is failing because of that.

cc @karllessard

@JimClarke5 (Contributor) commented

@rnett I am trying to build on macOS, pulling rnett:rn_init_scope, and I keep getting these errors:

[ERROR]   SavedModelBundleTest.exportFunctionWithVariables:208 expected: <17.781294> but was: <18.12133>
[ERROR]   SavedModelBundleTest.exportMultipleFunctions:127 expected: <4.093658> but was: <5.931101>
[ERROR]   SessionTest.saveAndRestore:234 expected: <org.tensorflow.internal.types.TFloat32Mapper$DenseTFloat32@2024886b> but was: <org.tensorflow.internal.types.TFloat32Mapper$DenseTFloat32@8a1de29>

Any suggestions?

@JimClarke5 (Contributor) commented Aug 10, 2021

@rnett do the variable and the init operand both have to be under the same initScope?

Also, can you re-init the variable later? I am looking at Metrics' resetStates, which resets the variables to their initial values. It would be nice not to have to invoke run just to make sure this happens.

For example, upon original init:

Variable<T> truePositives = tf.withName(truePositivesName).withInitScope().variable(zero);

I assume this will cause the variable truePositives to be set to zero on the next session run.

Later on I want to reset truePositives back to zero, on its own and not as part of any other operand group:

tf.withName(truePositivesName).withInitScope().assign(truePositives, zero)

I'd like the assign operand to be pushed onto the init stack, then executed at the beginning of the next session run. Will this work?

@rnett (Contributor, Author) commented Aug 11, 2021

@JimClarke5 It should, although I'm not sure we want to purposefully introduce that kind of eager-like graph modification to session semantics. IMO it would be odd for a Java call that isn't a session run to intentionally affect session state; the "run all new initializers" behavior was intended more as a failsafe. You could also run into thread-safety issues with modifying a graph while the session is open. Ideally, I think the solution would be to have things like Metrics use eager variables to store their state, but that makes managing them more complicated.

rnett added 2 commits August 11, 2021 16:59
@JimClarke5 (Contributor) commented

I have totally refactored Metrics to take advantage of this PR, and it's a resounding success. Good job @rnett!

I have removed the Ops parameter from the constructors and now pass it in the new Metric interface calls. There is no longer a need for code to create control-op dependencies for initialization, nor a need to initialize the metrics objects once they are created. This will help with the Model class, where the Ops is created after the metrics are created. Also, the resetStates method returns an Op that will need to be run explicitly, either alone or maybe as a control op to another Op graph structure. (I decided to take @rnett's advice not to treat it within an initScope.)

Here is the main interface for Metric now:

interface Metric {

  // Returns the list of ops needed to update the metric's state.
  List<Op> updateStateList(
      Ops tf, Operand<? extends TNumber> values, Operand<? extends TNumber> sampleWeights);

  // Variant for label/prediction-based metrics.
  List<Op> updateStateList(
      Ops tf,
      Operand<? extends TNumber> labels,
      Operand<? extends TNumber> predictions,
      Operand<? extends TNumber> sampleWeights);

  // Computes the current value of the metric.
  <T extends TNumber> Operand<T> result(Ops tf, Class<T> type);

  // Returns an Op that resets the metric's variables to their initial
  // values; it must be run explicitly (not handled via an init scope).
  Op resetStates(Ops tf);

  // Single-Op equivalents of the updateStateList methods.
  Op updateState(
      Ops tf, Operand<? extends TNumber> values, Operand<? extends TNumber> sampleWeights);

  Op updateState(
      Ops tf,
      Operand<? extends TNumber> labels,
      Operand<? extends TNumber> predictions,
      Operand<? extends TNumber> sampleWeights);

  // Updates the state with the given values and returns the result.
  <T extends TNumber> Operand<T> callOnce(
      Ops tf,
      Operand<? extends TNumber> values,
      Operand<? extends TNumber> sampleWeights,
      Class<T> type);
}
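A hypothetical usage sketch of this interface (Mean, tf, session, values, and sampleWeights are illustrative stand-ins, not names from this PR), showing the explicit resetStates run described above:

// Mean is an assumed Metric implementation, for illustration only.
Metric metric = new Mean();

// Ops is now passed per call instead of being fixed at construction.
session.run(metric.updateState(tf, values, sampleWeights));

// resetStates returns an Op that must be run explicitly.
session.run(metric.resetStates(tf));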

Once this PR is merged I will create another PR for these metrics' changes.

My next goal is to rework Layers.

rnett requested a review from karllessard on August 15, 2021
@JimClarke5 (Contributor) commented Aug 17, 2021

I have finished converting layers and optimizers, and everything works. I have refactored both to not include Ops/Graph in the constructors. Once this PR is merged, I will create separate PRs to check in my changes.

Except for these failures in the overall build, I am good to go:

[ERROR] Failures: 
[ERROR]   SavedModelBundleTest.exportFunctionWithVariables:208 expected: <17.614902> but was: <17.720098>
[ERROR]   SavedModelBundleTest.exportMultipleFunctions:127 expected: <4.6954327> but was: <5.659465>
[ERROR]   SessionTest.saveAndRestore:234 expected: <org.tensorflow.internal.types.TFloat32Mapper$DenseTFloat32@ab941f5f> but was: <org.tensorflow.internal.types.TFloat32Mapper$DenseTFloat32@91507d4d>

I do notice that the exact values are different from when this error first appeared for me.

@rnett (Contributor, Author) commented Aug 17, 2021

Can you rebase on this branch? The latest commit here fixed those for me.

@JimClarke5 (Contributor) commented

@rnett I have pulled your latest code and all is good now. I am ok with this PR.

@karllessard (Collaborator) left a comment

LGTM, @Craigacp any additional comment before I merge this?

@Craigacp (Collaborator) commented

> LGTM, @Craigacp any additional comment before I merge this?

Apart from the NoOp issue that I just mentioned, it all looks fine.

I am worried about the level of change this makes to the codebase, and I can't quite see the reason for it, but that doesn't mean there isn't a good one.

@rnett (Contributor, Author) commented Aug 21, 2021

@Craigacp It's mostly to support functions, which are Graphs with eager init scopes; that wasn't possible to implement using the fake graph ops we were using for init previously.

@karllessard (Collaborator) commented

> I am worried about the level of change this makes to the codebase, and I can't quite see the reason for it, but that doesn't mean there isn't a good one.

According to @JimClarke5's comment, the improvements in the framework are substantial. Personally, I think that if we can lift the burden of initialization from users it's a plus, but saved models were already handling this for inference, so the changes only impact users who were doing their training in Java (@rnett, am I right here?)

@rnett (Contributor, Author) commented Aug 21, 2021

> According to @JimClarke5's comment, the improvements in the framework are substantial. Personally, I think that if we can lift the burden of initialization from users it's a plus, but saved models were already handling this for inference, so the changes only impact users who were doing their training in Java (@rnett, am I right here?)

Yeah, the SavedModel handles it for TF2 models. This also makes it a bit easier to handle TF1 models, and we can export our initializers more easily (I'm going to make a new PR for those).

@JimClarke5 (Contributor) commented

This PR will be beneficial whenever a model is created by the developer, which includes all the major model actions: fit, evaluate, and predict. Some actions, such as predict, may not necessarily require that the model be saved first.

@Craigacp (Collaborator) left a comment

Ok, I think I'm fine with this being merged.

I do wonder if we could get away with just tracking the init ops better, since we own all of them, without creating a privileged init scope that people could use incorrectly (and which isn't type safe), but I don't understand the eager/graph function use case well enough to know if that would cause problems.

karllessard merged commit 242931c into tensorflow:master on Aug 23, 2021
@karllessard (Collaborator) commented

Thanks @rnett!
