Need a way to pass trainable Variables to Optimizer #307
I think we'll need both. Keras will want the former, as then you can freeze a layer so it's not trainable (which is very useful for transfer learning), and non-Keras users (which is everyone until all the bits land) will want the latter to make it easier to use the optimizers without hitting the issue you're seeing. I'm happy to implement the latter option tomorrow to get you unblocked.
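To illustrate the layer-freezing idea behind the first option, here is a minimal, self-contained sketch of filtering a variable list by a trainable flag. The `Var` record and `trainableOnly` helper are hypothetical stand-ins, not the actual TF-Java API:

```java
import java.util.List;
import java.util.stream.Collectors;

public class TrainableFilterSketch {
    // Hypothetical stand-in for a framework variable carrying a trainable flag.
    record Var(String name, boolean trainable) {}

    // Keep only the variables the optimizer should differentiate against.
    static List<Var> trainableOnly(List<Var> all) {
        return all.stream().filter(Var::trainable).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Var> vars = List.of(
            new Var("dense/kernel", true),
            new Var("dense/bias", true),
            new Var("mse_total", false)); // metric accumulator: non-trainable
        System.out.println(trainableOnly(vars).stream()
            .map(Var::name)
            .collect(Collectors.joining(",")));
        // prints "dense/kernel,dense/bias"
    }
}
```

Freezing a layer for transfer learning would then just mean flipping its variables' flag to `false` before collecting the list handed to the optimizer.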
I would definitely like option 1 to be implemented at some point. Taking a list is necessary for eager mode (presumably it will be gotten from a gradient tape) and helps support 3rd party model implementations. Also note that
I added a parameter to
I assume that this is because the loss operation tree contains a Cast operation.
Ugh, can't believe that's missing. It's easy enough that I'll PR it to tensorflow core, but it will be a while before it makes it into a release.
I wonder if there are more of these?
Yeah, I know of Concat, function calls, and resource variable reads. I've added resource variable reads and function calls to tensorflow core, but it's a pain to do and the release cycle lags, thus #292.
FYI @JimClarke5 Cast has a gradient in 2.6 (which is used in snapshots). |
Yes, I saw that last week. However, it turns out I was passing the wrong arguments to my loss operation, as I should have included the trainable variables in the loss operation. When I did that, the Cast error went away. Right now I am adding a
You should compare the C API gradient code (https://github.com/tensorflow/tensorflow/blob/5dcfc51118817f27fad5246812d83e5dccdc5f72/tensorflow/cc/framework/gradients.cc#L438) to the Python TF v1 gradient code (https://github.com/tensorflow/tensorflow/blob/1bb4e4fce0c992e2d995e1663138ad98e957e050/tensorflow/python/ops/gradients_impl.py#L44). The latter is full of locks and control dependencies, and the former has none of those, so stochastically you get the wrong gradients out. I've got C API and Python gradient graphs for the simple model in the failing test in Tensorboard, but I've not had a chance to run it down properly.
The C/C++ version is totally different from the Python v1 implementation; I don't see any similarity other than the fact that they both calculate gradients. I am wondering if we should implement a Java version of
I agree the C API version is completely different, but the fact that it's lacking the locking and dependencies is the reason it doesn't produce the same output as the Python v1 stuff. I think we would need to wrap the C++ API to get the gradient tape to work, as there isn't a C API for it (tensorflow/tensorflow#47748). I know @saudet started it (https://github.com/tensorflow/java/pull/283/files), but that only gets it off the ground into Java; we'd need to do a lot of work to mirror the Python code. It's not clear that that is less effort than fixing the C API, though it has a much better long term payoff. @karllessard do you know how the C API gradient code came to be? As far as I can tell only TF-Java has ever used it, so maybe they haven't figured out it's broken? No-one at Google responded to the issue I opened - tensorflow/tensorflow#48855.
I added the various methods for adding gradients from the C API myself a long time ago, like this one, specifically for Java support, so I don't think clients other than TF Java are using it. We can notice that the code has been updated since and that the C++ layer supporting it has changed.
In working with `Model` training, an issue with `Optimizer` has shown its head. Currently, when calling `minimize(loss)` on the `Optimizer` instance, the `Optimizer` code walks the entire `Graph` and pulls out all the defined `Variable`s in the graph. The idea is that when you call `minimize(loss)`, the `Optimizer` builds gradients based on all the variables. However, when working with `Model`, this "all variables" approach breaks down, because some variables are not referenced in the `loss` operand execution path. This produces the following error:

```
org.tensorflow.exceptions.TFInvalidArgumentException: Cannot compute the partial derivative for node 'model/mse_total' as it's unreachable from the output node(s).
```

This specific error occurs because the MSE metric's internal variables are not within the `loss` execution path. This pattern of "non-trainable variables (weights)" appears in most `Metric` classes, and in `Model` itself, so it is widespread. What we need is a way to distinguish between trainable and non-trainable variables. Trainable variables would then be used to calculate the gradient values in the `Optimizer`.

In Python TensorFlow, the Keras `Layer`s track the trainable variables as an attribute list; the `Model` then passes the collected lists to the `Optimizer`'s `minimize` method.

There are a couple of options here:

1. Have each `Layer` identify its trainable variables, then pass the trainable variables as a `List<Variable<?>>` using a call like `Optimizer.minimize(loss, trainableVariables)`, and have the `Optimizer` `minimize` routine call `addGradients` with this variable list, rather than walking the whole `Graph`, to compute the gradients.
2. In `Optimizer.minimize(loss)`, walk the `loss` operand execution path to locate any variables contributing to the loss calculation, then pass these to `addGradients`. A solution based on this option may be facilitated by #232, "Add graph walking functions to Graph and GraphOperation".
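The second option amounts to a reverse reachability walk from the loss node. Below is a minimal, self-contained sketch of that traversal using a plain adjacency map; the node names, the `inputsOf` representation, and the `variablesFeedingLoss` helper are all hypothetical illustrations, not the actual `Graph`/`GraphOperation` API:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class LossReachabilitySketch {
    // Walk backwards from the loss node over input edges and collect every
    // node known to be a Variable that actually feeds the loss computation.
    static Set<String> variablesFeedingLoss(Map<String, List<String>> inputsOf,
                                            Set<String> variableNodes,
                                            String lossNode) {
        Set<String> found = new LinkedHashSet<>();
        Set<String> visited = new HashSet<>();
        Deque<String> stack = new ArrayDeque<>(List.of(lossNode));
        while (!stack.isEmpty()) {
            String node = stack.pop();
            if (!visited.add(node)) continue;           // skip already-seen nodes
            if (variableNodes.contains(node)) found.add(node);
            stack.addAll(inputsOf.getOrDefault(node, List.of()));
        }
        return found;
    }

    public static void main(String[] args) {
        // loss <- matmul <- {kernel, input}; the metric accumulator
        // mse_total reads the loss but never feeds it, so it is unreachable
        // walking backwards and is correctly excluded.
        Map<String, List<String>> inputsOf = Map.of(
            "loss", List.of("matmul"),
            "matmul", List.of("kernel", "input"),
            "mse_total", List.of("loss"));
        Set<String> variables = Set.of("kernel", "mse_total");
        System.out.println(variablesFeedingLoss(inputsOf, variables, "loss"));
        // prints "[kernel]"
    }
}
```

Such a walk would exclude exactly the metric-internal variables (like `mse_total` above) that trigger the "unreachable from the output node(s)" error, without requiring the caller to supply a variable list.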