
Add fetchVariable method to Session to get value of resource variable #261


Merged
merged 8 commits into tensorflow:master on Apr 6, 2021

Conversation

@rnett (Contributor) commented Mar 30, 2021

Small PR to add a method to Session to allow for the fetching of the values of resource variables. Fetches the resource tensor then uses an eager session to read the value.
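
For a sense of the intended usage, here is a minimal caller-side sketch. The exact fetchVariable signature, the op names ("init", "my_variable"), and the buildGraphWithVariable helper are assumptions for illustration, not the PR's actual code:

```java
import org.tensorflow.Graph;
import org.tensorflow.Session;
import org.tensorflow.types.TFloat32;

public class FetchVariableUsageSketch {
  public static void main(String[] args) {
    try (Graph g = buildGraphWithVariable(); // hypothetical helper defining a resource variable
         Session s = new Session(g)) {
      s.runner().addTarget("init").run();    // hypothetical initializer target
      // Fetch the variable's current value; the caller supplies the value
      // dtype, since the resource tensor itself does not carry it.
      // The typed return is also an assumption here.
      try (TFloat32 value = (TFloat32) s.fetchVariable("my_variable", TFloat32.class)) {
        System.out.println(value.getFloat());
      }
    }
  }

  private static Graph buildGraphWithVariable() {
    // Omitted: build a graph with a VarHandleOp "my_variable" and an "init" target.
    throw new UnsupportedOperationException("illustrative stub");
  }
}
```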

@rnett (Contributor, Author) commented Mar 30, 2021

The CI errors are from Javadoc; the tests are passing.

@Craigacp (Collaborator)

I don't think splitting fetch and fetchVariable up is a good idea. Users have no way to know whether the thing they are fetching should be fetched with fetchVariable or fetch without trying both and catching the runtime exception, and we should never require users to catch runtime exceptions.

In addition to the API issue, why does this require creating an eager session and constant op to read a value? What are the alternatives?

@rnett (Contributor, Author) commented Mar 30, 2021

Users can tell if their operand is a variable by checking the data type; maybe we should add an Operand.isResourceVariable()? But yeah, since we don't have a tensor type for resources, there's no good way to make it type-safe. Adding one is a possibility, and probably good to do at some point, but beyond me at the moment (it's not mappable to NDArrays, so it doesn't fit well with the existing tensor type setup).

Fetching a variable is almost always an error anyway, since it's not usually an output, especially if you don't realize that what you're fetching is a variable (e.g. #260). This is intended for the special use cases where you want to get a variable, in which case you'll know ahead of time that it's a variable.

To read a value from a resource tensor, you need the readVariableOp op, either in the graph or in an eager session. I could add the read ops to the graph and fetch those instead, reusing reads that are already present, but that adds extra ops to the graph, and it's hard to detect whether a suitable read already exists unless we limit reuse to reads added by this method: the attributes need to match too (e.g. the read can't be placed on another device), and we don't have a way to check those. It's worth considering, but creating eager sessions doesn't seem heavy, so I went with that.
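
For reference, a rough sketch of the graph-based alternative mentioned above: add a readVariableOp to the graph and fetch that instead of the resource. The variable op name and the float dtype are assumed; this is not the PR's code:

```java
import org.tensorflow.Graph;
import org.tensorflow.Output;
import org.tensorflow.Session;
import org.tensorflow.op.Ops;
import org.tensorflow.op.core.ReadVariableOp;
import org.tensorflow.types.TFloat32;

class VariableReadSketch {
  // Adds a read of the variable's resource handle to the graph and fetches it.
  static TFloat32 readFloatVariable(Graph g, Session session, String varOpName) {
    Output<?> handle = g.operation(varOpName).output(0); // resource output of the VarHandleOp
    Ops tf = Ops.create(g);
    ReadVariableOp<TFloat32> read = tf.readVariableOp(handle, TFloat32.class);
    return (TFloat32) session.runner().fetch(read).run().get(0);
  }
}
```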

@Craigacp (Collaborator)

I think fetching variables is entirely reasonable, as you can fetch everything else in the graph by name. Having a set of things that you can't fetch, where the only way to find out is to try it and catch a runtime exception, is bad.

I'd expect TF Python to add a readVariableOp during variable creation or when the graph is saved, so shouldn't those ops exist in any graph that has resource tensors?

@rnett (Contributor, Author) commented Mar 30, 2021

> I think fetching variables is entirely reasonable, as you can fetch everything else in the graph by name. Having a set of things that you can't fetch, where the only way to find out is to try it and catch a runtime exception, is bad.

Agreed, but I don't think it's possible, at least for now. The reason I did this in a separate method is that we can't fetch variables normally: there's no resource TType, so tensor creation fails. Even if we added one, it wouldn't fit with the existing TType setup, since the only method we would want to provide is read(Class<? extends TType>). We can't even auto-read variables in fetch, since they don't specify their value's dtype (at least not anywhere I can find), so it needs the class parameter.
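
To make the constraint concrete, here is a purely hypothetical sketch of what such a type could look like; neither TResource nor this read method exists in the current API:

```java
import org.tensorflow.types.family.TType;

// Hypothetical only: a tensor type for DT_RESOURCE, as discussed above.
public interface TResource extends TType {
  // Reads the variable's current value. The caller must name the value
  // dtype, since a resource tensor does not record it.
  <T extends TType> T read(Class<T> valueType);
}
```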

> I'd expect TF Python to add a readVariableOp during variable creation or when the graph is saved, so shouldn't those ops exist in any graph that has resource tensors?

I think it does, but it's possible for people to use the raw op rather than tf.Variable. Checking for a read op is probably the better route though.

@Craigacp (Collaborator)


Why not have it check for the read op and then silently redirect the fetch to that read op? That would give us back a non-resource tensor, right? Otherwise, unpicking the resource and exporting it into a regular tensor seems fine too. I'm wary of spinning up and throwing away eager sessions just to access a variable; there's a bunch of stuff that happens in TFE_NewContext.

@rnett (Contributor, Author) commented Mar 30, 2021

It looks like it is possible to get the variable's dtype from its attribute, so I can do everything with graph-based ops, and automatically. However, once we eventually support resource tensors, fetching a resource tensor will be a legitimate operation, so there still needs to be some way to differentiate between fetch with conversion and fetch without. Maybe make fetch auto-convert for now, and once we have resource tensor support, add a fetchVariable that returns the resource tensor itself?
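
A sketch of what that automatic conversion in fetch could look like; valueDtypeOf stands in for however the implementation would read the VarHandleOp's "dtype" attribute, and none of this is the PR's actual code:

```java
import org.tensorflow.Graph;
import org.tensorflow.Output;
import org.tensorflow.op.Ops;
import org.tensorflow.proto.framework.DataType;
import org.tensorflow.types.family.TType;

class AutoReadSketch {
  // If the output to fetch is a resource, fetch a read of it instead.
  static Output<?> wrapIfVariable(Graph g, Output<?> output) {
    if (output.dataType() == DataType.DT_RESOURCE) {
      Class<? extends TType> valueType = valueDtypeOf(output); // hypothetical attribute lookup
      return Ops.create(g).readVariableOp(output, valueType).asOutput();
    }
    return output;
  }

  static Class<? extends TType> valueDtypeOf(Output<?> resource) {
    // Hypothetical: map the owning VarHandleOp's "dtype" attribute to a TType class.
    throw new UnsupportedOperationException("illustrative stub");
  }
}
```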

@rnett (Contributor, Author) commented Mar 30, 2021

OK, fetch now automatically wraps variables in a read op in the graph.
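
From the caller's side, a plain fetch by name should now be enough; a minimal sketch, assuming a variable op named "my_variable":

```java
import org.tensorflow.Session;
import org.tensorflow.types.TFloat32;

class FetchByNameSketch {
  static void printVariable(Session session) {
    // The session wraps the resource output in a read op behind the scenes,
    // so a resource variable fetches like any other output.
    try (TFloat32 value = (TFloat32) session.runner()
        .fetch("my_variable") // assumed variable op name
        .run()
        .get(0)) {
      System.out.println(value.getFloat());
    }
  }
}
```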

@karllessard (Collaborator) commented Mar 31, 2021

@rnett, did you try simply using the TF_TensorData endpoint of the C API to get a pointer to the variable's native buffer directly? That's what the 1.x client used, though it made a copy before returning it. I'm not sure whether that supported variables, but it could be interesting to try.

BTW, this is mostly out of curiosity; I'm also fine with the readVariableOp approach you and @Craigacp discussed earlier.

@rnett (Contributor, Author) commented Mar 31, 2021

I didn't try that, but I assume the 1.x client was using the old variables, not resource variables. I'm not sure how to get a tensor from that buffer, either.

@karllessard (Collaborator)

> I didn't try that, but I assume the 1.x client was using the old variables, not resource variables. I'm not sure how to get a tensor from that buffer, either.

Probably via that endpoint; it should be easy enough to convert the TF_Buffer to a ByteDataBuffer.
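
For illustration, the wrapping step might look roughly like this, assuming the ndarray DataBuffers factory can wrap a NIO buffer, and with acquireNativeView as a hypothetical helper exposing TF_TensorData/TF_TensorByteSize as a ByteBuffer:

```java
import java.nio.ByteBuffer;
import org.tensorflow.ndarray.buffer.ByteDataBuffer;
import org.tensorflow.ndarray.buffer.DataBuffers;

class NativeBufferSketch {
  static ByteDataBuffer copyOfNativeData() {
    ByteBuffer nativeView = acquireNativeView(); // hypothetical view over TF_TensorData
    ByteBuffer copy = ByteBuffer.allocate(nativeView.remaining());
    copy.put(nativeView).rewind();               // copy first, as the 1.x client did
    return DataBuffers.of(copy);
  }

  static ByteBuffer acquireNativeView() {
    // Hypothetical: a ByteBuffer over the memory at TF_TensorData(tensor),
    // of length TF_TensorByteSize(tensor).
    throw new UnsupportedOperationException("illustrative stub");
  }
}
```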

But anyway, if Python does it with readVariableOp, then let's do that as well.

@rnett (Contributor, Author) commented Mar 31, 2021

> But anyway, if Python does it with readVariableOp, then let's do that as well.

👍 The read ops are already added to the graph during variable creation in Python, and I plan on doing something similar for our version, so it should be fine.

@Craigacp (Collaborator) left a comment

A couple of small things about logging/messages, and I think a test that loads a Python-created graph is important to have.

@rnett requested a review from Craigacp, April 4, 2021 02:03
@Craigacp (Collaborator) left a comment

LGTM

@Craigacp Craigacp merged commit e229028 into tensorflow:master Apr 6, 2021