RFC: TensorFlow Extension Types #269
Conversation
* `DimensionAlignment`: Encodes a correspondence between two related dimensions
  (e.g., between a `word` dimension and a `speaker` dimension).

**Tensor-like types:**
Another obvious example would be some form of distributed tensor, like those from Mesh-TensorFlow.
* **Keras**: User-defined types can be used as inputs and outputs for Keras
  `Models` and `Layers`.
How would a library like Keras support overrides of TensorFlow operations?
There are basically two alternatives:
- It could allow the behavior of Keras objects (e.g., `Layer` instances) to be overridden explicitly via some mechanism like `__tf_dispatch__`.
- It could allow the behavior of Keras objects to be overridden implicitly, based on their underlying implementation in terms of TensorFlow operations.

The main challenge with (1) is that objects are (in general) stateful, so the override mechanism described here for functions would not suffice.
In contrast, (2) seems initially appealing: Keras would be defined on arbitrary extension types "for free". The problem is that Keras' implementation in terms of lower-level TensorFlow functions would then be exposed as part of its public API: if the underlying implementation of a Keras layer changes, it could break extension types, which might not implement a function, or worse, might implement it in an incompatible way.
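To make the trade-off in (2) concrete, here is a minimal sketch of an extension type that would flow through a layer "for free" only because it overrides the specific low-level ops that layer happens to call. The `__tf_dispatch__` signature used here is an assumption for illustration, not the exact protocol from this RFC:

```python
import tensorflow as tf

class SimpleMaskedTensor:
  """Illustrative extension type: a tensor of values plus a boolean mask."""

  def __init__(self, values, mask):
    self.values = tf.convert_to_tensor(values)
    self.mask = tf.convert_to_tensor(mask)

  # Assumed signature for the dispatch hook; the real protocol may differ.
  @classmethod
  def __tf_dispatch__(cls, api, args, kwargs):
    if api is tf.linalg.matmul:
      a, b = args[:2]
      values = tf.matmul(a.values if isinstance(a, cls) else a,
                         b.values if isinstance(b, cls) else b, **kwargs)
      # Mask propagation is oversimplified here, for illustration only.
      return cls(values, tf.ones_like(values, dtype=tf.bool))
    return NotImplemented

# Under approach (2), a Dense layer would work with SimpleMaskedTensor only
# because its implementation happens to call tf.linalg.matmul (and
# tf.nn.bias_add, etc.) -- implementation details that would then effectively
# become part of Keras' public API.
```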
This is an excellent question. I would lean towards approach (2), but you're right to point out that this adds an element of design risk when using a Keras layer with an extension type. Perhaps we can start with approach (2), but give appropriate warnings/disclaimers about the problem; and evaluate over the next few months whether this is a significant practical problem or not.
To be clear, there are at least a few other options:
- Keras doesn't (yet) support TF extension types (e.g., by always explicitly converting inputs with `tf.convert_to_tensor`).
- Keras only supports TF extension types if an experimental flag is explicitly set. This flag might even be permanently required, if the Keras team doesn't want their implementation details to be considered public API.

I don't think shipping (2) and re-evaluating in a few months is a great option, because the costs of an expanded API surface take a long time to become self-evident (i.e., at least a few major releases for users to start depending on a behavior and complain when it breaks). By the time we notice that it's a problem, it could well be too late.
There are also examples of this object-oriented dispatch question in the core TF APIs themselves (e.g. the various tf.linalg.LinearOperator APIs).
The tricky thing is that these may not provide any sort of canonical way to serialize/deserialize traced dispatched operations, because they can't be mapped to/from strings directly at the top-level API.
# We support ops that take tf.Tensor and SimpleMaskedTensor arguments. We
# don't support any other dispatchable argument types (such as tf.RaggedTensor).
__tf_dispatch_types__ = (tf.Tensor, SimpleMaskedTensor)
What about builtin numeric types, like `int` or `float`? Would these need to be included explicitly in `__tf_dispatch_types__`?
In the current design, arguments that don't define `__tf_dispatch__` will be converted to `Tensor` before dispatch is called. So ints, floats, np.arrays, etc., will get converted to `Tensor`.
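A small sketch of what that pre-dispatch conversion might look like; the helper name and the exact rule here are assumptions for illustration, not the actual dispatcher:

```python
import numpy as np
import tensorflow as tf

def _convert_undispatchable_args(args):
  """Hypothetical pre-dispatch step: pass dispatchable values through
  unchanged, and convert everything else (ints, floats, np.arrays, ...)
  to tf.Tensor."""
  converted = []
  for arg in args:
    if hasattr(type(arg), "__tf_dispatch__"):
      converted.append(arg)  # extension type: left as-is for dispatch
    else:
      converted.append(tf.convert_to_tensor(arg))
  return converted

# E.g. in a call like tf.add(masked_tensor, 3), the dispatcher would see
# the `3` already converted to a scalar tf.Tensor.
print(_convert_undispatchable_args([3, 1.5, np.array([1, 2])]))
```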
Thank you very much for this RFC. I love the ideas suggested here.
* **Tensorflow hub**: User-defined types can be used as inputs and outputs for
  `tf.hub` modules.
* **SavedModel**: User-defined types can be used as inputs and outputs for
  `SavedModels`.
How would this RFC affect the use of exported signatures from loaded saved models?
Currently, the behavior for structured data seems to be pretty poor. Input and output nested structures are flattened (not always automatically, sometimes you need to do it yourself) and need to be accessed through keyword arguments by guessing the automatically generated tensor names that TF assigns to them. Not only that, but the naming used is not even consistent between inputs and outputs.
I explained these issues in more detail here and proposed an improvement, though this was only for nested structures. I imagine that supporting Python-based extension types in loaded signatures is either a non-goal or needs C++ support too, but perhaps this is a good time to at least provide an interface that works nicely with tf.nest, so that it's possible to build the rest on top of it if needed.
(Note: you might want to also check my other related comment about the `name` attribute, since I use a very similar use case as the basis for my examples.)
`SavedModel.signatures` intentionally flattens nested structures, because it needs to interoperate with non-Python interfaces (e.g. C++). But if you just call the restored function on the `SavedModel` directly (rather than via the `signatures` dictionary), then you'll get proper handling of both nested values and extension types. I.e., in tensorflow/tensorflow#40501, if you change `loaded_model.signatures['test'](inputs)` to `loaded_model.test(inputs)`, then it will handle both nested values and composite tensors (aka extension types) appropriately.
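To illustrate the difference with a minimal, self-contained sketch (the module, method, and save path here are illustrative, and the generated signature argument names may vary):

```python
import tensorflow as tf

class Model(tf.Module):
  @tf.function
  def test(self, inputs):
    # `inputs` is a nested structure (here, a dict of scalar tensors).
    return {'sum': inputs['x'] + inputs['y']}

model = Model()
specs = {'x': tf.TensorSpec([], tf.int32), 'y': tf.TensorSpec([], tf.int32)}
signatures = {'test': model.test.get_concrete_function(inputs=specs)}
tf.saved_model.save(model, '/tmp/test_model', signatures=signatures)
loaded_model = tf.saved_model.load('/tmp/test_model')

# Flattened view: inputs/outputs keyed by generated tensor names.
print(loaded_model.signatures['test'].structured_input_signature)

# Structured view: nested values (and extension types) pass through as-is.
print(loaded_model.test({'x': tf.constant(1), 'y': tf.constant(2)}))
```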
I do have a preliminary design for ExtensionTypes and TypeSpecs in c++ (there's a short blurb in the "Future Work" section), but I'm not including it in this RFC since there are still some details to work out.
Understood. Thanks!
type specifications are generally not named. However, feedback on this question
is welcome, especially when grounded in concrete use cases. (Does
`TensorSpec.name` get used for other things than concrete signature argument
naming?)
I also have the feeling that they should not use a `name` attribute. Rather, it seems to me like `name` is not a good solution for the problem it's trying to solve, and that perhaps we should use this RFC as inspiration to find a better one.

Let me show a few examples of why `name` can make things quite confusing for structured data. These examples are about exported signatures in `SavedModel`, how they change before being saved and after being loaded, and how `name` affects the result. I understand that part of the problem here is probably in how `SavedModel` handles structured data, and that `name` can have other uses. Still, even if this is not necessarily generalizable, I think it can help as an example.

For this, let's define a simple named tuple called `Pair` and imagine a hypothetical model function called `test` that takes one as input.
Pair = namedtuple('Pair', ('x', 'y'))
Let's start with a simple case where we export a signature with no names in its specs. We name our input pair `inputs`.
# Define specs, export, and load.
specs = Pair(tf.TensorSpec([], tf.int32), tf.TensorSpec([], tf.int32))
signatures = {'test': model.test.get_concrete_function(inputs=specs)}
tf.saved_model.save(model, 'test_model', signatures=signatures)
loaded_model = tf.saved_model.load('test_model')
# The inputs structure we're supposed to be exporting.
print(signatures['test'].structured_input_signature)
((Pair(x=TensorSpec(shape=(), dtype=tf.int32, name='inputs/x'), y=TensorSpec(shape=(), dtype=tf.int32, name='inputs/y')),), {})
# The inputs structure from the loaded model.
print(loaded_model.signatures['test'].structured_input_signature)
((), {'inputs_1': TensorSpec(shape=(), dtype=tf.int32, name='inputs_1'), 'inputs': TensorSpec(shape=(), dtype=tf.int32, name='inputs')})
The first issue we get is that the use of `name` is inconsistent between the signatures in the original and in the loaded model. Note how one uses the argument name `inputs` plus the structure fields `x` and `y`, while the other just uses the argument name for all of its flattened contents and disambiguates repetitions. This, combined with the fact that positional arguments have been converted to keyword ones and that keyword argument dicts have no reliable order, makes recovering structured data tricky and error-prone.
We can then try using the `name` attribute to help sort this out. Let's give a name to only one of the tensor specs.
specs = Pair(tf.TensorSpec([], tf.int32, name='x'), tf.TensorSpec([], tf.int32))
# [same export, save and load as before] ...
print(signatures['test'].structured_input_signature)
((Pair(x=TensorSpec(shape=(), dtype=tf.int32, name='x'), y=TensorSpec(shape=(), dtype=tf.int32, name='inputs/y')),), {})
print(loaded_model.signatures['test'].structured_input_signature)
((), {'inputs': TensorSpec(shape=(), dtype=tf.int32, name='inputs'), 'x': TensorSpec(shape=(), dtype=tf.int32, name='x')})
Now the presence of a `name` prevents the input argument name `inputs` from being used for the first field. However, this also drops the structured `inputs/x` in favor of just `x` in the original model. In the loaded one it sort of helps, because automatic disambiguation is no longer needed, yet the other, unnamed spec item still gets the argument name. This also means that partially named specs will change the automatically generated, disambiguated names of the remaining specs in a nested structure.
Finally, let's show what happens if we have both names.
specs = Pair(tf.TensorSpec([], tf.int32, name='x'), tf.TensorSpec([], tf.int32, name='y'))
# [same export, save and load as before] ...
print(signatures['test'].structured_input_signature)
((Pair(x=TensorSpec(shape=(), dtype=tf.int32, name='x'), y=TensorSpec(shape=(), dtype=tf.int32, name='y')),), {})
print(loaded_model.signatures['test'].structured_input_signature)
((), {'x': TensorSpec(shape=(), dtype=tf.int32, name='x'), 'y': TensorSpec(shape=(), dtype=tf.int32, name='y')})
With both names we no longer have disambiguation. However, note that if we had a `Pair` nesting two other `Pair` structures, disambiguation would kick in again because `x` and `y` would repeat. This is because `name` lacks structural information.
So, my point with these examples is just to show some of the cases where the use of the `name` attribute can make things quite messy. Perhaps this is a good time to think about better alternatives that could remove the need for such an argument.
* The CompositeTensor base class is replaced with an `ExtensionType` protocol.

* `CompositeTensor._type_spec` is renamed to `ExtensionType.__tf_type_spec__`,
  and is changed from an abstractproperty to an abstractmethod.
Will regular tensors, apart from other tensor-like and composite tensor objects, also use the `ExtensionType` protocol? I'm asking because it would allow doing things like:

specs = tf.nest.map_structure(lambda t: t.__tf_type_spec__(), some_nested_structure)

This can help when doing things such as building function output signatures for `tf.map_fn`, and other cases where we want to easily build structured nested specs from data.
Edit: reformulated the question to make what I mean clearer.
Yes. `Tensor` will define `__tf_type_spec__`, and `ExtensionType` is replacing `CompositeTensor`.
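As a sketch of the use case above: today the closest public equivalent to the proposed `__tf_type_spec__()` is `tf.type_spec_from_value`, so building a structured spec from example data might look like this (the data and field names are illustrative):

```python
import tensorflow as tf

# A nested structure of values; could equally contain other composite types.
example = {'ids': tf.ragged.constant([[1, 2], [3]]),
           'score': tf.constant([0.5, 0.25])}

# With the proposed protocol each element would expose __tf_type_spec__();
# tf.type_spec_from_value is the existing public equivalent.
specs = tf.nest.map_structure(tf.type_spec_from_value, example)

# A spec structure like this could then serve, e.g., as fn_output_signature
# for tf.map_fn when the mapped function returns this structure.
print(specs)
```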
`TypeSpecs` using `tf.register_type_spec`.

```python
def register_type_spec(type_spec_subclass, name=None):
```
This would be neat to have as a class decorator. It would link the otherwise decoupled declaration and subsequent registration of the class. (It could also be used to build a centralized list of all declarations at compile/build time in a relatively easy way, but I'm not sure about that.)
Agreed -- I think this should be a decorator. (Since the common case is that `name` is None, the only thing we really need to do to make it a class decorator is have it return `type_spec_subclass` rather than `None`.)
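A minimal sketch of that decorator-friendly variant; the registry dict, the default-name rule, and the `MaskedTensorSpec` class are all assumptions for illustration:

```python
_TYPE_SPEC_REGISTRY = {}  # hypothetical registry: name -> TypeSpec subclass

def register_type_spec(type_spec_subclass, name=None):
  """Registers the class and returns it (instead of None), so the same
  function also works as a class decorator."""
  if name is None:
    # Assumed default: derive a registry name from the class itself.
    name = f"{type_spec_subclass.__module__}.{type_spec_subclass.__qualname__}"
  _TYPE_SPEC_REGISTRY[name] = type_spec_subclass
  return type_spec_subclass

@register_type_spec
class MaskedTensorSpec:  # would derive from the TypeSpec base class
  pass
```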
```python
  raise NotImplementedError
```

**Note:** `tf.ExtensionType` is a Python `Protocol`, so it does not need to be
Note that typing.Protocol is new in Python 3.8. I presume you'll want to preserve some degree of backwards compatibility with older versions of Python? If so, how is that handled?
We will define a `tf.is_extension_type` method which functions appropriately in all versions of Python.
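For context, a duck-typed check along these lines would avoid the typing.Protocol dependency entirely; the body below is an assumption, and only the function name comes from the comment above:

```python
def is_extension_type(value):
  """Sketch: structural check that works on any Python 3 version, without
  requiring typing.Protocol (which is new in Python 3.8)."""
  return hasattr(type(value), "__tf_type_spec__")
```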
**Note:** The `to_boxed_tensor` and `from_boxed_tensor` methods are typically
implemented by defining new C++ kernels that encode values using tensors with
`dtype=tf.variant`. The gradient for `to_boxed_tensor` typically calls
I'm trying to figure out if/how the RFC changes the SavedModel protobuf representation, and how it affects the downstream compilation stack in the future.
Does this mean all the extension types will be `DT_VARIANT` in the protobuf representation?
Extension types (aka composite tensors) are already handled by SavedModel. E.g., you can use SavedModel to save/load models that take SparseTensor or RaggedTensor inputs, or return SparseTensor or RaggedTensor outputs.

In particular, `SavedObjectGraph` contains a set of `SavedConcreteFunction`s. Those `SavedConcreteFunction`s use `StructuredValue` to record their input & output signatures. `StructuredValue` (defined in tensorflow/core/protobuf/struct.proto) has a `TypeSpecProto` message that's used to encode `TypeSpec`s. Currently, it has an enum (`TypeSpecClass`) which indicates the actual type; we will be adding an "EXTENSION_TYPE_SPEC" value to that enum, and a string field to `TypeSpecProto`, which will be used to look up the appropriate TypeSpec in the registry.

When a saved concrete function is called, the `TypeSpec`s that were saved in the `SavedObjectGraph` are used to decompose the arguments into component tensors, which are then fed into the saved concrete graph. If the result is an extension type, then the `TypeSpec` in `SavedConcreteFunction.output_signature` is used to pack the component tensors generated by the saved concrete graph into an appropriate extension type.
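The decompose/pack step can be approximated with the public nest utilities; this is only an analogy for what the SavedModel machinery does internally, not its actual implementation:

```python
import tensorflow as tf

value = tf.ragged.constant([[1, 2], [3]])  # an extension (composite) type
spec = tf.type_spec_from_value(value)       # the TypeSpec recorded in the SavedModel

# Decompose into component tensors -- these are what get fed to the graph.
components = tf.nest.flatten(value, expand_composites=True)

# ... the saved concrete graph runs on `components` ...

# Reconstruct the composite value from its components. (SavedModel does the
# equivalent using the TypeSpec stored in the output signature.)
rebuilt = tf.nest.pack_sequence_as(value, components, expand_composites=True)
print(spec, rebuilt)
```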
types and designs. If general-purpose types are developed that become
sufficiently mature, we may consider bringing them into the main TensorFlow code
base.
Have you seen this?
Looks like an already existing example of what this RFC is addressing; it would be interesting to see the details of this use case.
This has been accepted at the design review, but we're waiting for the notes to be posted before merging.
Notes from design review:

- Which class should decompose/reconstruct values? In an alternative design, we could move the job of decomposing values into tensors & reconstructing them from tensors.
- Registry vs. Protocol vs. Base Class: what should we use to define extension types? The current RFC proposes that the API for defining TensorFlow extension types be implemented using protocols. Two alternative APIs that we considered are base classes and a registration mechanism. Decision was to stick with a protocol (status quo).
- Should it be possible to define extension types non-invasively? The current RFC does not allow types to be turned into extension types "non-invasively" -- i.e., you can't take a type that you don't own and make it an extension type (e.g. by registering it). It was agreed that this wasn't a desirable feature.
- Should TypeSpec have a name field? The current RFC does not add a "name" as part of the TypeSpec class. It was agreed that load-bearing names aren't a good idea. Decision: no names.
- Should user-defined types be supported in gradients?
- Should new symbols (e.g. tf.TypeSpec) be added in an "experimental" namespace first? Decision: yes (e.g. tf.experimental.TypeSpec).
- How would external libraries (Keras) support overrides/dispatch?
- Questions on decorator to declare something supports dispatching.
- Long-term question: In the future, does it make sense to allow type extensions all the way down to the runtime?
- What about Variant tensors?
- What about types that do not naturally map to Tensors? Have we thought about exposing other types to the runtime?

Decision: RFC approved.
Objective
This RFC proposes a protocol that can be used to define user-defined
object-oriented Python types that are supported by TensorFlow APIs.