RFC: TensorFlow Extension Types #269


Merged: 13 commits merged into tensorflow:master on Aug 13, 2020

Conversation

edloper (Contributor) commented Jul 21, 2020

Status: Proposed
Authors: Edward Loper ([email protected])
Sponsor: Alex Passos ([email protected])
Updated: 2020-07-21

Objective

This RFC proposes a protocol that can be used to define user-defined
object-oriented Python types
that are supported by TensorFlow APIs.

* `DimensionAlignment`: Encodes a correspondence between two related dimensions
(e.g., between a `word` dimension and a `speaker` dimension).

**Tensor-like types:**

Another obvious example would be some form of distributed tensor, like those from Mesh-TensorFlow.

Comment on lines +87 to +88
* **Keras**: User-defined types can be used as inputs and outputs for Keras
`Models` and `Layers`.

How would a library like Keras support overrides of TensorFlow operations?

There are basically two alternatives:

  1. It could allow the behavior of Keras objects (e.g., Layer instances) to be overridden explicitly via some mechanism like __tf_dispatch__.
  2. It could allow the behavior of Keras objects to be overridden implicitly based on their underlying implementation in terms of TensorFlow operations.

The main challenge with (1) is that objects are (in general) stateful, so the override mechanism described here for functions would not suffice.

In contrast, (2) seems initially appealing: Keras is defined on arbitrary extension types "for free". The problem is that Keras' implementation in terms of lower-level TensorFlow functions is then exposed as part of its public API: if the underlying implementation of a Keras layer changes, it could break extension types, which might not implement a function, or worse might implement it in an incompatible way.
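To make the risk in (2) concrete, here is a minimal sketch; the `Scale` layer and the masked-tensor type it mentions are hypothetical:

```python
import tensorflow as tf

# Hypothetical layer written purely in terms of dispatchable TF ops.
class Scale(tf.keras.layers.Layer):
  def call(self, x):
    # If an extension type (e.g. a MaskedTensor) overrides tf.multiply via
    # dispatch, this layer supports it "for free". If a later refactor
    # rewrites the layer using an op the type does not override, that support
    # silently breaks -- the layer's internal ops have become de facto API.
    return tf.multiply(x, 2.0)
```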

edloper (Contributor Author) replied:

This is an excellent question. I would lean towards approach (2), but you're right to point out that this adds an element of design risk when using a Keras layer with an extension type. Perhaps we can start with approach (2), but give appropriate warnings/disclaimers about the problem; and evaluate over the next few months whether this is a significant practical problem or not.


To be clear, there are at least a few other options:

  1. Keras doesn't (yet) support TF extension types (e.g., by always explicitly converting inputs with tf.convert_to_tensor).
  2. Keras only supports TF extension types if an experimental flag is explicitly set. This flag might even be permanently required, if the Keras team doesn't want their implementation details to be considered public API.

I don't think shipping (2) and re-evaluating in a few months is a great option, because the costs of expanded API surface take a long time to become self-evident (i.e., at least a few major releases for users to start depending on a behavior and complain when it breaks). The point at which we notice that it's a problem could well be too late.

A contributor commented:

There are also examples of this object-oriented dispatch question in the core TF APIs themselves (e.g., the various tf.linalg.LinearOperator APIs).

The tricky thing is that these may not provide any canonical way to serialize/deserialize traced dispatched operations, because they can't be mapped to/from strings directly at the top-level API.


# We support ops that take tf.Tensor and SimpleMaskedTensor arguments. We
# don't support any other dispatchable argument types (such as tf.RaggedTensor).
__tf_dispatch_types__ = (tf.Tensor, SimpleMaskedTensor)

What about builtin numeric types, like int or float? Would these need to be included explicitly in __tf_dispatch_types__?

edloper (Contributor Author) replied:

In the current design, arguments that don't define __tf_dispatch__ will be converted to Tensor before dispatch is called. So ints, floats, np.arrays, etc., will get converted to Tensor.
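A minimal sketch of that conversion rule, for illustration only (the helper name is hypothetical and this is not the actual dispatch implementation):

```python
import numpy as np
import tensorflow as tf

def _prepare_for_dispatch(args):
  """Illustrates the conversion rule described above (not the real implementation)."""
  prepared = []
  for arg in args:
    if hasattr(type(arg), '__tf_dispatch__'):
      prepared.append(arg)                        # extension type: kept as-is
    else:
      prepared.append(tf.convert_to_tensor(arg))  # int, float, np.ndarray, ...
  return prepared

print(_prepare_for_dispatch([3, 2.5, np.ones(2)]))
```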

leandro-gracia-gil left a comment:

Thank you very much for this RFC. I love the ideas suggested here.

* **Tensorflow hub**: User-defined types can be used as inputs and outputs for
`tf.hub` modules.
* **SavedModel**: User-defined types can be used as inputs and outputs for
`SavedModels`.


How would this RFC affect the use of exported signatures from loaded saved models?

Currently, the behavior for structured data seems to be pretty poor. Input and output nested structures are flattened (not always automatically, sometimes you need to do it yourself) and need to be accessed through keyword arguments by guessing the automatically generated tensor names that TF assigns to them. Not only that, but the naming used is not even consistent between inputs and outputs.

I explained these issues in more detail here and proposed an improvement, though this was only for nested structures. I imagine that supporting Python-based extension types in loaded signatures is either a non-goal or needs C++ support too, but perhaps this is a good time to at least provide an interface that works nicely with tf.nest, so that it's possible to build the rest on top of it if needed.

(Note: you might want to also check my other related comment about the name attribute, since I use a very similar use case as base for my examples.)

edloper (Contributor Author) replied:

SavedModel.signatures intentionally flattens nested structures, because it needs to interoperate with non-Python interfaces (e.g. c++). But if you just call the restored function on the SavedModel directly (rather than via the signatures dictionary), then you'll get proper handling of both nested values and extension types. I.e., in tensorflow/tensorflow#40501, if you change loaded_model.signatures['test'](inputs) to loaded_model.test(inputs), then it will handle both nested values and composite tensors (aka extension types) appropriately.

I do have a preliminary design for ExtensionTypes and TypeSpecs in c++ (there's a short blurb in the "Future Work" section), but I'm not including it in this RFC since there are still some details to work out.
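For reference, a sketch of the two call paths (using the hypothetical `test` signature and `Pair` inputs from the SavedModel example discussed elsewhere in this thread; the flattened keyword names are the auto-generated ones shown there):

```python
loaded_model = tf.saved_model.load('test_model')

# Via the signatures dict: inputs must be passed as flattened, auto-named
# keyword tensors, and outputs come back flattened as well.
flat = loaded_model.signatures['test'](inputs=tf.constant(1),
                                       inputs_1=tf.constant(2))

# Calling the restored tf.function directly: nested structures and extension
# types (composite tensors) are packed/unpacked using the saved TypeSpecs.
structured = loaded_model.test(Pair(tf.constant(1), tf.constant(2)))
```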


Understood. Thanks!

type specifications are generally not named. However, feedback on this question
is welcome, especially when grounded in concrete use cases. (Does
`TensorSpec.name` get used for other things than concrete signature argument
naming?)


I also have the feeling that they should not use a name attribute. Rather, it seems to me like name is not a good solution for the problem it's trying to solve, and that perhaps we should use this RFC as inspiration to find a better one.

Let me show a few examples of why name can make things quite confusing for structured data. These examples are about exported signatures in SavedModel, how they change before being saved and after being loaded, and how name affects the result. I understand that part of the problem here is probably in how SavedModel handles structured data and that name can have other uses. Still, even if this is not necessarily generalizable, I think it can help as an example.

For this, let's define a simple named tuple named Pair and imagine a hypothetical model function called test that gets one as input.

```python
from collections import namedtuple

Pair = namedtuple('Pair', ('x', 'y'))
```

Let's start with a simple case where we export a signature with no names in its specs. We name our input pair inputs.

```python
# Define specs, export, and load.
specs = Pair(tf.TensorSpec([], tf.int32), tf.TensorSpec([], tf.int32))
signatures = {'test': model.test.get_concrete_function(inputs=specs)}
tf.saved_model.save(model, 'test_model', signatures=signatures)
loaded_model = tf.saved_model.load('test_model')

# The inputs structure we're supposed to be exporting.
print(signatures['test'].structured_input_signature)
# ((Pair(x=TensorSpec(shape=(), dtype=tf.int32, name='inputs/x'), y=TensorSpec(shape=(), dtype=tf.int32, name='inputs/y')),), {})

# The inputs structure from the loaded model.
print(loaded_model.signatures['test'].structured_input_signature)
# ((), {'inputs_1': TensorSpec(shape=(), dtype=tf.int32, name='inputs_1'), 'inputs': TensorSpec(shape=(), dtype=tf.int32, name='inputs')})
```

The first issue we get is that the use of name is inconsistent between signatures in the original and in the loaded model. Note how one uses the argument name inputs plus the structure fields x and y, while the other just uses the argument name for all its flattened contents and disambiguates repetitions. This, combined with the fact that positional arguments have been converted to keyword ones and that keyword argument dicts have no reliable order, makes recovering structured data trickier and more error-prone.

We can then try using the name attribute to help sort this out. Let's give a name to only one of the tensor specs.

```python
specs = Pair(tf.TensorSpec([], tf.int32, name='x'), tf.TensorSpec([], tf.int32))
# [same export, save and load as before] ...

print(signatures['test'].structured_input_signature)
# ((Pair(x=TensorSpec(shape=(), dtype=tf.int32, name='x'), y=TensorSpec(shape=(), dtype=tf.int32, name='inputs/y')),), {})

print(loaded_model.signatures['test'].structured_input_signature)
# ((), {'inputs': TensorSpec(shape=(), dtype=tf.int32, name='inputs'), 'x': TensorSpec(shape=(), dtype=tf.int32, name='x')})
```

Now the presence of a name prevents using the input argument name inputs for the first field. However, it also drops the structured inputs/x in favor of just x in the original model. In the loaded one it sort of helps, because automatic disambiguation is no longer needed, yet the other, non-named spec item gets the argument name. This also means that partially named specs will change the automatically generated, disambiguated names of the rest of the specs in a nested structure.

Finally, let's show what happens if we have both names.

```python
specs = Pair(tf.TensorSpec([], tf.int32, name='x'), tf.TensorSpec([], tf.int32, name='y'))
# [same export, save and load as before] ...

print(signatures['test'].structured_input_signature)
# ((Pair(x=TensorSpec(shape=(), dtype=tf.int32, name='x'), y=TensorSpec(shape=(), dtype=tf.int32, name='y')),), {})

print(loaded_model.signatures['test'].structured_input_signature)
# ((), {'x': TensorSpec(shape=(), dtype=tf.int32, name='x'), 'y': TensorSpec(shape=(), dtype=tf.int32, name='y')})
```

With both names we no longer have disambiguation. However, note that if we were to have a Pair nesting two other Pair structures, disambiguation would kick in again because x and y would repeat. This is because name is lacking structural information.

So, my point with these examples is just to show some of the cases where use of the name attribute can get quite messy. Perhaps this is a good time to think about better alternatives that could remove the need for such an argument.

* The CompositeTensor base class is replaced with an `ExtensionType` protocol.

* `CompositeTensor._type_spec` is renamed to `ExtensionType.__tf_type_spec__`,
and is changed from an abstractproperty to an abstractmethod.
leandro-gracia-gil commented Jul 23, 2020:

Will regular tensors, apart from other tensor-like and composite tensor objects, also use the ExtensionType protocol? I'm asking because it would allow doing things like:

```python
specs = tf.nest.map_structure(lambda t: t.__tf_type_spec__(), some_nested_structure)
```

This can help when doing things such as building function output signatures for tf.map_fn, and in other cases where we want to easily build structured nested specs from data.

Edit: reformulated the question to make clearer what I mean.

edloper (Contributor Author) replied:

Yes. Tensor will define __tf_type_spec__, and ExtensionType is replacing CompositeTensor.
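For illustration, a sketch of the uniform usage this enables; note that `__tf_type_spec__` is the method name proposed in this RFC, not current TF API:

```python
import tensorflow as tf

# Under the proposal, tf.Tensor and extension types (e.g. RaggedTensor) all
# expose __tf_type_spec__, so specs for an arbitrary nested structure can be
# built uniformly:
nested = {'ids': tf.constant([1, 2, 3]),
          'ragged': tf.ragged.constant([[1, 2], [3]])}
specs = tf.nest.map_structure(lambda t: t.__tf_type_spec__(), nested)
# -> {'ids': TensorSpec(...), 'ragged': RaggedTensorSpec(...)}
```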

`TypeSpecs` using `tf.register_type_spec`.

```python
def register_type_spec(type_spec_subclass, name=None):
```

This would be neat to have as a class decorator. It would link the otherwise decoupled declaration and subsequent registration of the class. (It could also be used to build a centralized list of all declarations at compile/build time in a relatively easy way, but I'm not sure about that.)

edloper (Contributor Author) replied:

Agreed -- I think this should be a decorator. (Since the common case is that name is None, the only thing we really need to do to make it a class decorator is have it return type_spec_subclass rather than None.)
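A sketch of that decorator form, assuming `register_type_spec` simply returns its argument as suggested above (the `MaskedTensorSpec` class is hypothetical):

```python
import tensorflow as tf

def register_type_spec(type_spec_subclass, name=None):
  # ... record type_spec_subclass in the registry under `name` or a default ...
  return type_spec_subclass  # returning the class makes it usable as a decorator

@register_type_spec
class MaskedTensorSpec(tf.TypeSpec):
  ...
```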

```python
  raise NotImplementedError
```

**Note:** `tf.ExtensionType` is a Python `Protocol`, so it does not need to be

Note that typing.Protocol is new in Python 3.8. I presume you'll want to preserve some degree of backwards compatibility with older versions of Python? If so, how is that handled?

edloper (Contributor Author) replied:

We will define a tf.is_extension_type method which functions appropriately in all versions of Python.
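A minimal sketch of how such a check could work on any Python version, assuming the protocol is detected structurally via the method it requires:

```python
def is_extension_type(value):
  # Works without typing.Protocol / Python 3.8: just check for the protocol
  # method proposed in this RFC.
  return hasattr(type(value), '__tf_type_spec__')
```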


**Note:** The `to_boxed_tensor` and `from_boxed_tensor` methods are typically implemented by defining new C++ kernels that encode values using tensors with `dtype=tf.variant`. The gradient for `to_boxed_tensor` typically calls
A contributor commented:

I'm trying to figure out if/how the RFC changes the SavedModel protobuf representation, and how it affects the downstream compilation stack in the future.

Does this mean all the extension types will be DT_VARIANT in the protobuf representation?

edloper (Contributor Author) replied:

Extension types (aka composite tensors) are already handled by SavedModel. E.g., you can use SavedModel to save/load models that take SparseTensor or RaggedTensor inputs, or return SparseTensor or RaggedTensor outputs.

In particular, SavedObjectGraph contains a set of SavedConcreteFunctions. Those SavedConcreteFunctions use StructuredValue to record their input & output signatures. StructuredValue (defined in tensorflow/core/protobuf/struct.proto) has a TypeSpecProto message that's used to encode TypeSpecs. Currently, it has an enum (TypeSpecClass) which indicates the actual type; we will be adding an "EXTENSION_TYPE_SPEC" value to that enum, and a string field to TypeSpecProto, which will be used to look up the appropriate TypeSpec in the registry.

When a saved concrete function is called, the TypeSpecs that were saved in the SavedObjectGraph are used to decompose the arguments into component tensors, which are then fed into the saved concrete graph. If the result is an extension type, then the TypeSpec in SavedConcreteFunction.output_signature is used to pack the component tensors generated by the saved concrete graph into an appropriate extension type.
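As a concrete illustration that composite tensors already round-trip through SavedModel (the module name and path here are arbitrary):

```python
import tensorflow as tf

class RaggedModule(tf.Module):
  @tf.function(input_signature=[tf.RaggedTensorSpec([None, None], tf.int32)])
  def double(self, rt):
    return rt * 2

tf.saved_model.save(RaggedModule(), '/tmp/ragged_model')
restored = tf.saved_model.load('/tmp/ragged_model')
# The saved RaggedTensorSpec decomposes the argument into component tensors,
# feeds them to the concrete graph, and repacks the ragged result.
print(restored.double(tf.ragged.constant([[1, 2], [3]], dtype=tf.int32)))
```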

types and designs. If general-purpose types are developed that become
sufficiently mature, we may consider bringing them into the main TensorFlow code
base.


@ematejska added the RFC: Accepted (RFC Design Document: Accepted by Review) label and removed the RFC: Proposed (RFC Design Document) label on Aug 4, 2020
@ematejska commented:

This has been accepted at the design review but is waiting for the notes to be posted before merging.

edloper (Contributor Author) commented Aug 11, 2020

Notes from design review

Which class should decompose/reconstruct values? In an alternative design, we could move the job of decomposing values into tensors & reconstructing them from tensors from TypeSpec to ExtensionType. This decision has no impact on the functionality/capabilities of the design, only on its ergonomics. apassos@google expressed a preference for the current design. edloper@ agreed that if there are no strong opinions, then we should stick with the current design. Decision: status quo design.

Registry vs. Protocol vs. Base Class - what should we use to define extension types? The current RFC proposes that the API for defining TensorFlow extension types be implemented using protocols. Two alternative APIs that we considered are base classes and a registration mechanism. Decision was to stick with a protocol (status quo).

Should it be possible to define extension types non-invasively? The current RFC does not allow types to be turned into extension types "non-invasively" -- i.e., you can't take a type that you don't own, and make it an extension type (e.g. by registering it). It was agreed that this wasn't a desirable feature.

Should TypeSpec have a name field? The current RFC does not add a "name" as part of the TypeSpec class. It was agreed that load-bearing names aren't a good idea. Decision: no names.

Should user-defined types be supported in gradients?

  • As inputs, yes. This should just work with very little effort?
  • If these are purely container types, then this should definitely just work…
  • This does mean that specific implementations used in dispatch may need to add differentiability.
  • Outputs can be made to work via tf.ones and dispatch.

Should new symbols (e.g. tf.TypeSpec) be added in an "experimental" namespace first? Decision: yes (e.g. tf.experimental.TypeSpec)

How would external libraries (Keras) support overrides/dispatch?

  • On some level, extension type owners need to own this issue.
  • Library owners need to test this as well. E.g. Keras layers can explicitly indicate that they support certain extension types.
  • Most layers can just rely on overriding ops.
  • A library like addons would need unit tests to declare what types to support.
  • What mechanism do we provide addons to support dispatch?
    • We need to allow others to use the dispatch decorator.

Questions on decorator to declare something supports dispatching:

  • If this is something that third parties can use, should we have a string addition?
    • Right now K uses tf.export’d name
    • What is the canonical string name for ‘a thing that supports dispatching’?
      • package_name.symbol?
  • What about instance methods? Like linear operators. How do these support dispatch?
    • We should call the dispatcher with all args, including ‘self’.
    • Dispatch already supports dispatch-to-multiple-things anyways.
  • What args do we want to dispatch on?
    • Currently only on args that expect Tensor, ListTensor
    • Not on attr, nest-of-tensors.
    • Currently only needed for ‘slice’, and that’s already handled via dispatch on __getitem__ with unpacking dispatcher.
      • This is due to loss of static shape information needed for TPUs.
    • We will need to fix certain code that expects to handle static values as integers.

Long-term question: In the future, does it make sense to allow type extensions all the way down to the runtime?

  • We already implement certain things as Tensor subclasses.
  • Long-term, more things should be encoded in the graph.
    • For composites, we are missing op composition. Python allows us to get parallelism, and specificity on what part(s) of the composite we want to touch.
    • As we get better with composition, it might make more sense to move that down a bit.
    • We should aim to have more of these types deeper in the stack -- the compiler is losing information right now.

What about Variant tensors?

  • Example was not Variant, but Optional.
  • Variant allows more preservation of internal data but you have to implement everything
  • You get different optimization opportunities in different contexts. Exploded representations lose RT data but allow common subexpression elimination; variants don't get this.

What about types that do not naturally map to Tensors? Have we thought about exposing other types to the runtime?

  • That would likely be useful.

Decision: RFC approved.

@ematejska merged commit 5d4f70e into tensorflow:master on Aug 13, 2020
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Oct 21, 2020
…nsorFlow Extension Types](tensorflow/community#269).

PiperOrigin-RevId: 338353976
Change-Id: I7f155854973fb496c5c0f43eb24a46aa7418c8d6