This repository was archived by the owner on Jul 10, 2025. It is now read-only.

Conversation

lzr-official
Member

@lzr-official lzr-official commented Nov 6, 2019

Review period open through 2019-11-23

TPU SavedModel Export API for TF2.x

Status Proposed
Author(s) Zhuoran Liu ([email protected]), Youlong Cheng ([email protected])
Sponsor Jonathan Hseu ([email protected])
Updated 2019-11-06

Objective

Provide an API that allows TF2 users to export TPU saved models for
inference, which:

  • Provides a user-friendly way to specify which function to run on TPU;
  • Hides Graph construction and TPU-specific inference logic (multi-core
    support, etc.) from users;
  • Allows specifying tags in the SavedModel.

Motivation

Limitation of current tf.saved_model.save()

MetaGraphDef allows saving customized tags. Downstream components such as the
TPU model server and the TFX infra-validator use these tags to load a specific
MetaGraph. However, tf.saved_model.save() does not let users specify the set
of tags in the MetaGraphDef; it hard-codes a single MetaGraph with only the
‘serve’ tag.
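To illustrate why the tag set matters, here is a minimal pure-Python sketch (not TensorFlow's actual implementation; `select_meta_graph` and its arguments are hypothetical names) of how a downstream serving component picks a MetaGraph by tag set:

```python
# Hypothetical sketch: a downstream loader selecting a MetaGraph by tag set.
# `meta_graphs` is a list of (tags, graph) pairs standing in for the
# MetaGraphDefs inside a SavedModel.
def select_meta_graph(meta_graphs, requested_tags):
    """Return the graph whose tag set exactly matches `requested_tags`."""
    requested = frozenset(requested_tags)
    for tags, graph in meta_graphs:
        if frozenset(tags) == requested:
            return graph
    raise ValueError(f"No MetaGraph with tags {sorted(requested)}")
```

With only a hard-coded ‘serve’ tag, a component requesting, say, {‘serve’, ‘tpu’} finds no matching MetaGraph, which is the limitation this proposal addresses.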


@karmel karmel left a comment


Can we do this in a way that doesn't require peppering tpu kwargs throughout existing APIs?

tf.keras.models.save_model(
    model,
    filepath='...',
    export_to_tpu=True)

I would like to avoid having tpu-specific kwargs added to the Keras paths. Let's think of a better way to do this? Can it be controlled from the dist strat side, where we have TPU-awareness? CC @fchollet , @k-w-w .

Contributor

If it's possible on the dist strat side, I prefer this idea over adding new arguments to the function.

Member

+1. Hardcoding one particular architecture into save_model seems unfortunate, and it encodes device-specific information at a very high level (also, why not an export_to_gpu, export_to_npu, export_to_ipu, ...?)

one MetaGraph, which has the ‘serve’ tag hard-coded.

`tags` is an optional argument. It is a Python iterable representing the
list of tags for the MetaGraph. This allows users to specify custom tags.

Can you explain why this is necessary? If the joint tags are being assigned to a single metagraph... why bother? Shouldn't the signatures be sufficient? I would hope that tags can fade away in 2.0, because they are confusing.

passed through to the place where the PartitionedCallOp is created. Originally,
all stateful functions generate a StatefulPartitionedCallOp. Now we
switch to TPUPartitionedCallOp, and this routing is done by checking the
value of `use_tpu_partitioned_call`.
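The routing described above can be sketched in plain Python (a hypothetical helper for illustration; in TensorFlow the real logic lives in the function-call lowering, not in a standalone function):

```python
# Hypothetical sketch of the op-type routing described above: when
# `use_tpu_partitioned_call` is set, a TPUPartitionedCallOp is emitted
# instead of the usual (Stateful)PartitionedCallOp.
def partitioned_call_op_type(is_stateful, use_tpu_partitioned_call):
    if use_tpu_partitioned_call:
        return "TPUPartitionedCall"
    return "StatefulPartitionedCall" if is_stateful else "PartitionedCall"
```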

I would prefer the details of TPU partitioned calls didn't leak all the way up to the tf.function interface. Can we think of a better way? CC @jaingaurav

1. `export_to_tpu`: simply setting this to `True` exports a TPU model;
2. `tags_signatures`: optionally, for advanced users who want more control
over which tags / signatures they use, this argument can be passed as in the
TF2.x saving API.
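A hedged sketch of the surface being discussed (a hypothetical wrapper; `export_to_tpu` and the tag defaults are assumptions drawn from this discussion, not a released TensorFlow API):

```python
# Hypothetical sketch of the proposed save entry point. `export_to_tpu`
# selects the TPU export path; `tags` (the argument that replaced the
# earlier `tags_signatures` draft) lets callers override the MetaGraph
# tag set. Returns a summary dict instead of writing a real SavedModel.
def save(model, export_dir, export_to_tpu=False, tags=None):
    if tags is None:
        # Assumed defaults: TPU serving stacks conventionally load the
        # {'serve', 'tpu'} tag set; plain serving uses {'serve'}.
        tags = {"serve", "tpu"} if export_to_tpu else {"serve"}
    return {"export_dir": export_dir, "tags": sorted(tags)}
```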

I believe (2) was removed, but (1) is still too much.

Member Author

Sorry, (2) was a mistake; it should be `tags`.
For (1), what suggestions do you have on how to let this method know we are exporting a TPU model?

@googlebot

We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google.
In order to pass this check, please resolve this problem and then comment @googlebot I fixed it.. If the bot doesn't comment, it means it doesn't think anything has changed.

ℹ️ Googlers: Go here for more info.

@googlebot googlebot added cla: no and removed cla: yes labels Nov 9, 2019
@ewilderj ewilderj changed the title TPU SavedModel Export API for TF2 - RFC RFC: TPU SavedModel Export API for TF2 Nov 9, 2019
@ewilderj ewilderj added cla: yes RFC: Proposed RFC Design Document and removed cla: no labels Nov 9, 2019
@googlebot

A Googler has manually verified that the CLAs look good.

(Googler, please make sure the reason for overriding the CLA status is clearly documented in these comments.)

ℹ️ Googlers: Go here for more info.

@googlebot

We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google.
In order to pass this check, please resolve this problem and then comment @googlebot I fixed it.. If the bot doesn't comment, it means it doesn't think anything has changed.

ℹ️ Googlers: Go here for more info.

@googlebot googlebot added cla: no and removed cla: yes labels Nov 9, 2019
@brijk7
Contributor

brijk7 commented Dec 5, 2019

@ewilderj : what are the next steps on this RFC? Looks like the comment period is over, and the design review is done as well.

@ewilderj
Contributor

ewilderj commented Dec 5, 2019

Public design review meeting notes should be posted to this PR, and the PR updated to change status to "Approved" (assuming it was approved). Once that's done, it can be merged.

@brijk7
Contributor

brijk7 commented Dec 7, 2019

@lzr-google : can you please provide an update from the design review?

@lzr-official
Member Author

Hi, we didn't reach full agreement on the proposed design in its current state. More work will be needed before we have a conclusion.
Thanks!

Since this API change has been approved and checked in, I now update this doc with the final accepted design and switch status to 'accepted'.
@lzr-official lzr-official requested a review from jhseu February 5, 2020 23:26
@lzr-official
Member Author

Update on the status of this RFC: After changing the design according to review feedback, the new design as in the latest commit has been accepted, with change checked in at tensorflow/tensorflow@ee1dcbb .

@ematejska ematejska merged commit 15fab62 into tensorflow:master Feb 5, 2020
@ematejska ematejska added RFC: Accepted RFC Design Document: Accepted by Review and removed RFC: Proposed RFC Design Document labels Feb 28, 2020