
Conversation

pyu10055
Contributor

@pyu10055 pyu10055 commented Jul 1, 2019

It is a command-line tool that guides the user through the converter options.
Here is an updated video of the CLI.



@pyu10055 pyu10055 requested review from nsthorat and caisq July 1, 2019 20:52
@nsthorat

nsthorat commented Jul 1, 2019

Ping, amazing work! This is incredible. A few high-level points:

  • I wonder if we should put this in a small design doc, because that would give more people an opportunity to look at it (a lot of people have used the converter now!).
  • It would be great to do some of these things automatically! For example, can we detect the input format and print "We found a TensorFlow SavedModel!"?
  • It's unclear what "tags" are; can we automatically print them all?
  • What is a signature? Can we print that too?
  • When you ask about compression, it would be great to show the options with details (e.g. "1) 4x compression, highest accuracy loss; 2) 2x compression, some accuracy loss").
  • It would be good to give a little explainer for each of the questions, e.g. What is op validation? What does stripping debug ops do?

@nsthorat nsthorat requested a review from dsmilkov July 1, 2019 21:55
Contributor Author

@pyu10055 pyu10055 left a comment

Thanks,

  1. will generate a design doc
  2. since different models require different things (a SavedModel needs a directory, Keras needs a file), I think letting the user choose the format first would be better. (The CLI does create different questions according to the selected input format.)
  3. the tag/signature lists are loaded from the saved model and shown as options; users just need to select one.
  4. agree, will add
  5. will do

Reviewable status: 0 of 1 approvals obtained (waiting on @caisq, @dsmilkov, and @nsthorat)

@nsthorat

nsthorat commented Jul 2, 2019

Let's talk over there! I think we still should detect the input format :)

Contributor Author

@pyu10055 pyu10055 left a comment

Updated the workflow of the CLI according to the comments, and a new video has been uploaded: https://drive.google.com/open?id=1hWabsvY4d3GdDJxUlBwMiBuawIL1ulfI

PTAL.

Reviewable status: 0 of 1 approvals obtained (waiting on @caisq, @dsmilkov, and @nsthorat)

Contributor

@dsmilkov dsmilkov left a comment

Took a first pass. Good work. A concern I have is about the third_party dependency.

Reviewed 2 of 7 files at r1, 1 of 3 files at r2.
Reviewable status: 0 of 1 approvals obtained (waiting on @caisq, @dsmilkov, @nsthorat, and @pyu10055)


python/requirements.txt, line 7 at r2 (raw file):

tensorflow==1.14.0
tensorflow-hub==0.5.0
PyInquirer==1.0.3

this is a bit problematic since the library is not in third_party yet and we want to sync the python converter internally. Let's sync it to third_party before moving forward, so we know it's possible. Otherwise, how about using Prompt-Toolkit (v2) which is already in third party: https://github.com/prompt-toolkit/python-prompt-toolkit/tree/2.0


python/tensorflowjs/cli.py, line 24 at r2 (raw file):

from PyInquirer import prompt
from examples import custom_style_3

where does the examples namespace come from?


python/tensorflowjs/cli.py, line 27 at r2 (raw file):

from tensorflow.python.saved_model.loader_impl import parse_saved_model
from tensorflow.core.framework import types_pb2
from tensorflowjs.converters.converter import main as convert

Not a big fan of aliasing, since it's harder to follow which method is actually called. Let's rename main to convert in converter.py. See comment below.


python/tensorflowjs/cli.py, line 42 at r2 (raw file):

def quantization_type(value):

s/value/user_selection_quant/ for readability


python/tensorflowjs/cli.py, line 45 at r2 (raw file):

  """Determine the quantization type based on user selection.
  Args:
    value: user selected value.

add returns section


python/tensorflowjs/cli.py, line 58 at r2 (raw file):

def of_values(answers, key, values):

the name hides a bit what this function does. Let's call it value_in_list(value, list) and do the dict lookup outside to make the code more readable: value_in_list(answers[key], valid_answers)
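A minimal sketch of the suggested helper (the surrounding names are illustrative, not the final API):

def value_in_list(value, allowed_values):
  """Returns True if value is one of allowed_values."""
  return value in allowed_values

# The dict lookup stays at the call site, which keeps the helper generic:
answers = {'input_format': 'keras'}          # illustrative data
valid_answers = ('keras', 'tf_saved_model')  # illustrative data
assert value_in_list(answers['input_format'], valid_answers)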


python/tensorflowjs/cli.py, line 72 at r2 (raw file):

def getTFJSModelType(file):

snake case?


python/tensorflowjs/cli.py, line 78 at r2 (raw file):

def detect_input_format(answers):

instead of storing a temporary variable inside answers[DETECTED_INPUT_FORMAT], let's have this method return the detected_input_format and normalized_input_path as a result. Then pass those results where you need it. This way you don't need to ignore "special answers" when you generate flags later.
Do the overriding of answers[input_path] = normalized_input_path outside this method.

Also have input_path as the only input to this function - easier to understand what this function actually needs.
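A hedged sketch of the suggested shape; the format constant and the detection details here are placeholders rather than the real cli.py logic:

import os

TF_SAVED_MODEL = 'tf_saved_model'  # placeholder for the shared constant

def detect_input_format(input_path):
  """Returns (detected_input_format, normalized_input_path) instead of
  stashing temporary values inside the answers dict."""
  normalized_path = os.path.expanduser(input_path)
  detected_format = None
  if os.path.isdir(normalized_path) and any(
      name.endswith('.pb') for name in os.listdir(normalized_path)):
    detected_format = TF_SAVED_MODEL
  return detected_format, normalized_path

# The caller decides what to write back into answers:
#   detected, normalized = detect_input_format(answers['input_path'])
#   answers['input_path'] = normalized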


python/tensorflowjs/cli.py, line 107 at r2 (raw file):

  """Determine question for model's input path.
  Args:
    answer: Dict of user's answers to the questions

this method signature is a bit misleading since you do not need the whole dictionary. Pass the minimal info this function needs; in this case, input_format.


python/tensorflowjs/cli.py, line 119 at r2 (raw file):

def validate_input_path(value, input_format):

s/value/input_path/


python/tensorflowjs/cli.py, line 130 at r2 (raw file):

  if input_format == TF_HUB:
    if re.match(URL_REGEX, value) is None:
      return 'This is not an valid url for TFHub module: %s' % value

Add more info to the end. "We expect a URL that starts with http(s)://"


python/tensorflowjs/cli.py, line 135 at r2 (raw file):

  if input_format in [KERAS_SAVED_MODEL, TF_SAVED_MODEL]:
    if not os.path.isdir(value):
      return 'The path provided is not a directory: %s' % value

let's be more flexible and work with a path to ".pb" as well, not just a directory


python/tensorflowjs/cli.py, line 137 at r2 (raw file):

      return 'The path provided is not a directory: %s' % value
    if not any(fname.endswith('.pb') for fname in os.listdir(value)):
      return 'This is an invalid saved model directory: %s' % value

Let's make the error more concrete: "Did not find a .pb file inside the model's directory." (Also make sure you remove "saved model", since this message is also used for Keras.)


python/tensorflowjs/cli.py, line 140 at r2 (raw file):

  if input_format == TFJS_LAYERS_MODEL:
    if not os.path.isfile(value):
      return 'The path provided is not a file: %s' % value

To be consistent with saved models, let's accept both a directory that has a json file inside, or a json file directly.


python/tensorflowjs/cli.py, line 151 at r2 (raw file):

  Args:
    value: input path of the model.
    input_format: model format string.

missing returns


python/tensorflowjs/cli.py, line 164 at r2 (raw file):

  """generate the tensorflowjs command string for the selected params.
  Args:
    params: user selected parameters for the conversion.

missing returns


python/tensorflowjs/cli.py, line 182 at r2 (raw file):

def is_saved_model(answers):

pass in minimal viable info, namely input_format and output_format.


python/tensorflowjs/cli.py, line 342 at r2 (raw file):

      'filter': os.path.expanduser,
      'validate':
          lambda value: 'Please enter a valid path' if not value else True

lambda path:


python/tensorflowjs/cli.py, line 454 at r2 (raw file):

    for entry in it:
      print(entry.name,
            os.path.getsize(os.path.join(params['output_path'], entry.name)))

the entry which is of type DirEntry already has a stat property on it, which has file size, among other info: https://docs.python.org/3/library/os.html#os.DirEntry.stat
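For illustration, a small sketch of this suggestion (the output path is a placeholder for params['output_path']):

import os

output_path = '.'  # placeholder for params['output_path']
for entry in os.scandir(output_path):
  # DirEntry caches its stat result, so no separate os.path.getsize() call
  # per file is needed.
  print(entry.name, entry.stat().st_size)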


python/tensorflowjs/converters/converter.py, line 495 at r2 (raw file):

    return parser.parse_args()

def main(arguments=None):

let's rename this function to convert, and add another main which calls it
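A sketch of the suggested split; the body of convert() is elided and the exact signatures are up to the author:

def convert(arguments=None):
  """Runs the conversion with the given arguments (or command-line args if None)."""
  # ... existing body of the current main() ...

def main(arguments=None):
  # Thin console-script entry point that just delegates.
  convert(arguments)

if __name__ == '__main__':
  main()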


python/tensorflowjs/converters/converter.py, line 561 at r2 (raw file):

  elif (input_format == 'tf_hub' and
        output_format == 'tfjs_graph_model'):
    print(FLAGS)

remove print statement


@nsthorat nsthorat left a comment

Reviewable status: 0 of 1 approvals obtained (waiting on @caisq, @nsthorat, and @pyu10055)


python/tensorflowjs/cli.py, line 29 at r2 (raw file):

from tensorflowjs.converters.converter import main as convert
# regex for recognizing valid url for TFHub module.
URL_REGEX = re.compile(

how about TFHUB_VALID_URL_REGEX

@caisq
Copy link
Contributor

caisq commented Jul 16, 2019

@pyu10055 It just occurred to me, but you might want to verify that PyInquirer works in notebook environments such as Jupyter Notebook and Colab. From our experience, we know many of our users do their Python model training and subsequent conversion in notebooks. It would not be ideal if the interactive CLI didn't work in those environments.

Contributor Author

@pyu10055 pyu10055 left a comment

Decisions made offline:

  1. will not sync the interactive CLI to g3
  2. will not try to support running the interactive CLI directly in notebooks or Colab; will add a dry-run mode to generate the command line.
  3. continue using the PyInquirer library.

Reviewable status: 0 of 1 approvals obtained (waiting on @caisq and @dsmilkov)


python/requirements.txt, line 7 at r2 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

this is a bit problematic since the library is not in third_party yet and we want to sync the python converter internally. Let's sync it to third_party before moving forward, so we know it's possible. Otherwise, how about using Prompt-Toolkit (v2) which is already in third party: https://github.com/prompt-toolkit/python-prompt-toolkit/tree/2.0

Resolved offline; see comment above.


python/tensorflowjs/cli.py, line 24 at r2 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

where does the examples namespace come from?

removed


python/tensorflowjs/cli.py, line 58 at r2 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

the name hides a bit what this function does. Let's call it value_in_list(value, list) and do the dict lookup outside to make the code more readable: value_in_list(answers[key], valid_answers)

Thanks for the suggestion. One thing this method also does is check whether key is a valid key; otherwise every caller would need to do that check, which could be in quite a few places.


python/tensorflowjs/cli.py, line 182 at r2 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

pass in minimal viable info, namely input_format and output_format.

done


python/tensorflowjs/cli.py, line 454 at r2 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

the entry which is of type DirEntry already has a stat property on it, which has file size, among other info: https://docs.python.org/3/library/os.html#os.DirEntry.stat

thanks

Contributor

@caisq caisq left a comment

Reviewable status: 0 of 1 approvals obtained (waiting on @caisq, @dsmilkov, and @pyu10055)


README.md, line 60 at r3 (raw file):

- Regular conversion script: tensorflowjs_converter

State the fact that both come with pip install tensorflowjs.


README.md, line 72 at r3 (raw file):

r own script,

Style nit: respect 80-char line limit whenever possible.


python/tensorflowjs/__init__.py, line 24 at r3 (raw file):

from tensorflowjs import cli

Style: the import lines should be sorted lexicographically.


python/tensorflowjs/BUILD, line 137 at r3 (raw file):

"//tensorflowjs:expect_tensorflow_installed",

For consistency, you should add a new dummy dependency called expect_PyInquirer_installed, following the pattern of expect_tensorflow_installed.


python/tensorflowjs/cli.py, line 24 at r3 (raw file):

from PyInquirer import prompt, style_from_dict, Token

Style: importing individual symbols is against Google Python style. Instead, import the entire module. See:
https://github.com/google/styleguide/blob/gh-pages/pyguide.md#224-decision


python/tensorflowjs/cli.py, line 37 at r3 (raw file):

# Token.Selected: '',  # default

Should this line be removed?


python/tensorflowjs/cli.py, line 45 at r3 (raw file):

})

KERAS_SAVED_MODEL = 'keras_saved_model'

Should converter.py use these constants as well for consistency and future-proofness? If you do that, perhaps these should be refactored into a shared small Python file.


python/tensorflowjs/cli.py, line 49 at r3 (raw file):

tfjs_keras_model

Should this be tfjs_graph_model?


python/tensorflowjs/cli.py, line 63 at r3 (raw file):

1/2'

Why not be consistent with the terminology we've used elsewhere, i.e., '8-bit' and '16-bit'? Introducing new terminology is confusing IMO.


python/tensorflowjs/cli.py, line 92 at r3 (raw file):

detect_

Some of the logic in this function is not 100% bulletproof, so I think it should be called "guess_input_format" instead.


python/tensorflowjs/cli.py, line 106 at r3 (raw file):

detected_input_format = TF_SAVED_MODEL

How reliable is this logic? Remember that keras H5 files are single files. So a user could have a file in a directory with many other files, and some of those other files may happen to be some unrelated protobuf (.pb) file. (Imagine someone's download folder). Is there a more unique signature of the .pb file from a TF SavedModel?


python/tensorflowjs/cli.py, line 117 at r3 (raw file):

'.hdf5'

The extension name for a HDF5 is often .h5. In fact, it can be any extension name. The HDF5 library doesn't require any specific extensions. I think you should try creating an HDF5 object in order to determine this condition with higher reliability.
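One hedged way to do that check, assuming h5py is available (it backs Keras's HDF5 handling); h5py.is_hdf5 inspects the file signature rather than the extension:

import h5py

def looks_like_hdf5(path):
  """Returns True if the file parses as HDF5, regardless of its extension."""
  try:
    return h5py.is_hdf5(path)
  except (IOError, OSError):
    return False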


python/tensorflowjs/cli.py, line 122 at r3 (raw file):

    elif (input_path.endswith('.json') and
          get_tfjs_model_type(input_path) == 'layers-model'):

You can try loading the .json file with Python's json module and check the fields such as generatedBy and convertedBy.
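A small sketch of that check; the field names ('format', 'generatedBy', 'convertedBy') follow this discussion and may need adjusting to the actual model.json schema:

import json

def read_tfjs_model_format(json_path):
  """Loads a model.json file and returns its declared format string, if any."""
  with open(json_path) as f:
    model_json = json.load(f)
  # A tfjs layers model typically carries 'format', 'generatedBy' and
  # 'convertedBy' fields alongside the topology.
  return model_json.get('format')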


python/tensorflowjs/cli.py, line 123 at r3 (raw file):

detected_input_format = TFJS_LAYERS_MODEL

Should there be an else branch that throws an Error saying that it can't automatically guess the type of the input?


python/tensorflowjs/cli.py, line 144 at r3 (raw file):

v

nit: --> Upper case "V". Same below.


python/tensorflowjs/cli.py, line 153 at r3 (raw file):

re.match(TFHUB_VALID_URL_REGEX, path) is None

You can just do if not re.match(TFHUB_VALID_URL_REGEX, path):.


python/tensorflowjs/cli.py, line 158 at r3 (raw file):

[

nit: Use tuple, instead of list for cases like this.


python/tensorflowjs/cli.py, line 167 at r3 (raw file):

return 'The path provided is not a directory or .json file: %s' % path

Can we require a .json file (i.e., reject a directory) for the case of input_format == TFJS_LAYERS_MODEL? We want to be consistent with tensorflowjs_converter.


python/tensorflowjs/cli.py, line 207 at r3 (raw file):

return 'The output path already exists: %s' % output_path

In some cases, the user may want to overwrite an existing file (e.g., in the tfjs_layers_model --> keras option). Should that be allowed?


python/tensorflowjs/cli.py, line 220 at r3 (raw file):

'split_weights_by_layer'

It'll be great if these flag names can be refactored to be shared between cli.py and converter.py to reduce future maintenance cost.


python/tensorflowjs/cli.py, line 254 at r3 (raw file):

key': 'g'

Need a comment to describe what 'key' means.


python/tensorflowjs/cli.py, line 385 at r3 (raw file):

c

Upper-case "C" here?


python/tensorflowjs/cli.py, line 385 at r3 (raw file):

Weclome

Typo


python/tensorflowjs/cli.py, line 504 at r3 (raw file):

print('converter command generated:')

IMO, the generated command should always be printed, no matter whether this is a dry run.


python/tensorflowjs/converters/converter.py, line 437 at r3 (raw file):

setup_arguments(arguments=None):

Add a doc string explaining what happens when arguments is provided and what happens when it's not.

Contributor Author

@pyu10055 pyu10055 left a comment

Reviewable status: 0 of 1 approvals obtained (waiting on @caisq, @dsmilkov, and @pyu10055)


README.md, line 60 at r3 (raw file):

Previously, caisq (Shanqing Cai) wrote…
- Regular conversion script: tensorflowjs_converter

State the fact that both come with pip install tensorflowjs.

The pip install is stated in step 1.


README.md, line 72 at r3 (raw file):

Previously, caisq (Shanqing Cai) wrote…
r own script,

Style nit: respect 80-char line limit whenever possible.

Done.


python/tensorflowjs/__init__.py, line 24 at r3 (raw file):

Previously, caisq (Shanqing Cai) wrote…
from tensorflowjs import cli

Style: the import lines should be sorted lexicographically.

Done.


python/tensorflowjs/BUILD, line 137 at r3 (raw file):

Previously, caisq (Shanqing Cai) wrote…
"//tensorflowjs:expect_tensorflow_installed",

For consistency, you should add a new dummy dependency called expect_PyInquirer_installed, following the pattern of expect_tensorflow_installed.

Done.


python/tensorflowjs/cli.py, line 24 at r3 (raw file):

Previously, caisq (Shanqing Cai) wrote…
from PyInquirer import prompt, style_from_dict, Token

Style: importing individual symbols is against Google Python style. Instead, import the entire module. See:
https://github.com/google/styleguide/blob/gh-pages/pyguide.md#224-decision

Done.


python/tensorflowjs/cli.py, line 37 at r3 (raw file):

Previously, caisq (Shanqing Cai) wrote…
# Token.Selected: '',  # default

Should this line be removed?

Done.


python/tensorflowjs/cli.py, line 45 at r3 (raw file):

Previously, caisq (Shanqing Cai) wrote…

Should converter.py use these constants as well for consistency and future-proofness? If you do that, perhaps these should be refactored into a shared small Python file.

Moved the format names and argument strings to common.py, shared between the CLI and the converter.


python/tensorflowjs/cli.py, line 49 at r3 (raw file):

Previously, caisq (Shanqing Cai) wrote…
tfjs_keras_model

Should this be tfjs_graph_model?

thanks


python/tensorflowjs/cli.py, line 63 at r3 (raw file):

Previously, caisq (Shanqing Cai) wrote…
1/2'

Why not be consistent with the terminology we've used elsewhere, i.e., '8-bit' and '16-bit'? Introducing new terminology is confusing IMO.

If we are talking about compression, it is easier for the user to visualize "1/2 the size" than "8-bit".


python/tensorflowjs/cli.py, line 92 at r3 (raw file):

Previously, caisq (Shanqing Cai) wrote…
detect_

Some of the logic in this function is not 100% bulletproof, so I think it should be called "guess_input_format" instead.

I have updated the logic, please check again.


python/tensorflowjs/cli.py, line 106 at r3 (raw file):

Previously, caisq (Shanqing Cai) wrote…
detected_input_format = TF_SAVED_MODEL

How reliable is this logic? Remember that keras H5 files are single files. So a user could have a file in a directory with many other files, and some of those other files may happen to be some unrelated protobuf (.pb) file. (Imagine someone's download folder). Is there a more unique signature of the .pb file from a TF SavedModel?

We could look for saved_model.pb, which is supposed to be the name of the SavedModel pb file. We do parse the pb file later for the tags/signature names, which should further validate the file.
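A sketch of that narrower check, looking only for the canonical file name rather than any *.pb in the directory:

import os

def looks_like_tf_saved_model(dir_path):
  """Returns True if the directory contains the canonical saved_model.pb."""
  return os.path.isdir(dir_path) and os.path.isfile(
      os.path.join(dir_path, 'saved_model.pb'))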


python/tensorflowjs/cli.py, line 117 at r3 (raw file):

Previously, caisq (Shanqing Cai) wrote…
'.hdf5'

The extension name for a HDF5 is often .h5. In fact, it can be any extension name. The HDF5 library doesn't require any specific extensions. I think you should try creating an HDF5 object in order to determine this condition with higher reliability.

Done.


python/tensorflowjs/cli.py, line 122 at r3 (raw file):

Previously, caisq (Shanqing Cai) wrote…
    elif (input_path.endswith('.json') and
          get_tfjs_model_type(input_path) == 'layers-model'):

You can try loading the .json file with Python's json module and check the fields such as generatedBy and convertedBy.

get_tfjs_model_type loads the json file and checks the format field.


python/tensorflowjs/cli.py, line 123 at r3 (raw file):

Previously, caisq (Shanqing Cai) wrote…
detected_input_format = TFJS_LAYERS_MODEL

Should there be an else branch that throws an Error saying that it can't automatically guess the type of the input?

The default value for detected_input_format is None.


python/tensorflowjs/cli.py, line 144 at r3 (raw file):

Previously, caisq (Shanqing Cai) wrote…
v

nit: --> Upper case "V". Same below.

Done.


python/tensorflowjs/cli.py, line 153 at r3 (raw file):

Previously, caisq (Shanqing Cai) wrote…
re.match(TFHUB_VALID_URL_REGEX, path) is None

You can just do if not re.match(TFHUB_VALID_URL_REGEX, path):.

Done.


python/tensorflowjs/cli.py, line 167 at r3 (raw file):

Previously, caisq (Shanqing Cai) wrote…
Can we require a .json file (i.e., reject a directory) for the case of input_format == TFJS_LAYERS_MODEL? We want to be consistent with tensorflowjs_converter.

Accepting a directory was suggested by Daniel for usability; since the CLI expands the path to the file, it is transparent to the user.


python/tensorflowjs/cli.py, line 207 at r3 (raw file):

Previously, caisq (Shanqing Cai) wrote…
return 'The output path already exists: %s' % output_path

In some cases, the user may want to overwrite an existing file (e.g., in the tfjs_layers_model --> keras option). Should that be allowed?

Are you suggesting a confirmation to overwrite?


python/tensorflowjs/cli.py, line 220 at r3 (raw file):

Previously, caisq (Shanqing Cai) wrote…
'split_weights_by_layer'

It'll be great if these flag names can be refactored to be shared between cli.py and converter.py to reduce future maintenance cost.

Yes, any suggestions? Export them from the converter file?


python/tensorflowjs/cli.py, line 254 at r3 (raw file):

Previously, caisq (Shanqing Cai) wrote…
key': 'g'

Need a comment to describe what 'key' means.

Done.


python/tensorflowjs/cli.py, line 385 at r3 (raw file):

Previously, caisq (Shanqing Cai) wrote…
c

Upper-case "C" here?

Done.


python/tensorflowjs/cli.py, line 385 at r3 (raw file):

Previously, caisq (Shanqing Cai) wrote…
Weclome

Typo

Done.


python/tensorflowjs/cli.py, line 504 at r3 (raw file):

Previously, caisq (Shanqing Cai) wrote…
print('converter command generated:')

IMO, the generated command should always be printed, no matter whether this is a dry run.

Done.


python/tensorflowjs/converters/converter.py, line 437 at r3 (raw file):

Previously, caisq (Shanqing Cai) wrote…
setup_arguments(arguments=None):

Add a doc string explaining what happens when arguments is provided and what happens when it's not.

Done.

Contributor Author

@pyu10055 pyu10055 left a comment

Reviewable status: 0 of 1 approvals obtained (waiting on @caisq and @dsmilkov)


python/tensorflowjs/cli.py, line 158 at r3 (raw file):

Previously, caisq (Shanqing Cai) wrote…
[

nit: Use tuple, instead of list for cases like this.

Done.

Contributor

@caisq caisq left a comment

Reviewable status: 0 of 1 approvals obtained (waiting on @caisq, @dsmilkov, and @pyu10055)


python/setup.py, line 30 at r4 (raw file):

'tensorflowjs_cli 

Naming question: tensorflowjs_cli is confusing in the sense that tensorflowjs_converter is also a CLI. Should we call it tensorflowjs_wizard instead? Similarly, cli.py should perhaps be called wizard.py.


python/tensorflowjs/BUILD, line 54 at r4 (raw file):

tensorflow

s/tensorflow/PyInquirer/


python/tensorflowjs/BUILD, line 132 at r4 (raw file):

":cli"

deps should be sorted lexicographically.


python/tensorflowjs/cli.py, line 207 at r3 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

are you suggesting a confirmation to overwrite?

If the output is a directory (in the case of tfjs_graph_model, for example), it should just error out.
If the output is a file, yes, I think it should ask the user if they want to overwrite the existing file.


python/tensorflowjs/cli.py, line 220 at r3 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

yes, any suggestions? export them from the converter file?

You can create a new python file, perhaps called cli_common.py, to hold string constants like this that are shared between cli.py (wizard.py?) and converter.py. A new BUILD rule should be created for it too and old BUILD targets should depend on it.
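An illustrative sketch of what such a module could hold; the constant names are taken from this review, but the real contents are up to the author:

# cli_common.py (suggested name): string constants shared between
# wizard.py and converter.py.

# Model format names.
KERAS_SAVED_MODEL = 'keras_saved_model'
TF_SAVED_MODEL = 'tf_saved_model'
TFJS_LAYERS_MODEL = 'tfjs_layers_model'

# Converter flag names.
SPLIT_WEIGHTS_BY_LAYER = 'split_weights_by_layer'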


python/tensorflowjs/cli.py, line 508 at r3 (raw file):

file 

nit: "File(s)"


python/tensorflowjs/cli.py, line 501 at r4 (raw file):

print('tensorflowjs_converter %s' % ' '.join(arguments))

You should print an empty line or two after this, so that screen output is visually easier to parse.


python/tensorflowjs/converters/common.py, line 33 at r4 (raw file):

 model formats

style nit: Comments should start with an upper-case letter and end with a period. Same below.

Contributor Author

@pyu10055 pyu10055 left a comment

Reviewable status: 0 of 1 approvals obtained (waiting on @caisq and @dsmilkov)


python/setup.py, line 30 at r4 (raw file):

Previously, caisq (Shanqing Cai) wrote…
'tensorflowjs_cli 

Naming question: tensorflowjs_cli is confusing in the sense that tensorflowjs_converter is also a CLI. Should we call it tensorflowjs_wizard instead? Similarly, cli.py should perhaps be called wizard.py.

Done.


python/tensorflowjs/BUILD, line 54 at r4 (raw file):

Previously, caisq (Shanqing Cai) wrote…
tensorflow

s/tensorflow/PyInquirer/

Done.


python/tensorflowjs/BUILD, line 132 at r4 (raw file):

Previously, caisq (Shanqing Cai) wrote…
":cli"

deps should be sorted lexicographically.

Done.


python/tensorflowjs/converters/common.py, line 33 at r4 (raw file):

Previously, caisq (Shanqing Cai) wrote…
 model formats

style nit: Comments should start with an upper-case letter and end with a period. Same below.

Done.


python/tensorflowjs/cli.py, line 207 at r3 (raw file):

Previously, caisq (Shanqing Cai) wrote…

If the output is a directory (in the case of tfjs_graphs_model, for example), it should just error out.
If the output is a file, yes, I think it should ask the user if they want to overwrite the existing file.

Added a confirmation step if the output directory exists.


python/tensorflowjs/cli.py, line 220 at r3 (raw file):

Previously, caisq (Shanqing Cai) wrote…

You can create a new python file, perhaps called cli_common.py, to hold string constants like this that are shared between cli.py (wizard.py?) and converter.py. A new BUILD rule should be created for it too and old BUILD targets should depend on it.

Done.


python/tensorflowjs/cli.py, line 508 at r3 (raw file):

Previously, caisq (Shanqing Cai) wrote…
file 

nit: "File(s)"

Done.


python/tensorflowjs/cli.py, line 501 at r4 (raw file):

Previously, caisq (Shanqing Cai) wrote…
print('tensorflowjs_converter %s' % ' '.join(arguments))

You should print an empty line or two after this, so that screen output is visually easier to parse.

Done.

Contributor

@caisq caisq left a comment

Reviewable status: 0 of 1 approvals obtained (waiting on @caisq, @dsmilkov, and @pyu10055)


python/tensorflowjs/cli.py, line 63 at r3 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

If we are talking about compression, it is easier for the user to visualize "1/2 the size" than "8-bit".

I'm not convinced that 1/2 and 1/4 are better terminology. Consider the fact that when you use 16-bit quantization, the size of the model is not exactly 1/2, because

  1. The model is not just weights; there is also topology, which is unaffected by the quantization
  2. Even if we are just talking about the weights, it's not exactly 1/2 either, because the quantization scheme needs to store the min and the range.

So the 1/2 and 1/4 terminology only creates confusion.


python/tensorflowjs/wizard.py, line 133 at r5 (raw file):

what is the TFHub module URL?

Consider printing an example URL here for clarity.


python/tensorflowjs/wizard.py, line 510 at r5 (raw file):

not params['overwrite_output_path']

Should this be params['overwrite_output_path'] is None instead?

Contributor Author

@pyu10055 pyu10055 left a comment

Reviewable status: 0 of 1 approvals obtained (waiting on @caisq and @dsmilkov)


python/tensorflowjs/cli.py, line 63 at r3 (raw file):

Previously, caisq (Shanqing Cai) wrote…

I'm not convinced that 1/2 and 1/4 are better terminology. Consider the fact that when you use 16-bit quantization, the size of the model is not exactly 1/2, because

  1. The model is not just weights; there is also topology, which is unaffected by the quantization
  2. Even if we are just talking about the weights, it's not exactly 1/2 either, because the quantization scheme needs to store the min and the range.

So the 1/2 and 1/4 terminology only creates confusion.

This method is removed, since each choice can have name and value fields. Here are the labels:
[{
  name: 'No compression, no accuracy loss.',
  value: 0
}, {
  name: '2x compression, medium accuracy loss.',
  value: 1
}, {
  name: '4x compression, highest accuracy loss.',
  value: 2
}]
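For illustration, a hedged sketch of how labeled choices like these feed into a PyInquirer list question (the question name and message below are made up for the example, not the wizard's actual wording):

from PyInquirer import prompt

quantization_question = {
    'type': 'list',
    'name': 'quantization',  # illustrative field name
    'message': 'Do you want to compress the model?',
    'choices': [
        {'name': 'No compression, no accuracy loss.', 'value': 0},
        {'name': '2x compression, medium accuracy loss.', 'value': 1},
        {'name': '4x compression, highest accuracy loss.', 'value': 2},
    ],
}
answers = prompt([quantization_question])  # answers['quantization'] is 0, 1 or 2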


python/tensorflowjs/wizard.py, line 133 at r5 (raw file):

Previously, caisq (Shanqing Cai) wrote…
what is the TFHub module URL?

Consider printing an example URL here for clarity.

Done.


python/tensorflowjs/wizard.py, line 510 at r5 (raw file):

Previously, caisq (Shanqing Cai) wrote…
not params['overwrite_output_path']

Should this be param['overwrite_output_path'] is None instead?

This field should be set to False or True when the OUTPUT_PATH is set.

Contributor

@caisq caisq left a comment

@pyu10055 Thanks for addressing my comments so far. It's getting there.

Did you post a screenshot or screencast of the wizard UI somewhere?

Reviewable status: 0 of 1 approvals obtained (waiting on @caisq and @dsmilkov)

Contributor

@caisq caisq left a comment

Reviewable status: 0 of 1 approvals obtained (waiting on @caisq, @dsmilkov, and @pyu10055)


python/tensorflowjs/wizard.py, line 511 at r6 (raw file):

after

Writing nit: after --> by.


python/tensorflowjs/wizard.py, line 512 at r6 (raw file):

os.scandir

I'm pretty sure that os.scandir is Python 3 only. To the degree we plan to support Python 2 going forward, this should be replaced by a utility function.


python/tensorflowjs/wizard.py, line 513 at r6 (raw file):

entry.stat().st_size
  1. In addition to printing individual file sizes, it would be nice to print the total file size as well at the end.

  2. Can you pretty-format the output, so that all file names line up in the same column? E.g., see:
    https://stackoverflow.com/questions/10623727/python-spacing-and-aligning-strings
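A hedged sketch covering both points (aligned columns with a unit header, plus a total), using plain str.format padding; the function name and layout are illustrative:

import os

def print_output_sizes(output_path):
  """Prints each generated file with its size, column-aligned, plus a total."""
  names = sorted(os.listdir(output_path))
  width = max(len(name) for name in names) if names else 10
  print('{:<{w}}  {:>12}'.format('File', 'Size (Bytes)', w=width))
  total = 0
  for name in names:
    size = os.path.getsize(os.path.join(output_path, name))
    total += size
    print('{:<{w}}  {:>12}'.format(name, size, w=width))
  print('{:<{w}}  {:>12}'.format('Total', total, w=width))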

Contributor

@caisq caisq left a comment

@pyu10055 Another thought (sorry for the fragmented comments. I'll make them more consolidated from here on):

Having two binaries from tensorflowjs is potentially confusing. Can we consolidate tensorflowjs_converter and tensorflowjs_wizard into one? It may still be called tensorflowjs_converter, but when the user calls the binary without any flags, it'll just drop into the interactive interface.

WDYT?

Reviewable status: 0 of 1 approvals obtained (waiting on @caisq, @dsmilkov, and @pyu10055)

Contributor Author

@pyu10055 pyu10055 left a comment

The issue is that we cannot sync the wizard into g3; having two binaries would allow us to filter out just the wizard.

Reviewable status: 0 of 1 approvals obtained (waiting on @caisq and @dsmilkov)


python/tensorflowjs/wizard.py, line 511 at r6 (raw file):

Previously, caisq (Shanqing Cai) wrote…
after

Writing nit: after --> by.

Done.


python/tensorflowjs/wizard.py, line 512 at r6 (raw file):

Previously, caisq (Shanqing Cai) wrote…
os.scandir

I'm pretty sure that os.scandir is Python3 only. To the degree we plan to support python2 going forward, this should be replaced by a utility function.

Used listdir instead.


python/tensorflowjs/wizard.py, line 513 at r6 (raw file):

Previously, caisq (Shanqing Cai) wrote…
entry.stat().st_size
  1. In addition to printing individual file sizes, it would be nice to print the total file size as well at the end.

  2. Can you pretty format the output, so that all file names line up at the same column. E.g., see:
    https://stackoverflow.com/questions/10623727/python-spacing-and-aligning-strings

Done.

Contributor

@caisq caisq left a comment

:lgtm:

Check with @nsthorat and @dsmilkov about whether they have any remaining comments.

Reviewable status: :shipit: complete! 1 of 1 approvals obtained (waiting on @caisq and @dsmilkov)

Contributor

@dsmilkov dsmilkov left a comment

Awesome work @pyu10055! Thanks for the detailed review @caisq! I left a few comments regarding the outdated info in the README. LGTM otherwise.

Reviewed 7 of 9 files at r5, 1 of 2 files at r6, 1 of 1 files at r7.
Reviewable status: :shipit: complete! 2 of 1 approvals obtained (waiting on @caisq, @dsmilkov, and @pyu10055)


README.md, line 28 at r7 (raw file):

 existing TensorFlow/Keras installation.

__Note__: *Check that [`tf-nightly-2.0-preview`](https://pypi.org/project/tf-nightly-2.0-preview/#files) is available for your platform.*

Since you are already touching README.md, let's remove this sentence, which seems outdated.


README.md, line 30 at r7 (raw file):

__Note__: *Check that [`tf-nightly-2.0-preview`](https://pypi.org/project/tf-nightly-2.0-preview/#files) is available for your platform.*

Most of the times, this means that you have to use Python 3.6.8 in your local

Also, this is outdated; you can use either Python 2 or Python 3 (update the text below accordingly).


README.md, line 73 at r7 (raw file):

There is also dry run mode for the wizard, which will not perform the actual
conversion but only generate the command for tensorflowjs_converter command.

wrap tensorflowjs_converter in quotes


python/tensorflowjs/wizard.py, line 1 at r7 (raw file):

# Copyright 2018 Google LLC

2019

Copy link
Contributor

@caisq caisq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for more comments; these are on the screencast at https://drive.google.com/file/d/1mKqfZJ-BnRBTdsTplL6fUBDvBOHDKKWz/view

  1. In the "files generated by conversion" table printed at the end, the "size" column is not well aligned when the numbers have different lengths.
  2. The "size" column should state the unit, e.g., "Size (Bytes)".
  3. Consider printing an empty line between the successive questions, so that the screen output is easier to parse visually.

Reviewable status: :shipit: complete! 2 of 1 approvals obtained (waiting on @caisq, @dsmilkov, and @pyu10055)
