Interactive cli for python converter #393
base: master
Conversation
Ping, amazing work! This is incredible. A few high-level points:
Thanks,
- Will generate a design doc.
- Since different model formats require different things (a SavedModel needs a directory, Keras needs a file), I think it's better to let the user choose the format first. (The CLI already creates different questions based on the selected input format.)
- Tag/signature lists are loaded from the saved model and shown as options; users just need to select one.
- Agree, will add.
- Will do.
Reviewable status: 0 of 1 approvals obtained (waiting on @caisq, @dsmilkov, and @nsthorat)
Let's talk over there! I think we should still detect the input format :)
Updated the CLI workflow according to the comments, and uploaded a new video: https://drive.google.com/open?id=1hWabsvY4d3GdDJxUlBwMiBuawIL1ulfI
PTAL.
Reviewable status: 0 of 1 approvals obtained (waiting on @caisq, @dsmilkov, and @nsthorat)
Took a first pass. Good work. A concern I have is about the third_party dependency.
Reviewed 2 of 7 files at r1, 1 of 3 files at r2.
Reviewable status: 0 of 1 approvals obtained (waiting on @caisq, @dsmilkov, @nsthorat, and @pyu10055)
python/requirements.txt, line 7 at r2 (raw file):
tensorflow==1.14.0 tensorflow-hub==0.5.0 PyInquirer==1.0.3
This is a bit problematic, since the library is not in third_party yet and we want to sync the Python converter internally. Let's sync it to third_party before moving forward, so we know it's possible. Otherwise, how about using Prompt-Toolkit (v2), which is already in third_party: https://github.com/prompt-toolkit/python-prompt-toolkit/tree/2.0
python/tensorflowjs/cli.py, line 24 at r2 (raw file):
from PyInquirer import prompt
from examples import custom_style_3
where does the examples namespace come from?
python/tensorflowjs/cli.py, line 27 at r2 (raw file):
from tensorflow.python.saved_model.loader_impl import parse_saved_model
from tensorflow.core.framework import types_pb2
from tensorflowjs.converters.converter import main as convert
Not a big fan of aliasing, since it's harder to follow which method is actually called. Let's rename main to convert in converter.py. See comment below.
python/tensorflowjs/cli.py, line 42 at r2 (raw file):
def quantization_type(value):
s/value/user_selection_quant/ for readability
python/tensorflowjs/cli.py, line 45 at r2 (raw file):
"""Determine the quantization type based on user selection. Args: value: user selected value.
add returns section
python/tensorflowjs/cli.py, line 58 at r2 (raw file):
def of_values(answers, key, values):
The name hides a bit what this function does. Let's call it value_in_list(value, list) and do the dict lookup outside to make the code more readable: value_in_list(answers[key], valid_answers)
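A minimal sketch of the suggested helper (the example call in the comment is illustrative):

```python
def value_in_list(value, valid_values):
  """Returns True if `value` is one of `valid_values`."""
  return value in valid_values


# The caller does the dict lookup, keeping the helper generic, e.g.:
# value_in_list(answers['input_format'], ('keras', 'tf_saved_model'))
```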
python/tensorflowjs/cli.py, line 72 at r2 (raw file):
def getTFJSModelType(file):
snake case?
python/tensorflowjs/cli.py, line 78 at r2 (raw file):
def detect_input_format(answers):
Instead of storing a temporary variable inside answers[DETECTED_INPUT_FORMAT], let's have this method return the detected_input_format and normalized_input_path as results, then pass those results where you need them. This way you don't need to ignore "special answers" when you generate flags later.
Do the overriding of answers[input_path] = normalized_input_path outside this method.
Also have input_path as the only input to this function - it's easier to understand what this function actually needs.
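A rough sketch of that refactor, with the actual detection checks replaced by placeholders:

```python
import os


def detect_input_format(input_path):
  """Guesses the model format from the path.

  Returns:
    A (detected_input_format, normalized_input_path) tuple; the format is
    None when it cannot be determined.
  """
  normalized_path = os.path.expanduser(input_path)
  detected_format = None
  if os.path.isdir(normalized_path) and any(
      name.endswith('.pb') for name in os.listdir(normalized_path)):
    detected_format = 'tf_saved_model'      # placeholder constant
  elif normalized_path.endswith('.json'):
    detected_format = 'tfjs_layers_model'   # placeholder constant
  return detected_format, normalized_path


# The caller overrides the answer outside this function:
# fmt, path = detect_input_format(answers['input_path'])
# answers['input_path'] = path
```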
python/tensorflowjs/cli.py, line 107 at r2 (raw file):
"""Determine question for model's input path. Args: answer: Dict of user's answers to the questions
this method signature is a bit misleading since you do not need the whole dictionary. Pass the minimal info this function needs. In this case input_format
python/tensorflowjs/cli.py, line 119 at r2 (raw file):
def validate_input_path(value, input_format):
s/value/input_path/
python/tensorflowjs/cli.py, line 130 at r2 (raw file):
if input_format == TF_HUB:
  if re.match(URL_REGEX, value) is None:
    return 'This is not a valid url for TFHub module: %s' % value
Add more info to the end. "We expect a URL that starts with http(s)://"
python/tensorflowjs/cli.py, line 135 at r2 (raw file):
if input_format in [KERAS_SAVED_MODEL, TF_SAVED_MODEL]:
  if not os.path.isdir(value):
    return 'The path provided is not a directory: %s' % value
let's be more flexible and work with a path to ".pb" as well, not just a directory
python/tensorflowjs/cli.py, line 137 at r2 (raw file):
return 'The path provided is not a directory: %s' % value
if not any(fname.endswith('.pb') for fname in os.listdir(value)):
  return 'This is an invalid saved model directory: %s' % value
Let's make the error more concrete: "Did not find a .pb file inside the model's directory." (Also make sure you remove "saved model", since this is also used for Keras.)
python/tensorflowjs/cli.py, line 140 at r2 (raw file):
if input_format == TFJS_LAYERS_MODEL:
  if not os.path.isfile(value):
    return 'The path provided is not a file: %s' % value
To be consistent with saved models, let's accept both a directory that has a json file inside, or a json file directly.
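A hedged sketch of what that more flexible check could look like; the error strings are illustrative, and it follows the validator convention of returning True or an error message:

```python
import os


def validate_tfjs_layers_model_path(path):
  """Accepts either a model .json file or a directory containing one."""
  if os.path.isfile(path):
    if not path.endswith('.json'):
      return 'The file provided is not a .json file: %s' % path
    return True
  if os.path.isdir(path):
    if not any(name.endswith('.json') for name in os.listdir(path)):
      return 'Did not find a .json file inside the directory: %s' % path
    return True
  return 'The path provided is not a directory or .json file: %s' % path
```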
python/tensorflowjs/cli.py, line 151 at r2 (raw file):
Args: value: input path of the model. input_format: model format string.
missing returns
python/tensorflowjs/cli.py, line 164 at r2 (raw file):
"""generate the tensorflowjs command string for the selected params. Args: params: user selected parameters for the conversion.
missing returns
python/tensorflowjs/cli.py, line 182 at r2 (raw file):
def is_saved_model(answers):
pass in minimal viable info, namely input_format and output_format.
python/tensorflowjs/cli.py, line 342 at r2 (raw file):
'filter': os.path.expanduser, 'validate': lambda value: 'Please enter a valid path' if not value else True
lambda path:
python/tensorflowjs/cli.py, line 454 at r2 (raw file):
for entry in it: print(entry.name, os.path.getsize(os.path.join(params['output_path'], entry.name)))
The entry, which is of type DirEntry, already has a stat property on it, which has file size, among other info: https://docs.python.org/3/library/os.html#os.DirEntry.stat
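For reference, a small sketch using DirEntry.stat() (the function name is illustrative):

```python
import os


def print_output_sizes(output_path):
  """Prints each generated file and its size in bytes."""
  for entry in os.scandir(output_path):
    if entry.is_file():
      # DirEntry.stat() reuses information gathered by scandir where the
      # platform allows, so there is no need for os.path.getsize().
      print(entry.name, entry.stat().st_size)
```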
python/tensorflowjs/converters/converter.py, line 495 at r2 (raw file):
return parser.parse_args()

def main(arguments=None):
let's rename this function to convert, and add another main which calls it
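A sketch of the suggested split in converter.py, with the existing body elided:

```python
def convert(arguments=None):
  """Runs the conversion; this is the logic that previously lived in main()."""
  # ... existing body of main() ...


def main(arguments=None):
  # Thin wrapper kept as the console_scripts entry point.
  convert(arguments)
```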
python/tensorflowjs/converters/converter.py, line 561 at r2 (raw file):
elif (input_format == 'tf_hub' and output_format == 'tfjs_graph_model'):
  print(FLAGS)
remove print statement
Reviewable status: 0 of 1 approvals obtained (waiting on @caisq, @nsthorat, and @pyu10055)
python/tensorflowjs/cli.py, line 29 at r2 (raw file):
from tensorflowjs.converters.converter import main as convert

# regex for recognizing valid url for TFHub module.
URL_REGEX = re.compile(
how about TFHUB_VALID_URL_REGEX
@pyu10055 It just occurred to me, but you might want to verify that PyInquirer works in notebook environments such as Jupyter Notebook and Colab. From our experience, we know many of our users do their Python model training and subsequent conversion in notebooks. It would not be ideal if the interactive CLI didn't work in those environments.
Decisions made offline:
- Will not sync the interactive CLI to g3.
- Will not try to support running the interactive CLI directly in notebooks or Colab; will add a dry-run mode that only generates the command line (see the sketch below).
- Continue using the PyInquirer library.
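A minimal sketch of the dry-run path, assuming the main() -> convert() rename discussed elsewhere in this review (the helper name is illustrative):

```python
def run_converter(arguments, dry_run=False):
  """Always echoes the generated command; skips execution on a dry run."""
  print('converter command generated:')
  print('tensorflowjs_converter %s' % ' '.join(arguments))
  print()  # blank line so the output is easier to scan
  if not dry_run:
    # Assumes the main() -> convert() rename in converter.py.
    from tensorflowjs.converters import converter
    converter.convert(arguments)
```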
Reviewable status: 0 of 1 approvals obtained (waiting on @caisq and @dsmilkov)
python/requirements.txt, line 7 at r2 (raw file):
Previously, dsmilkov (Daniel Smilkov) wrote…
this is a bit problematic since the library is not in third_party yet and we want to sync the python converter internally. Let's sync it to third_party before moving forward, so we know it's possible. Otherwise, how about using Prompt-Toolkit (v2) which is already in third party: https://github.com/prompt-toolkit/python-prompt-toolkit/tree/2.0
Resolved offline; see comment above.
python/tensorflowjs/cli.py, line 24 at r2 (raw file):
Previously, dsmilkov (Daniel Smilkov) wrote…
where does the examples namespace come from?
removed
python/tensorflowjs/cli.py, line 58 at r2 (raw file):
Previously, dsmilkov (Daniel Smilkov) wrote…
the name hides a bit what this function does. Let's call it value_in_list(value, list) and do the dict lookup outside to make the code more readable: value_in_list(answers[key], valid_answers)
Thanks for the suggestion. One thing this method also does is check whether the key is a valid key; otherwise the caller needs to do that check, which could be in quite a few places.
python/tensorflowjs/cli.py, line 182 at r2 (raw file):
Previously, dsmilkov (Daniel Smilkov) wrote…
pass in minimal viable info, namely input_format and output_format.
Done.
python/tensorflowjs/cli.py, line 454 at r2 (raw file):
Previously, dsmilkov (Daniel Smilkov) wrote…
the entry, which is of type DirEntry, already has a stat property on it, which has file size, among other info: https://docs.python.org/3/library/os.html#os.DirEntry.stat
thanks
Reviewable status: 0 of 1 approvals obtained (waiting on @caisq, @dsmilkov, and @pyu10055)
README.md, line 60 at r3 (raw file):
- Regular conversion script: tensorflowjs_converter
State the fact that both come with pip install tensorflowjs.
README.md, line 72 at r3 (raw file):
r own script,
Style nit: respect 80-char line limit whenever possible.
python/tensorflowjs/init.py, line 24 at r3 (raw file):
from tensorflowjs import cli
Style: the import lines should be sorted lexicographically.
python/tensorflowjs/BUILD, line 137 at r3 (raw file):
"//tensorflowjs:expect_tensorflow_installed",
For consistency, you should add a new dummy dependency called expect_PyInquirer_installed, following the pattern of expect_tensorflow_installed.
python/tensorflowjs/cli.py, line 24 at r3 (raw file):
from PyInquirer import prompt, style_from_dict, Token
Style: importing individual symbols is against Google Python style. Instead, import the entire module. See:
https://github.com/google/styleguide/blob/gh-pages/pyguide.md#224-decision
python/tensorflowjs/cli.py, line 37 at r3 (raw file):
# Token.Selected: '', # default
Should this line be removed?
python/tensorflowjs/cli.py, line 45 at r3 (raw file):
}) KERAS_SAVED_MODEL = 'keras_saved_model'
Should converter.py use these constants as well, for consistency and future-proofness? If you do that, perhaps these should be refactored into a shared small Python file.
python/tensorflowjs/cli.py, line 49 at r3 (raw file):
tfjs_keras_model
Should this be tfjs_graph_model?
python/tensorflowjs/cli.py, line 63 at r3 (raw file):
1/2'
Why not be consistent with the terminology we've used elsewhere, i.e., '8-bit' and '16-bit'? Introducing new terminology is confusing IMO.
python/tensorflowjs/cli.py, line 92 at r3 (raw file):
detect_
Some of the logic in this function is not nearly 100% bulletproof, so I think this function should be called "guess_input_format" instead.
python/tensorflowjs/cli.py, line 106 at r3 (raw file):
detected_input_format = TF_SAVED_MODEL
How reliable is this logic? Remember that keras H5 files are single files. So a user could have a file in a directory with many other files, and some of those other files may happen to be some unrelated protobuf (.pb) file. (Imagine someone's download folder). Is there a more unique signature of the .pb file from a TF SavedModel?
python/tensorflowjs/cli.py, line 117 at r3 (raw file):
'.hdf5'
The extension name for an HDF5 file is often .h5. In fact, it can be any extension; the HDF5 library doesn't require a specific one. I think you should try creating an HDF5 object in order to determine this condition with higher reliability.
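One way to make that check more robust, as a sketch (assumes h5py is available, which the Keras HDF5 path already requires):

```python
import h5py


def looks_like_hdf5(path):
  """True if the file is an HDF5 container, regardless of its extension."""
  try:
    return h5py.is_hdf5(path)
  except OSError:
    return False
```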
python/tensorflowjs/cli.py, line 122 at r3 (raw file):
elif (input_path.endswith('.json') and get_tfjs_model_type(input_path) == 'layers-model'):
You can try loading the .json file with Python's json module and check the fields such as generatedBy and convertedBy.
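A sketch of that check; the 'format' field is what get_tfjs_model_type reads, and the function name here is illustrative:

```python
import json


def get_tfjs_model_format(json_path):
  """Returns the 'format' field of a model.json, e.g. 'layers-model'."""
  with open(json_path, 'rt') as f:
    model = json.load(f)
  # 'generatedBy' and 'convertedBy' are additional converter-written fields
  # that could be inspected here as extra signals.
  return model.get('format')
```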
python/tensorflowjs/cli.py, line 123 at r3 (raw file):
detected_input_format = TFJS_LAYERS_MODEL
Should there be an else branch that throws an Error saying that it can't automatically guess the type of the input?
python/tensorflowjs/cli.py, line 144 at r3 (raw file):
v
nit: --> Upper case "V". Same below.
python/tensorflowjs/cli.py, line 153 at r3 (raw file):
re.match(TFHUB_VALID_URL_REGEX, path) is None
You can just do if not re.match(TFHUB_VALID_URL_REGEX, path): instead.
python/tensorflowjs/cli.py, line 158 at r3 (raw file):
[
nit: Use tuple, instead of list for cases like this.
python/tensorflowjs/cli.py, line 167 at r3 (raw file):
return 'The path provided is not a directory or .json file: %s' % path
Can we require a .json file (i.e., reject a directory) for the case of input_format == TFJS_LAYERS_MODEL? We want to be consistent with tensorflowjs_converter.
python/tensorflowjs/cli.py, line 207 at r3 (raw file):
return 'The output path already exists: %s' % output_path
In some cases, the user may want to overwrite an existing file (e.g., in the tfjs_layers_model --> keras option). Should that be allowed?
python/tensorflowjs/cli.py, line 220 at r3 (raw file):
'split_weights_by_layer'
It'll be great if these flag names can be refactored to be shared between cli.py and converter.py to reduce future maintenance cost.
python/tensorflowjs/cli.py, line 254 at r3 (raw file):
key': 'g'
Need a comment to describe what 'key' means.
python/tensorflowjs/cli.py, line 385 at r3 (raw file):
c
Upper-case "C" here?
python/tensorflowjs/cli.py, line 385 at r3 (raw file):
Weclome
Typo
python/tensorflowjs/cli.py, line 504 at r3 (raw file):
print('converter command generated:')
IMO, the generated command should always be printed, no matter whether this is a dry run.
python/tensorflowjs/converters/converter.py, line 437 at r3 (raw file):
setup_arguments(arguments=None):
Add a doc string explaining what happens when arguments is provided and what happens when it's not.
Reviewable status: 0 of 1 approvals obtained (waiting on @caisq, @dsmilkov, and @pyu10055)
README.md, line 60 at r3 (raw file):
Previously, caisq (Shanqing Cai) wrote…
- Regular conversion script: tensorflowjs_converter
State the fact that both come with pip install tensorflowjs.
The pip install is stated in step 1.
README.md, line 72 at r3 (raw file):
Previously, caisq (Shanqing Cai) wrote…
r own script,
Style nit: respect 80-char line limit whenever possible.
Done.
python/tensorflowjs/init.py, line 24 at r3 (raw file):
Previously, caisq (Shanqing Cai) wrote…
from tensorflowjs import cli
Style: the import lines should be sorted lexicographically.
Done.
python/tensorflowjs/BUILD, line 137 at r3 (raw file):
Previously, caisq (Shanqing Cai) wrote…
"//tensorflowjs:expect_tensorflow_installed",
For consistency, you should add a new dummy dependency called expect_PyInquirer_installed, following the pattern of expect_tensorflow_installed.
Done.
python/tensorflowjs/cli.py, line 24 at r3 (raw file):
Previously, caisq (Shanqing Cai) wrote…
from PyInquirer import prompt, style_from_dict, Token
Style: importing individual symbols is against Google Python style. Instead, import the entire module. See:
https://github.com/google/styleguide/blob/gh-pages/pyguide.md#224-decision
Done.
python/tensorflowjs/cli.py, line 37 at r3 (raw file):
Previously, caisq (Shanqing Cai) wrote…
# Token.Selected: '', # default
Should this line be removed?
Done.
python/tensorflowjs/cli.py, line 45 at r3 (raw file):
Previously, caisq (Shanqing Cai) wrote…
Should converter.py use these constants as well, for consistency and future-proofness? If you do that, perhaps these should be refactored into a shared small Python file.
Moved the format names and argument strings to common.py, shared between the CLI and the converter.
python/tensorflowjs/cli.py, line 49 at r3 (raw file):
Previously, caisq (Shanqing Cai) wrote…
tfjs_keras_model
Should this be tfjs_graph_model?
thanks
python/tensorflowjs/cli.py, line 63 at r3 (raw file):
Previously, caisq (Shanqing Cai) wrote…
1/2'
Why not be consistent with the terminology we've used elsewhere, i.e., '8-bit' and '16-bit'? Introducing new terminology is confusing IMO.
If we are talking about compression, it is easier for the user to visualize 1/2 the size instead of 8-bit.
python/tensorflowjs/cli.py, line 92 at r3 (raw file):
Previously, caisq (Shanqing Cai) wrote…
detect_
Some of the logic in this function is not nearly 100% bulletproof, so I think this function should be called "guess_input_format" instead.
I have updated the logic, please check again.
python/tensorflowjs/cli.py, line 106 at r3 (raw file):
Previously, caisq (Shanqing Cai) wrote…
detected_input_format = TF_SAVED_MODEL
How reliable is this logic? Remember that keras H5 files are single files. So a user could have a file in a directory with many other files, and some of those other files may happen to be some unrelated protobuf (.pb) file. (Imagine someone's download folder). Is there a more unique signature of the .pb file from a TF SavedModel?
We could look for saved_model.pb, which is supposed to be the name of the SavedModel .pb file. We do parse the .pb file later for the tags/signature names, which should further validate the file.
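A sketch of that stricter check, reusing the parse_saved_model import already in cli.py (the helper name is illustrative):

```python
import os
from tensorflow.python.saved_model.loader_impl import parse_saved_model


def looks_like_tf_saved_model(dir_path):
  """Requires the canonical saved_model.pb and a parseable SavedModel proto."""
  if not os.path.isfile(os.path.join(dir_path, 'saved_model.pb')):
    return False
  try:
    # parse_saved_model() also yields the tags/signature names that the
    # wizard later shows as options.
    parse_saved_model(dir_path)
    return True
  except IOError:
    return False
```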
python/tensorflowjs/cli.py, line 117 at r3 (raw file):
Previously, caisq (Shanqing Cai) wrote…
'.hdf5'
The extension name for a HDF5 is often .h5. In fact, it can be any extension name. The HDF5 library doesn't require any specific extensions. I think you should try creating an HDF5 object in order to determine this condition with higher reliability.
Done.
python/tensorflowjs/cli.py, line 122 at r3 (raw file):
Previously, caisq (Shanqing Cai) wrote…
elif (input_path.endswith('.json') and get_tfjs_model_type(input_path) == 'layers-model'):
You can try loading the .json file with Python's json module and check the fields such as generatedBy and convertedBy.
get_tfjs_model_type loads the json file and checks the format field.
python/tensorflowjs/cli.py, line 123 at r3 (raw file):
Previously, caisq (Shanqing Cai) wrote…
detected_input_format = TFJS_LAYERS_MODEL
Should there be an else branch that throws an Error saying that it can't automatically guess the type of the input?
The default value for detected_input_format is None.
python/tensorflowjs/cli.py, line 144 at r3 (raw file):
Previously, caisq (Shanqing Cai) wrote…
v
nit: --> Upper case "V". Same below.
Done.
python/tensorflowjs/cli.py, line 153 at r3 (raw file):
Previously, caisq (Shanqing Cai) wrote…
re.match(TFHUB_VALID_URL_REGEX, path) is None
You can just do if not re.match(TFHUB_VALID_URL_REGEX, path): instead.
Done.
python/tensorflowjs/cli.py, line 167 at r3 (raw file):
nit: Use tuple, instead of list for cases like this.
This was suggested by Daniel for usability; since the CLI expands the path to the file, it is transparent to the user.
python/tensorflowjs/cli.py, line 207 at r3 (raw file):
Previously, caisq (Shanqing Cai) wrote…
return 'The output path already exists: %s' % output_path
In some cases, the user may want to overwrite an existing file (e.g., in the tfjs_layers_model --> keras option). Should that be allowed?
are you suggesting a confirmation to overwrite?
python/tensorflowjs/cli.py, line 220 at r3 (raw file):
Previously, caisq (Shanqing Cai) wrote…
'split_weights_by_layer'
It'll be great if these flag names can be refactored to be shared between cli.py and converter.py to reduce future maintenance cost.
Yes, any suggestions? Export them from the converter file?
python/tensorflowjs/cli.py, line 254 at r3 (raw file):
Previously, caisq (Shanqing Cai) wrote…
key': 'g'
Need a comment to describe what 'key' means.
Done.
python/tensorflowjs/cli.py, line 385 at r3 (raw file):
Previously, caisq (Shanqing Cai) wrote…
c
Upper-case "C" here?
Done.
python/tensorflowjs/cli.py, line 385 at r3 (raw file):
Previously, caisq (Shanqing Cai) wrote…
Weclome
Typo
Done.
python/tensorflowjs/cli.py, line 504 at r3 (raw file):
Previously, caisq (Shanqing Cai) wrote…
print('converter command generated:')
IMO, the generated command should always be printed, no matter whether this is a dry run.
Done.
python/tensorflowjs/converters/converter.py, line 437 at r3 (raw file):
Previously, caisq (Shanqing Cai) wrote…
setup_arguments(arguments=None):
Add a doc string explaining what happens when arguments is provided and what happens when it's not.
Done.
Reviewable status: 0 of 1 approvals obtained (waiting on @caisq and @dsmilkov)
python/tensorflowjs/cli.py, line 158 at r3 (raw file):
Previously, caisq (Shanqing Cai) wrote…
[
nit: Use tuple, instead of list for cases like this.
Done.
Reviewable status: 0 of 1 approvals obtained (waiting on @caisq, @dsmilkov, and @pyu10055)
python/setup.py, line 30 at r4 (raw file):
'tensorflowjs_cli
Naming question: tensorflowjs_cli is confusing in the sense that tensorflowjs_converter is also a CLI. Should we call it tensorflowjs_wizard instead? Similarly, cli.py should perhaps be called wizard.py.
python/tensorflowjs/BUILD, line 54 at r4 (raw file):
tensorflow
s/tensorflow/PyInquirer/
python/tensorflowjs/BUILD, line 132 at r4 (raw file):
":cli"
deps should be sorted lexicographically.
python/tensorflowjs/cli.py, line 207 at r3 (raw file):
Previously, pyu10055 (Ping Yu) wrote…
are you suggesting a confirmation to overwrite?
If the output is a directory (in the case of tfjs_graphs_model, for example), it should just error out.
If the output is a file, yes, I think it should ask the user if they want to overwrite the existing file.
python/tensorflowjs/cli.py, line 220 at r3 (raw file):
Previously, pyu10055 (Ping Yu) wrote…
yes, any suggestions? export them from the converter file?
You can create a new python file, perhaps called cli_common.py, to hold string constants like this that are shared between cli.py (wizard.py?) and converter.py. A new BUILD rule should be created for it too and old BUILD targets should depend on it.
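A sketch of what such a shared module could hold; the values follow the format strings used by tensorflowjs_converter as quoted in this review, so treat the exact set as illustrative:

```python
# cli_common.py -- constants shared by the wizard and converter.py.
KERAS_SAVED_MODEL = 'keras_saved_model'
TF_SAVED_MODEL = 'tf_saved_model'
TF_HUB = 'tf_hub'
TFJS_GRAPH_MODEL = 'tfjs_graph_model'
TFJS_LAYERS_MODEL = 'tfjs_layers_model'

SPLIT_WEIGHTS_BY_LAYER = 'split_weights_by_layer'
```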
python/tensorflowjs/cli.py, line 508 at r3 (raw file):
file
nit: "File(s)"
python/tensorflowjs/cli.py, line 501 at r4 (raw file):
print('tensorflowjs_converter %s' % ' '.join(arguments))
You should print an empty line or two after this, so that screen output is visually easier to parse.
python/tensorflowjs/converters/common.py, line 33 at r4 (raw file):
model formats
style nit: Comments should start with an upper-case letter and end with a period. Same below.
Reviewable status: 0 of 1 approvals obtained (waiting on @caisq and @dsmilkov)
python/setup.py, line 30 at r4 (raw file):
Previously, caisq (Shanqing Cai) wrote…
'tensorflowjs_cli
Naming question: tensorflowjs_cli is confusing in the sense that tensorflowjs_converter is also a CLI. Should we call it tensorflowjs_wizard instead? Similarly, cli.py should perhaps be called wizard.py.
Done.
python/tensorflowjs/BUILD, line 54 at r4 (raw file):
Previously, caisq (Shanqing Cai) wrote…
tensorflow
s/tensorflow/PyInquirer/
Done.
python/tensorflowjs/BUILD, line 132 at r4 (raw file):
Previously, caisq (Shanqing Cai) wrote…
":cli"
deps should be sorted lexicographically.
Done.
python/tensorflowjs/converters/common.py, line 33 at r4 (raw file):
Previously, caisq (Shanqing Cai) wrote…
model formats
style nit: Comments should start with an upper-case letter and end with a period. Same below.
Done.
python/tensorflowjs/cli.py, line 207 at r3 (raw file):
Previously, caisq (Shanqing Cai) wrote…
If the output is a directory (in the case of tfjs_graphs_model, for example), it should just error out.
If the output is a file, yes, I think it should ask the user if they want to overwrite the existing file.
Added a confirmation step if the output directory exists.
python/tensorflowjs/cli.py, line 220 at r3 (raw file):
Previously, caisq (Shanqing Cai) wrote…
You can create a new python file, perhaps called cli_common.py, to hold string constants like this that are shared between cli.py (wizard.py?) and converter.py. A new BUILD rule should be created for it too and old BUILD targets should depend on it.
Done.
python/tensorflowjs/cli.py, line 508 at r3 (raw file):
Previously, caisq (Shanqing Cai) wrote…
file
nit: "File(s)"
Done.
python/tensorflowjs/cli.py, line 501 at r4 (raw file):
Previously, caisq (Shanqing Cai) wrote…
print('tensorflowjs_converter %s' % ' '.join(arguments))
You should print an empty line or two after this, so that screen output is visually easier to parse.
Done.
Reviewable status: 0 of 1 approvals obtained (waiting on @caisq, @dsmilkov, and @pyu10055)
python/tensorflowjs/cli.py, line 63 at r3 (raw file):
Previously, pyu10055 (Ping Yu) wrote…
If we are talking about compression, it is easier for user to visualize 1/2 the size, instead of 8bit.
I'm not convinced that 1/2 and 1/4 are better terminology. Consider the fact that when you use 16-bit quantization, the size of the model is not exactly 1/2, because
- The model is not just weights; there is also topology, which is unaffected by the quantization
- Even if we are just talking about the weights, it's not exactly 1/2 either, because the quantization scheme needs to store the min and the range.
So the 1/2 and 1/4 terminology only creates confusion.
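A rough back-of-the-envelope illustration of that point, with made-up numbers (the scheme stores a min and a scale alongside the quantized weights):

```python
# Hypothetical model: 4 MB of float32 weights plus 200 KB of topology JSON.
topology_bytes = 200 * 1024
float_weight_bytes = 4 * 1024 * 1024
num_weights = float_weight_bytes // 4          # float32 = 4 bytes per weight

# 16-bit quantization: 2 bytes per weight, plus min/scale metadata per group.
quantized_weight_bytes = num_weights * 2
metadata_bytes = 64 * 2 * 4                    # e.g. 64 groups, float32 min + scale
total_before = topology_bytes + float_weight_bytes
total_after = topology_bytes + quantized_weight_bytes + metadata_bytes

print(total_after / total_before)              # ~0.52, not exactly 0.5
```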
python/tensorflowjs/wizard.py, line 133 at r5 (raw file):
what is the TFHub module URL?
Consider printing an example URL here for clarity.
python/tensorflowjs/wizard.py, line 510 at r5 (raw file):
not params['overwrite_output_path']
Should this be param['overwrite_output_path'] is None instead?
Reviewable status: 0 of 1 approvals obtained (waiting on @caisq and @dsmilkov)
python/tensorflowjs/cli.py, line 63 at r3 (raw file):
Previously, caisq (Shanqing Cai) wrote…
I'm not convinced that 1/2 and 1/4 are better terminology. Consider the fact that when you use 16-bit quantization, the size of the model is not exactly 1/2, because
- The model is not just weights; there is also topology, which is unaffected by the quantization
- Even if we are just talking about the weights, it's not exactly 1/2 either, because the quantization scheme needs to store the min and the range.
So the 1/2 and 1/4 terminology only creates confusion.
This method is removed, since each choice can have name and value fields. Here are the labels:
[{
name: 'No compression, no accuracy loss.',
value: 0
}, {
name: '2x compression, medium accuracy loss.',
value: 1
}, {
name: '4x compression, highest accuracy loss.',
value: 2
}]
python/tensorflowjs/wizard.py, line 133 at r5 (raw file):
Previously, caisq (Shanqing Cai) wrote…
what is the TFHub module URL?
Consider printing an example URL here for clarity.
Done.
python/tensorflowjs/wizard.py, line 510 at r5 (raw file):
Previously, caisq (Shanqing Cai) wrote…
not params['overwrite_output_path']
Should this be param['overwrite_output_path'] is None instead?
This field should be set to False or True when the OUTPUT_PATH is set.
@pyu10055 Thanks for addressing my comments so far. It's getting there.
Did you post a screenshot or screencast of the wizard UI somewhere?
Reviewable status: 0 of 1 approvals obtained (waiting on @caisq and @dsmilkov)
Reviewable status: 0 of 1 approvals obtained (waiting on @caisq, @dsmilkov, and @pyu10055)
python/tensorflowjs/wizard.py, line 511 at r6 (raw file):
after
Writing nit: after --> by.
python/tensorflowjs/wizard.py, line 512 at r6 (raw file):
os.scandir
I'm pretty sure that os.scandir is Python 3 only. To the degree we plan to support Python 2 going forward, this should be replaced by a utility function.
python/tensorflowjs/wizard.py, line 513 at r6 (raw file):
entry.stat().st_size
- In addition to printing individual file sizes, it would be nice to print the total file size as well at the end.
- Can you pretty format the output, so that all file names line up at the same column? E.g., see: https://stackoverflow.com/questions/10623727/python-spacing-and-aligning-strings
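A sketch combining the three suggestions (Python 2-compatible os.listdir, aligned columns, and a total), with illustrative formatting:

```python
import os


def print_generated_files(output_path):
  """Lists converted files with right-aligned sizes and a total."""
  total = 0
  for name in sorted(os.listdir(output_path)):
    size = os.path.getsize(os.path.join(output_path, name))
    total += size
    print('%-30s %12d' % (name, size))   # name left-aligned, size right-aligned
  print('%-30s %12d' % ('Total Size (Bytes)', total))
```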
@pyu10055 Another thought (sorry for the fragmented comments. I'll make them more consolidated from here on):
Having two binaries from tensorflowjs is potentially confusing. Can we consolidate tensorflowjs_converter and tensorflowjs_wizard into one? It may still be called tensorflowjs_converter, but when the user calls the binary without any flags, it would just drop into the interactive interface.
WDYT?
Reviewable status: 0 of 1 approvals obtained (waiting on @caisq, @dsmilkov, and @pyu10055)
The issue is that we cannot sync the wizard into g3; having two would allow us to filter out just the wizard.
Reviewable status: 0 of 1 approvals obtained (waiting on @caisq and @dsmilkov)
python/tensorflowjs/wizard.py, line 511 at r6 (raw file):
Previously, caisq (Shanqing Cai) wrote…
after
Writing nit: after --> by.
Done.
python/tensorflowjs/wizard.py, line 512 at r6 (raw file):
Previously, caisq (Shanqing Cai) wrote…
os.scandir
I'm pretty sure that os.scandir is Python 3 only. To the degree we plan to support Python 2 going forward, this should be replaced by a utility function.
Used os.listdir instead.
python/tensorflowjs/wizard.py, line 513 at r6 (raw file):
Previously, caisq (Shanqing Cai) wrote…
entry.stat().st_size
In addition to printing individual file sizes, it would be nice to print the total file size as well at the end.
Can you pretty format the output, so that all file names line up at the same column. E.g., see:
https://stackoverflow.com/questions/10623727/python-spacing-and-aligning-strings
Done.
Check with @nsthorat and @smilkov about whether they have any remaining comments.
Reviewable status: complete! 1 of 1 approvals obtained (waiting on @caisq and @dsmilkov)
Awesome work @pyu10055 ! Thanks for the detailed review @caisq! I left a few comments regarding the outdated info in the readme. LGTM otherwise.
Reviewed 7 of 9 files at r5, 1 of 2 files at r6, 1 of 1 files at r7.
Reviewable status: complete! 2 of 1 approvals obtained (waiting on @caisq, @dsmilkov, and @pyu10055)
README.md, line 28 at r7 (raw file):
existing TensorFlow/Keras installation. __Note__: *Check that [`tf-nightly-2.0-preview`](https://pypi.org/project/tf-nightly-2.0-preview/#files) is available for your platform.*
Since you are already touching README.md, let's remove this sentence, which seems outdated.
README.md, line 30 at r7 (raw file):
__Note__: *Check that [`tf-nightly-2.0-preview`](https://pypi.org/project/tf-nightly-2.0-preview/#files) is available for your platform.* Most of the times, this means that you have to use Python 3.6.8 in your local
This is also outdated; you can use either Python 2 or Python 3 (update the text below accordingly).
README.md, line 73 at r7 (raw file):
There is also dry run mode for the wizard, which will not perform the actual conversion but only generate the command for tensorflowjs_converter command.
wrap tensorflowjs_converter in quotes
python/tensorflowjs/wizard.py, line 1 at r7 (raw file):
# Copyright 2018 Google LLC
2019
Sorry - more comments, on the screen cast at https://drive.google.com/file/d/1mKqfZJ-BnRBTdsTplL6fUBDvBOHDKKWz/view
- In the "files generated by conversion" table printed at the end, the "size" column is not aligned well with the numbers.
- The "size" column should state the unit, e.g., "Size (Bytes)".
- Consider printing an empty line between the successive questions, so that the screen output is easier to parse visually.
Reviewable status: complete! 2 of 1 approvals obtained (waiting on @caisq, @dsmilkov, and @pyu10055)
It is a command-line tool that guides the user through the converter options.
Here is an updated video for the CLI.