-
Notifications
You must be signed in to change notification settings - Fork 1.9k
TensorFlowMapper transform for scoring Tensorflow models in ML.NET #704
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Update to latest dotnet/master
Latest dotnet/master
merge with latest master
@@ -56,6 +57,11 @@ | |||
<AutoGen>True</AutoGen> | |||
<DependentUpon>Resources.resx</DependentUpon> | |||
</Compile> | |||
<Compile Update="TensorFlow\TensorGeneric.cs"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we move tensorflow to separate project + separate nuget package? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think everything that depends on TF (including the TensorFlowTransform) should be in a separate project + package. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you'd like to leave that part to me you can. I can factor it out when I add the TF binaries. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good. lets address this as a separate follow up PR.
(marking as Pending for now)
In reply to: 212033224 [](ancestors = 212033224)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes. we will address this in a separate follow up PR.
In reply to: 211834354 [](ancestors = 211834354)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Merge with latest dotnet/master
…hinelearning into agoswami/tensorflow
return new TensorValueGetter<T>(input, colIndex); | ||
} | ||
|
||
private ITensorValueGetter CreateTensorValueGetterVec(IRow input, TFDataType tfType, bool isVector, int colIndex, TFShape tfShape) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Vec [](start = 62, length = 3)
We can get rid of this suffix, since there is no other CreateTensorValueGetter method. #Resolved
namespace Microsoft.ML.Transforms.TensorFlow | ||
{ | ||
|
||
/// <summary> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[](start = 0, length = 1)
Convert tabs to spaces. #Resolved
|
||
namespace Microsoft.ML.Transforms.TensorFlow | ||
{ | ||
internal static partial class NativeBinding |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[](start = 0, length = 1)
Convert tabs to spaces. #Resolved
values = new T[OutputColType.VectorSize]; | ||
|
||
TensorflowUtils.FetchData<T>(tensors[0].Data, values); | ||
dst = new VBuffer<T>(values.Length, values); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new VBuffer [](start = 26, length = 14)
Pass dst.Indices to the new VBuffer as well. #Resolved
@dotnet-bot Test OSX10.13 Release #Resolved |
@dotnet-bot Test OSX10.13 Release #Resolved |
handle.Free(); | ||
} | ||
|
||
internal static bool IsTypeSupported(TFDataType tfoutput) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IsTypeSupported [](start = 29, length = 15)
These are for input types, we should decide whether we'd like to support other types as well. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Yael.. .. Should we track this as a separate task in the GitHub board we are using ?
In reply to: 212126644 [](ancestors = 212126644)
You don't actually need to push a change to trigger a build. See @dotnet-bot help. #Resolved |
Instead of checking in 25MBs of test model files, can we instead put those in a NuGet package, and pull them from myget.org or something? I don't think we should check in large files into the repo. #Resolved |
|
||
namespace Microsoft.ML.Transforms | ||
{ | ||
public static class TensorflowTransform |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to hide "TensorFlow" from the public API? I thought the thinking was to hide the implementation details from the user. Is that no longer a goal? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nevermind, I was thinking of pre-trained featurizers.
In reply to: 212470775 [](ancestors = 212470775)
…hinelearning into agoswami/tensorflow
public sealed class Arguments : TransformInputBase | ||
{ | ||
|
||
[Argument(ArgumentType.Required, HelpText = "This is the frozen protobuf model file. Please see https://www.tensorflow.org/mobile/prepare_models for more detail(s).", ShortName = "ModelDir", SortOrder = 0)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(s) [](start = 172, length = 3)
this is not needed. for details
is what I'd suggest #Resolved
} | ||
|
||
[TlcModule.EntryPoint(Name = "Transforms.TensorFlowScorer", Desc = Summary, UserName = UserName, ShortName = ShortName)] | ||
public static CommonOutputs.TransformOutput Convert(IHostEnvironment env, Arguments input) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Convert [](start = 52, length = 7)
Surely not Convert
? #Resolved
@@ -0,0 +1,12 @@ | |||
<Project Sdk="Microsoft.NET.Sdk" DefaultTargets="Pack"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ericstj - We should also have a ".symbols" pkgproj file. That way a symbols package gets produced and is uploaded to the symbols server for the managed assemblies in this package. See the other folders for an example. #Resolved
</PropertyGroup> | ||
|
||
<ItemGroup> | ||
<ProjectRefernce Include="..\Microsoft.ML.TensorFlow.Redist\Microsoft.ML.TensorFlow.Redist.pkgproj" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a type-o? ProjectRefernce #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doh, good catch #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We didn't define ExtractDirectory on the item and instead had a full path as identity. This worked on linux/osx since it prepended a ""/ to the path, which was tolerated by the file system (an extra leading slash). On Windows this doesn't work or course. Fix by appending to the item that doesn't assume files came from an archive.
Linux tests failed with
|
The transform currently accepts the <a href="https://www.tensorflow.org/mobile/prepare_models">frozen TensorFlow model</a> file as input. | ||
</item> | ||
<item>The transform supports scoring only one example at a time.</item> | ||
<item>The name of input column(s) should match the name of input(s) in Tensorflow model.</item> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the input columns have to match the names of the inputs in the TF model, is this parameter needed just to identify which subset of columns should be used? Would a user need to rename the columns before adding this transform? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is basically to map columns in IDataView to inputs in TF model. If the names are kept same we wont have to specify the mapping. The other possibility would be to have overloaded constructor where we can define which column in IDataView maps to which input to TF model using a dictionary.
Right now, yes column needs to be renamed before using this transform.
In reply to: 213780817 [](ancestors = 213780817)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Upon success, the transform will introduce a new column in <see cref="IDataView"/> based on the name of the output column specified. | ||
</item> | ||
</list> | ||
</remarks> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add an explanation of the type of input that is expected? Maybe it is just to clarify that it has to be whatever format the TF model expects, or more detail regarding how images would need to be loaded through a different set of transforms. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, but just an explanation of what should be provided. If I'm new to image classification, how do I find out that I need to transform the images to be in the same format that the pretrained TF model expects? Maybe this is not the right place for this though. #Resolved
var shape = tfShapes[i].ToIntArray().Skip(tfShapes[i][0] == -1 ? BatchSize : 0); | ||
if (type.AsVector.DimCount == 1) | ||
{ | ||
int valCount = shape.Aggregate((x, y) => x * y); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if data is 1d we will verify with the product of dimensions in the model. But if data passed in is multi-dimensional then we verify if each of the individual dimensions match. Is that the intent of this change..
do you think it might be useful to also display the shapes that mismatch ? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixes #696, #748 #714
This PR creates a new transform 'TensorFlowMapper' for scoring Tensorflow models in ML.NET.