
TensorFlowMapper transform for scoring Tensorflow models in ML.NET #704


Merged
merged 61 commits into from
Aug 30, 2018

Member

@abgoswam abgoswam commented Aug 21, 2018

Fixes #696, #748, #714

This PR creates a new transform, 'TensorFlowMapper', for scoring TensorFlow models in ML.NET.

@@ -56,6 +57,11 @@
<AutoGen>True</AutoGen>
<DependentUpon>Resources.resx</DependentUpon>
</Compile>
<Compile Update="TensorFlow\TensorGeneric.cs">
Contributor

@Ivanidzo4ka Ivanidzo4ka Aug 22, 2018

Shall we move TensorFlow to a separate project + separate NuGet package? #Resolved

Member

@ericstj ericstj Aug 22, 2018

Yeah, I think everything that depends on TF (including the TensorFlowTransform) should be in a separate project + package. #Resolved

Member

@ericstj ericstj Aug 22, 2018

If you'd like to leave that part to me you can. I can factor it out when I add the TF binaries. #Resolved

Member Author

Sounds good. Let's address this in a separate follow-up PR.

(marking as Pending for now)


In reply to: 212033224 [](ancestors = 212033224)

Member Author

Eric will address this in a separate follow-up PR.


In reply to: 212001912 [](ancestors = 212001912)

Member Author

Yes. We will address this in a separate follow-up PR.


In reply to: 211834354 [](ancestors = 211834354)

Member

This is now done.


In reply to: 212037611 [](ancestors = 212037611,211834354)

return new TensorValueGetter<T>(input, colIndex);
}

private ITensorValueGetter CreateTensorValueGetterVec(IRow input, TFDataType tfType, bool isVector, int colIndex, TFShape tfShape)
@yaeldekel yaeldekel Aug 22, 2018

Vec [](start = 62, length = 3)

We can get rid of this suffix, since there is no other CreateTensorValueGetter method. #Resolved

namespace Microsoft.ML.Transforms.TensorFlow
{

/// <summary>
@yaeldekel yaeldekel Aug 22, 2018

[](start = 0, length = 1)

Convert tabs to spaces. #Resolved


namespace Microsoft.ML.Transforms.TensorFlow
{
internal static partial class NativeBinding
@yaeldekel yaeldekel Aug 22, 2018

[](start = 0, length = 1)

Convert tabs to spaces. #Resolved

values = new T[OutputColType.VectorSize];

TensorflowUtils.FetchData<T>(tensors[0].Data, values);
dst = new VBuffer<T>(values.Length, values);
@yaeldekel yaeldekel Aug 22, 2018

new VBuffer [](start = 26, length = 14)

Pass dst.Indices to the new VBuffer as well. #Resolved

Contributor

Ivanidzo4ka commented Aug 22, 2018

@dotnet-bot Test OSX10.13 Release #Resolved

Member Author

abgoswam commented Aug 22, 2018

@dotnet-bot Test OSX10.13 Release #Resolved

handle.Free();
}

internal static bool IsTypeSupported(TFDataType tfoutput)
@yaeldekel yaeldekel Aug 22, 2018

IsTypeSupported [](start = 29, length = 15)

These are for input types; we should decide whether we'd like to support other types as well. #Resolved

Member Author

@abgoswam abgoswam Aug 22, 2018

Hi Yael, should we track this as a separate task on the GitHub board we are using?


In reply to: 212126644 [](ancestors = 212126644)

Member

ericstj commented Aug 23, 2018

You don't actually need to push a change to trigger a build. See @dotnet-bot help. #Resolved

Member

eerhardt commented Aug 23, 2018

Instead of checking in 25 MB of test model files, can we put those in a NuGet package and pull them from myget.org or something? I don't think we should check large files into the repo. #Resolved


namespace Microsoft.ML.Transforms
{
public static class TensorflowTransform
Member

@eerhardt eerhardt Aug 23, 2018

Do we want to hide "TensorFlow" from the public API? I thought the thinking was to hide the implementation details from the user. Is that no longer a goal? #Resolved

Member

Never mind, I was thinking of pre-trained featurizers.


In reply to: 212470775 [](ancestors = 212470775)

public sealed class Arguments : TransformInputBase
{

[Argument(ArgumentType.Required, HelpText = "This is the frozen protobuf model file. Please see https://www.tensorflow.org/mobile/prepare_models for more detail(s).", ShortName = "ModelDir", SortOrder = 0)]
Contributor

@Zruty0 Zruty0 Aug 29, 2018

(s) [](start = 172, length = 3)

This is not needed. "for details" is what I'd suggest. #Resolved

}

[TlcModule.EntryPoint(Name = "Transforms.TensorFlowScorer", Desc = Summary, UserName = UserName, ShortName = ShortName)]
public static CommonOutputs.TransformOutput Convert(IHostEnvironment env, Arguments input)
Contributor

@Zruty0 Zruty0 Aug 29, 2018

Convert [](start = 52, length = 7)

Surely not Convert? #Resolved

@@ -0,0 +1,12 @@
<Project Sdk="Microsoft.NET.Sdk" DefaultTargets="Pack">
Member

@eerhardt eerhardt Aug 29, 2018

@ericstj - We should also have a ".symbols" pkgproj file. That way a symbols package gets produced and is uploaded to the symbols server for the managed assemblies in this package. See the other folders for an example. #Resolved

</PropertyGroup>

<ItemGroup>
<ProjectRefernce Include="..\Microsoft.ML.TensorFlow.Redist\Microsoft.ML.TensorFlow.Redist.pkgproj" />
Member

@eerhardt eerhardt Aug 29, 2018

Is this a typo? ProjectRefernce #Resolved

Member

@ericstj ericstj Aug 29, 2018

Doh, good catch #Resolved

Contributor

@Zruty0 Zruty0 left a comment

:shipit:

We didn't define ExtractDirectory on the item and instead had a full path as identity.

This worked on linux/osx since it prepended a ""/ to the path, which was tolerated by the file system (an extra leading slash).

On Windows this doesn't work, of course.

Fix by appending to the item that doesn't assume files came from an archive.
Member

ericstj commented Aug 29, 2018

Linux tests failed with

Failed   Microsoft.ML.Runtime.RunTests.TestEntryPoints.EntryPointPoissonRegression
2018-08-29T17:03:30.2425699Z Error Message:
2018-08-29T17:03:30.2438651Z  System.FormatException : Stream reading encountered exception
2018-08-29T17:03:30.2453212Z ---- System.IO.FileNotFoundException : Could not find file '/__w/19/s/test/data/external/winequality-white.csv'.
2018-08-29T17:03:30.2467248Z Stack Trace:
2018-08-29T17:03:30.2480707Z    at Microsoft.ML.Runtime.Data.TextLoader.Cursor.LineReader.GetBatch() in /__w/19/s/src/Microsoft.ML.Data/DataLoadSave/Text/TextLoaderCursor.cs:line 475
2018-08-29T17:03:30.2504317Z    at Microsoft.ML.Runtime.Data.TextLoader.Cursor.GetSomeLines(IMultiStreamSource source, Int32 count, List`1& lines) in /__w/19/s/src/Microsoft.ML.Data/DataLoadSave/Text/TextLoaderCursor.cs:line 228
2018-08-29T17:03:30.2518542Z    at Microsoft.ML.Runtime.Data.TextLoader.Bindings..ctor(TextLoader parent, Column[] cols, IMultiStreamSource headerFile, IMultiStreamSource dataSample) in /__w/19/s/src/Microsoft.ML.Data/DataLoadSave/Text/TextLoader.cs:line 536
2018-08-29T17:03:30.2532894Z    at Microsoft.ML.Runtime.Data.TextLoader..ctor(IHostEnvironment env, Arguments args, IMultiStreamSource dataSample) in /__w/19/s/src/Microsoft.ML.Data/DataLoadSave/Text/TextLoader.cs:line 1073
2018-08-29T17:03:30.2546955Z    at Microsoft.ML.Runtime.Data.TextLoader.Create(IHostEnvironment env, Arguments args, IMultiStreamSource files) in /__w/19/s/src/Microsoft.ML.Data/DataLoadSave/Text/TextLoader.cs:line 1260
2018-08-29T17:03:30.2561108Z ----- Inner Stack Trace -----
2018-08-29T17:03:30.2575935Z    at Interop.ThrowExceptionForIoErrno(ErrorInfo errorInfo, String path, Boolean isDirectory, Func`2 errorRewriter)
2018-08-29T17:03:30.2589645Z    at Microsoft.Win32.SafeHandles.SafeFileHandle.Open(String path, OpenFlags flags, Int32 mode)
2018-08-29T17:03:30.2603216Z    at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, FileOptions options)
2018-08-29T17:03:30.2616853Z    at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access)
2018-08-29T17:03:30.2630306Z    at Microsoft.ML.Runtime.SimpleFileHandle.OpenReadStream() in /__w/19/s/src/Microsoft.ML.Core/Data/IFileHandle.cs:line 197
2018-08-29T17:03:30.2645127Z    at Microsoft.ML.Runtime.Data.FileHandleSource.Open(Int32 index) in /__w/19/s/src/Microsoft.ML.Data/DataLoadSave/MultiFileSource.cs:line 94
2018-08-29T17:03:30.2660036Z    at Microsoft.ML.Runtime.Data.FileHandleSource.OpenTextReader(Int32 index) in /__w/19/s/src/Microsoft.ML.Data/DataLoadSave/MultiFileSource.cs:line 99
2018-08-29T17:03:30.2673648Z    at Microsoft.ML.Runtime.Data.TextLoader.Cursor.LineReader.ThreadProc() in /__w/19/s/src/Microsoft.ML.Data/DataLoadSave/Text/TextLoaderCursor.cs:line 491
2018-08-29T17:03:30.2687123Z Standard Output Messages:
2018-08-29T17:03:30.2700414Z  Test EntryPointPoissonRegression: aborted: passed
2018-08-29T17:03:30.2706900Z 
#Resolved

The transform currently accepts the <a href="https://www.tensorflow.org/mobile/prepare_models">frozen TensorFlow model</a> file as input.
</item>
<item>The transform supports scoring only one example at a time.</item>
<item>The name of input column(s) should match the name of input(s) in Tensorflow model.</item>
Contributor

@GalOshri GalOshri Aug 29, 2018

If the input columns have to match the names of the inputs in the TF model, is this parameter needed just to identify which subset of columns should be used? Would a user need to rename the columns before adding this transform? #Resolved

Contributor

This is basically to map columns in the IDataView to inputs in the TF model. If the names are kept the same, we won't have to specify the mapping. The other possibility would be to have an overloaded constructor where we can define which column in the IDataView maps to which input of the TF model using a dictionary.

Right now, yes, the columns need to be renamed before using this transform.


In reply to: 213780817 [](ancestors = 213780817)

I have created issue #769 to track this. For now, what Zeeshan said is correct. Perhaps we should add more details here, explaining how to rename data view columns.


In reply to: 213797869 [](ancestors = 213797869,213780817)

Upon success, the transform will introduce a new column in <see cref="IDataView"/> based on the name of the output column specified.
</item>
</list>
</remarks>
Contributor

@GalOshri GalOshri Aug 29, 2018

Should we add an explanation of the type of input that is expected? Maybe it is just to clarify that it has to be whatever format the TF model expects, or more detail regarding how images would need to be loaded through a different set of transforms. #Resolved

Contributor

Do you mean a detailed sample?


In reply to: 213782041 [](ancestors = 213782041)

Contributor

@GalOshri GalOshri Aug 29, 2018

No, but just an explanation of what should be provided. If I'm new to image classification, how do I find out that I need to transform the images to be in the same format that the pretrained TF model expects? Maybe this is not the right place for this though. #Resolved

var shape = tfShapes[i].ToIntArray().Skip(tfShapes[i][0] == -1 ? BatchSize : 0);
if (type.AsVector.DimCount == 1)
{
int valCount = shape.Aggregate((x, y) => x * y);
Member Author

@abgoswam abgoswam Aug 29, 2018

So if the data is 1-D, we verify it against the product of the dimensions in the model. But if the data passed in is multi-dimensional, we verify that each of the individual dimensions matches. Is that the intent of this change?

Do you think it might be useful to also display the shapes that mismatch? #Closed
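
For readers following the thread, here is a minimal standalone sketch of the check as described in the comment above. It is not the code from this PR: `ShapeCheck`, `IsCompatible`, and `Describe` are hypothetical names, plain `int[]` arrays stand in for `TFShape` and the column's vector type, and a leading `-1` is assumed to be the unknown batch dimension.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration only; not part of the PR.
public static class ShapeCheck
{
    // modelShape: dimensions declared by the model input (assumed: leading -1 = batch).
    // columnDims: dimensions of the data view column being mapped to that input.
    public static bool IsCompatible(int[] modelShape, int[] columnDims)
    {
        // Drop the leading batch dimension when it is unknown (-1).
        int skip = modelShape.Length > 0 && modelShape[0] == -1 ? 1 : 0;
        int[] shape = modelShape.Skip(skip).ToArray();

        if (columnDims.Length == 1)
        {
            // Flat vector: its length must equal the product of the model's dimensions.
            int valCount = shape.Aggregate(1, (x, y) => x * y);
            return columnDims[0] == valCount;
        }

        // Multi-dimensional column: every dimension must match, in order.
        return shape.SequenceEqual(columnDims);
    }

    // Formats a shape so that a mismatch message can show both sides.
    public static string Describe(int[] dims) => "[" + string.Join(", ", dims) + "]";
}

public static class Program
{
    public static void Main()
    {
        int[] model = { -1, 28, 28, 1 };
        int[] column = { 28, 28 };
        if (!ShapeCheck.IsCompatible(model, column))
        {
            Console.WriteLine("Shape mismatch: model expects "
                + ShapeCheck.Describe(model) + ", column has " + ShapeCheck.Describe(column));
        }
    }
}
```

Under these assumptions, a flat column of length 784 is accepted for a model input of shape `[-1, 28, 28, 1]`, while a `[28, 28]` column is rejected, and the message includes both shapes, as the comment suggests.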

Member Author

@abgoswam abgoswam left a comment

Thanks for adding this

@yaeldekel yaeldekel left a comment

:shipit:

@yaeldekel yaeldekel removed the request for review from ericstj August 30, 2018 02:30
@yaeldekel yaeldekel merged commit 5ef7a08 into dotnet:master Aug 30, 2018
@yaeldekel yaeldekel deleted the agoswami/tensorflow branch August 30, 2018 15:22
Labels
enhancement New feature or request
10 participants