TensorFlowMapper transform for scoring Tensorflow models in ML.NET #704

abgoswam · 2018-08-21T21:15:40Z

Fixes #696, #748, #714

This PR creates a new transform 'TensorFlowMapper' for scoring TensorFlow models in ML.NET.


Ivanidzo4ka · 2018-08-22T05:41:58Z

src/Microsoft.ML.Transforms/Microsoft.ML.Transforms.csproj

@@ -56,6 +57,11 @@
      <AutoGen>True</AutoGen>
      <DependentUpon>Resources.resx</DependentUpon>
    </Compile>
+    <Compile Update="TensorFlow\TensorGeneric.cs">


Shall we move TensorFlow to a separate project + separate NuGet package? #Resolved

Yeah, I think everything that depends on TF (including the TensorFlowTransform) should be in a separate project + package. #Resolved

If you'd like to leave that part to me you can. I can factor it out when I add the TF binaries. #Resolved

Sounds good. Let's address this in a separate follow-up PR.

(marking as Pending for now)

In reply to: 212033224 [](ancestors = 212033224)

Eric will address this in a separate follow-up PR.

In reply to: 212001912 [](ancestors = 212001912)

Yes, we will address this in a separate follow-up PR.

In reply to: 211834354 [](ancestors = 211834354)

This is now done.

In reply to: 212037611 [](ancestors = 212037611,211834354)


yaeldekel · 2018-08-22T17:57:03Z

src/Microsoft.ML.Transforms/TensorflowTransform.cs

+                    return new TensorValueGetter<T>(input, colIndex);
+            }
+
+            private ITensorValueGetter CreateTensorValueGetterVec(IRow input, TFDataType tfType, bool isVector, int colIndex, TFShape tfShape)


Vec [](start = 62, length = 3)

We can get rid of this suffix, since there is no other CreateTensorValueGetter method. #Resolved

yaeldekel · 2018-08-22T17:57:55Z

src/Microsoft.ML.Transforms/TensorFlow/Tensor.cs

+namespace Microsoft.ML.Transforms.TensorFlow
+{
+
+	/// <summary>


[](start = 0, length = 1)

Convert tabs to spaces. #Resolved

yaeldekel · 2018-08-22T17:58:11Z

src/Microsoft.ML.Transforms/TensorFlow/Tensorflow.cs

+
+namespace Microsoft.ML.Transforms.TensorFlow
+{
+	internal static partial class NativeBinding


[](start = 0, length = 1)

Convert tabs to spaces. #Resolved

yaeldekel · 2018-08-22T18:03:41Z

src/Microsoft.ML.Transforms/TensorflowTransform.cs

+                        values = new T[OutputColType.VectorSize];
+
+                    TensorflowUtils.FetchData<T>(tensors[0].Data, values);
+                    dst = new VBuffer<T>(values.Length, values);


new VBuffer [](start = 26, length = 14)

Pass dst.Indices to the new VBuffer as well. #Resolved

Ivanidzo4ka · 2018-08-22T21:18:24Z

@dotnet-bot Test OSX10.13 Release #Resolved

abgoswam · 2018-08-22T21:51:05Z

@dotnet-bot Test OSX10.13 Release #Resolved

yaeldekel · 2018-08-22T22:10:47Z

src/Microsoft.ML.Transforms/TensorFlow/TensorflowUtils.cs

+            handle.Free();
+        }
+
+        internal static bool IsTypeSupported(TFDataType tfoutput)


IsTypeSupported [](start = 29, length = 15)

These are for input types; we should decide whether we'd like to support other types as well. #Resolved

Hi Yael, should we track this as a separate task on the GitHub board we are using?

In reply to: 212126644 [](ancestors = 212126644)


ericstj · 2018-08-23T00:01:06Z

You don't actually need to push a change to trigger a build. See @dotnet-bot help. #Resolved

eerhardt · 2018-08-23T18:41:40Z

Instead of checking in 25 MB of test model files, can we put them in a NuGet package and pull them from myget.org or something? I don't think we should check large files into the repo. #Resolved

eerhardt · 2018-08-23T22:00:31Z

src/Microsoft.ML.Transforms/TensorflowTransform.cs

+
+namespace Microsoft.ML.Transforms
+{
+    public static class TensorflowTransform


Do we want to hide "TensorFlow" from the public API? I thought the thinking was to hide the implementation details from the user. Is that no longer a goal? #Resolved

Never mind, I was thinking of pre-trained featurizers.

In reply to: 212470775 [](ancestors = 212470775)


Zruty0 · 2018-08-29T16:58:25Z

src/Microsoft.ML.TensorFlow/TensorflowTransform.cs

+        public sealed class Arguments : TransformInputBase
+        {
+
+            [Argument(ArgumentType.Required, HelpText = "This is the frozen protobuf model file. Please see https://www.tensorflow.org/mobile/prepare_models for more detail(s).", ShortName = "ModelDir", SortOrder = 0)]


(s) [](start = 172, length = 3)

This is not needed. "for details" is what I'd suggest. #Resolved

Zruty0 · 2018-08-29T16:59:25Z

src/Microsoft.ML.TensorFlow/TensorflowTransform.cs

+        }
+
+        [TlcModule.EntryPoint(Name = "Transforms.TensorFlowScorer", Desc = Summary, UserName = UserName, ShortName = ShortName)]
+        public static CommonOutputs.TransformOutput Convert(IHostEnvironment env, Arguments input)


Convert [](start = 52, length = 7)

Surely not Convert? #Resolved

eerhardt · 2018-08-29T16:59:27Z

pkg/Microsoft.ML.TensorFlow/Microsoft.ML.TensorFlow.nupkgproj

@@ -0,0 +1,12 @@
+<Project Sdk="Microsoft.NET.Sdk" DefaultTargets="Pack">


@ericstj - We should also have a ".symbols" pkgproj file. That way a symbols package gets produced and is uploaded to the symbols server for the managed assemblies in this package. See the other folders for an example. #Resolved

eerhardt · 2018-08-29T16:59:47Z

pkg/Microsoft.ML.TensorFlow/Microsoft.ML.TensorFlow.nupkgproj

+  </PropertyGroup>
+
+  <ItemGroup>
+    <ProjectRefernce Include="..\Microsoft.ML.TensorFlow.Redist\Microsoft.ML.TensorFlow.Redist.pkgproj" />


Is this a typo? ProjectRefernce #Resolved

Doh, good catch #Resolved

Zruty0

We didn't define ExtractDirectory on the item and instead had a full path as identity. This worked on Linux/OSX since it prepended a "/" to the path, which was tolerated by the file system (an extra leading slash). On Windows this doesn't work, of course. Fix by appending to the item in a way that doesn't assume files came from an archive.

ericstj · 2018-08-29T17:28:59Z

Linux tests failed with

```
Failed   Microsoft.ML.Runtime.RunTests.TestEntryPoints.EntryPointPoissonRegression
2018-08-29T17:03:30.2425699Z Error Message:
2018-08-29T17:03:30.2438651Z  System.FormatException : Stream reading encountered exception
2018-08-29T17:03:30.2453212Z ---- System.IO.FileNotFoundException : Could not find file '/__w/19/s/test/data/external/winequality-white.csv'.
2018-08-29T17:03:30.2467248Z Stack Trace:
2018-08-29T17:03:30.2480707Z    at Microsoft.ML.Runtime.Data.TextLoader.Cursor.LineReader.GetBatch() in /__w/19/s/src/Microsoft.ML.Data/DataLoadSave/Text/TextLoaderCursor.cs:line 475
2018-08-29T17:03:30.2504317Z    at Microsoft.ML.Runtime.Data.TextLoader.Cursor.GetSomeLines(IMultiStreamSource source, Int32 count, List`1& lines) in /__w/19/s/src/Microsoft.ML.Data/DataLoadSave/Text/TextLoaderCursor.cs:line 228
2018-08-29T17:03:30.2518542Z    at Microsoft.ML.Runtime.Data.TextLoader.Bindings..ctor(TextLoader parent, Column[] cols, IMultiStreamSource headerFile, IMultiStreamSource dataSample) in /__w/19/s/src/Microsoft.ML.Data/DataLoadSave/Text/TextLoader.cs:line 536
2018-08-29T17:03:30.2532894Z    at Microsoft.ML.Runtime.Data.TextLoader..ctor(IHostEnvironment env, Arguments args, IMultiStreamSource dataSample) in /__w/19/s/src/Microsoft.ML.Data/DataLoadSave/Text/TextLoader.cs:line 1073
2018-08-29T17:03:30.2546955Z    at Microsoft.ML.Runtime.Data.TextLoader.Create(IHostEnvironment env, Arguments args, IMultiStreamSource files) in /__w/19/s/src/Microsoft.ML.Data/DataLoadSave/Text/TextLoader.cs:line 1260
2018-08-29T17:03:30.2561108Z ----- Inner Stack Trace -----
2018-08-29T17:03:30.2575935Z    at Interop.ThrowExceptionForIoErrno(ErrorInfo errorInfo, String path, Boolean isDirectory, Func`2 errorRewriter)
2018-08-29T17:03:30.2589645Z    at Microsoft.Win32.SafeHandles.SafeFileHandle.Open(String path, OpenFlags flags, Int32 mode)
2018-08-29T17:03:30.2603216Z    at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, FileOptions options)
2018-08-29T17:03:30.2616853Z    at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access)
2018-08-29T17:03:30.2630306Z    at Microsoft.ML.Runtime.SimpleFileHandle.OpenReadStream() in /__w/19/s/src/Microsoft.ML.Core/Data/IFileHandle.cs:line 197
2018-08-29T17:03:30.2645127Z    at Microsoft.ML.Runtime.Data.FileHandleSource.Open(Int32 index) in /__w/19/s/src/Microsoft.ML.Data/DataLoadSave/MultiFileSource.cs:line 94
2018-08-29T17:03:30.2660036Z    at Microsoft.ML.Runtime.Data.FileHandleSource.OpenTextReader(Int32 index) in /__w/19/s/src/Microsoft.ML.Data/DataLoadSave/MultiFileSource.cs:line 99
2018-08-29T17:03:30.2673648Z    at Microsoft.ML.Runtime.Data.TextLoader.Cursor.LineReader.ThreadProc() in /__w/19/s/src/Microsoft.ML.Data/DataLoadSave/Text/TextLoaderCursor.cs:line 491
2018-08-29T17:03:30.2687123Z Standard Output Messages:
2018-08-29T17:03:30.2700414Z  Test EntryPointPoissonRegression: aborted: passed
2018-08-29T17:03:30.2706900Z
```

#Resolved


GalOshri · 2018-08-29T18:09:17Z

src/Microsoft.ML.TensorFlow/doc.xml

+            The transform currently accepts the <a href="https://www.tensorflow.org/mobile/prepare_models">frozen TensorFlow model</a> file as input.
+          </item>
+          <item>The transform supports scoring only one example at a time.</item>
+          <item>The name of input column(s) should match the name of input(s) in Tensorflow model.</item>


If the input columns have to match the names of the inputs in the TF model, is this parameter needed just to identify which subset of columns should be used? Would a user need to rename the columns before adding this transform? #Resolved

This is basically to map columns in the IDataView to inputs in the TF model. If the names are kept the same, we won't have to specify the mapping. The other possibility would be to have an overloaded constructor where we can define, using a dictionary, which column in the IDataView maps to which input of the TF model.

Right now, yes, the columns need to be renamed before using this transform.

In reply to: 213780817 [](ancestors = 213780817)

I have created issue #769 to track this. For now, what Zeeshan said is correct. Perhaps we should add more details here, explaining how to rename data view columns.

In reply to: 213797869 [](ancestors = 213797869,213780817)
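For illustration only, here is a minimal sketch of the dictionary-based alternative mentioned above. It is not the PR's code: `InputMapping`, `Resolve`, and the `inputToColumn` parameter are hypothetical names, and plain strings stand in for the real IDataView schema. When no mapping is supplied for a model input, it falls back to the current behavior of requiring a column with the same name.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of ML.NET or of this PR.
public static class InputMapping
{
    // Returns, for each TF model input name, the data-view column that feeds it.
    public static Dictionary<string, string> Resolve(
        IEnumerable<string> modelInputs,
        ISet<string> availableColumns,
        IReadOnlyDictionary<string, string> inputToColumn = null)
    {
        var resolved = new Dictionary<string, string>();
        foreach (var input in modelInputs)
        {
            // Default: the column name must equal the model input name.
            string column = input;

            // Optional override: an explicit input-to-column mapping.
            if (inputToColumn != null && inputToColumn.TryGetValue(input, out var mapped))
                column = mapped;

            if (!availableColumns.Contains(column))
                throw new ArgumentException(
                    $"No column '{column}' found for model input '{input}'.");

            resolved[input] = column;
        }
        return resolved;
    }
}
```

With such a mapping, a caller would not need to rename columns first; without it, the name-matching requirement described in doc.xml still applies.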

GalOshri · 2018-08-29T18:13:07Z

src/Microsoft.ML.TensorFlow/doc.xml

+            Upon success, the transform will introduce a new column in <see cref="IDataView"/> based on the name of the output column specified.
+          </item>
+        </list>
+      </remarks>


Should we add an explanation of the type of input that is expected? Maybe it is just to clarify that it has to be whatever format the TF model expects, or more detail regarding how images would need to be loaded through a different set of transforms. #Resolved

Do you mean a detailed sample?

In reply to: 213782041 [](ancestors = 213782041)

No, but just an explanation of what should be provided. If I'm new to image classification, how do I find out that I need to transform the images to be in the same format that the pretrained TF model expects? Maybe this is not the right place for this though. #Resolved

abgoswam · 2018-08-29T18:28:13Z

src/Microsoft.ML.TensorFlow/TensorflowTransform.cs

+                    var shape = tfShapes[i].ToIntArray().Skip(tfShapes[i][0] == -1 ? BatchSize : 0);
+                    if (type.AsVector.DimCount == 1)
+                    {
+                        int valCount = shape.Aggregate((x, y) => x * y);


So if the data is 1-D, we verify it against the product of the dimensions in the model. But if the data passed in is multi-dimensional, we verify that each of the individual dimensions matches. Is that the intent of this change?

Do you think it might be useful to also display the shapes that mismatch? #Closed
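A minimal sketch of the check as described in this comment, assuming the leading `-1` in the model shape is the batch dimension and should be skipped. `ShapeCheck` and `Validate` are hypothetical names and plain `int[]` arrays stand in for `TFShape` and the column type, so this is not the PR's implementation. The error message includes both shapes, as suggested.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of ML.NET or of this PR.
public static class ShapeCheck
{
    // Returns null when the shapes are compatible, otherwise an error message.
    public static string Validate(int[] columnDims, int[] modelShape)
    {
        // Drop a leading -1, assumed here to be the unknown batch dimension.
        int[] expected = modelShape.Length > 0 && modelShape[0] == -1
            ? modelShape.Skip(1).ToArray()
            : modelShape;

        bool ok;
        if (columnDims.Length == 1)
        {
            // 1-D column: compare its size with the product of the model dimensions.
            ok = columnDims[0] == expected.Aggregate(1, (x, y) => x * y);
        }
        else
        {
            // Multi-dimensional column: every dimension must match.
            ok = columnDims.SequenceEqual(expected);
        }

        return ok
            ? null
            : $"Input shape mismatch: column has [{string.Join(", ", columnDims)}], " +
              $"model expects [{string.Join(", ", expected)}].";
    }
}
```

For a model input of shape `[-1, 3, 4]`, a flat column of size 12 and a column with dimensions `[3, 4]` would both pass, while `[3, 5]` would produce a message naming both shapes.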

abgoswam

Thanks for adding this

yaeldekel

src/Microsoft.ML.TensorFlow/Microsoft.ML.TensorFlow.csproj

test/Microsoft.ML.Tests/Scenarios/TensorflowTests.cs

test/Microsoft.ML.Tests/ScenariosWithDirectInstantiation/TensorflowTests.cs

abgoswam and others added 6 commits July 18, 2018 16:16

Merge pull request #1 from dotnet/master

f6baa5b

Update to latest dotnet/master

Merge pull request #2 from dotnet/master

7d0ea81

Latest dotnet/master

Merge pull request #3 from dotnet/master

bad9cd2

merge with latest master

creating dummy file to test permissions. will remove

6b76960

test

085cf6c

TensorFlow scoring, from Zeeshan A.'s branch, with some additional changes.

4175209

abgoswam requested review from zeahmed, ericstj, yaeldekel and Zruty0 August 21, 2018 21:16

Ivanidzo4ka reviewed Aug 22, 2018


abgoswam and others added 5 commits August 22, 2018 09:29

Merge pull request #4 from dotnet/master

3e7d118

Merge with latest dotnet/master

creating dummy file to test permissions. will remove

30dcdc5

test

b80750b

TensorFlow scoring, from Zeeshan A.'s branch, with some additional changes.

58c703a

Merge branch 'agoswami/tensorflow' of https://github.com/abgoswam/machinelearning into agoswami/tensorflow

032dc48

yaeldekel reviewed Aug 22, 2018


taking care of review comments; build fixes

1f83474

yaeldekel reviewed Aug 22, 2018


simple change intended to trigger fresh builds to repro OSX-Release build failure

bf7b3a7

ericstj mentioned this pull request Aug 23, 2018

Create a redist package for tensorflow #720

Closed

eerhardt reviewed Aug 23, 2018


Prevent input tensors from being GC'ed before TF_SessionRun is called

3c0fc04

Merge branch 'agoswami/tensorflow' of https://github.com/abgoswam/machinelearning into agoswami/tensorflow

3fea9da

Zruty0 reviewed Aug 29, 2018


eerhardt reviewed Aug 29, 2018


Zruty0 approved these changes Aug 29, 2018


ericstj and others added 3 commits August 29, 2018 10:34

Add symbols package and fix package reference to redist

145db2d

Address pull request comments.

e991fa9

Merge branch 'agoswami/tensorflow' of https://github.com/abgoswam/machinelearning into agoswami/tensorflow

228fb24

GalOshri reviewed Aug 29, 2018


abgoswam commented Aug 29, 2018


Give more details in input dimension mismatch error message.

5a849c4

abgoswam commented Aug 29, 2018


Added a test for LearningPipelineAPI and updated the doc.xml

08da76d

yaeldekel approved these changes Aug 30, 2018


yaeldekel removed the request for review from ericstj August 30, 2018 02:30

yaeldekel merged commit 5ef7a08 into dotnet:master Aug 30, 2018