Turned TensorFlowEstimator into non-trivial estimator and removed shuffling as part of TensorFlowTransform. #1208

zeahmed · 2018-10-10T00:18:06Z

This PR fixes #1106 and it also fixes #1110.

Converted TensorFlowEstimator into a non-trivial estimator because of the training feature.
Removed the shuffling of data that was done internally in TensorFlowTransform (it had been commented out because of issue #1106, "Error due to ShuffleTransform in pipeline"). The idea is to give users more control over shuffling by letting them use ShuffleTransform explicitly in their pipeline. This also reduces the number of parameters in the TensorFlowTransform.Arguments class. Please see the modified tests for use cases.

Zruty0 · 2018-10-11T18:36:26Z

src/Microsoft.ML.TensorFlow/TensorflowTransform.cs

-                // https://github.com/dotnet/machinelearning/issues/1106
-                //if(args.Shuffle)
-                //{
-                //    input = new ShuffleTransform(env, new ShuffleTransform.Arguments(), input);


input [](start = 22, length = 5)

Frankly, I think the shuffle functionality could be preserved quite easily. You just need to make sure that ShuffleTransform is applied only at training time and is not applied after training.

So, don't do
input = new Shuffle(input)
but rather
trainData = new Shuffle(input).
#Closed

Then again, I don't know how important it is to shuffle, so I'm fine either way.

In reply to: 224558825 [](ancestors = 224558825)

Not quite sure how helpful it is to remove it vs. keep it, especially since the tests for it are also commented out.

In reply to: 224558981 [](ancestors = 224558981,224558825)

@pete, shuffling is important, as can be seen from the performance of the tests (shuffle vs. non-shuffle). The test dataset is small, but on larger datasets the performance gains are much larger. I have moved the shuffling out of the transform to give the user more control over it.

I have done it the way you mentioned, but I am currently not sure what the end-user scenario will be for transforms like "Shuffle", "Filter", etc. If it would be the same in the Estimator/Pigsty world, that would be great.

@abhishek, the code was commented out for the tests because of the error. It is no longer commented out, as I have fixed it.

In reply to: 224568214 [](ancestors = 224568214,224558981,224558825)

This is not what I meant. You can have the same logic (train on shuffle, then apply without shuffle) right here in this code, it doesn't have to be done by the caller.

In reply to: 224573294 [](ancestors = 224573294,224568214,224558981,224558825)

@zeahmed I am not aware of having mentioned anything like that, but I think I agree with @Zruty0 on the issue. #Closed

Ahh, I see. That is actually what I did not intend to do inside TFTransform. The reason is that Shuffle has its own parameters, e.g. ForceShuffleSeed, which I would have to expose as additional parameters in TFTransform.Arguments alongside Shuffle (or maybe more, depending on usage). But I am fine with it if you think it's OK to have those parameters in TFTransform.Arguments.

I will change it to what you said unless you agree with my thinking...:)

In reply to: 224594321 [](ancestors = 224594321,224573294,224568214,224558981,224558825)

Sorry, wrong Pete... CodeFlow seems to have an issue with tagging people.

In reply to: 224607610 [](ancestors = 224607610)

So, I would be OK with any of:

1. Having no 'shuffle' at all and performing the shuffling in the test, but still not making the training constructor public: use the estimator to Fit the transform.

2. Having Shuffle available without extra shuffling params and doing the training as I outlined above.

3. Same as 2, but with additional params to Shuffle also available.

My preference would be 2, or 1, but not 3, unless it is obvious that having extra args to Shuffle is actually helpful.

In reply to: 224617477 [](ancestors = 224617477,224594321,224573294,224568214,224558981,224558825)

Hmm... 1 or 2 requires TensorFlowEstimator to be a non-trivial estimator, then. Currently, TensorFlowEstimator is a trivial estimator. I will change TensorFlowEstimator to a non-trivial one and update the PR title and description to reflect the changes.

In reply to: 224640105 [](ancestors = 224640105,224617477,224594321,224573294,224568214,224558981,224558825)
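
A minimal sketch of the pattern discussed in this thread (shuffle only the training view inside Fit, then apply the fitted transform without shuffling). This is not the ML.NET implementation: `SketchEstimator`, `SketchTransformer`, the `seed` parameter and the toy "training" loop are all hypothetical stand-ins, used only to show where the shuffle would sit.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a fitted transform; NOT an ML.NET type.
public sealed class SketchTransformer
{
    private readonly double _weight;

    public SketchTransformer(double weight) => _weight = weight;

    // Applied after training: rows are mapped one by one in their
    // original order, and no shuffle is involved.
    public IEnumerable<double> Transform(IEnumerable<double> rows) =>
        rows.Select(r => r - _weight);
}

// Hypothetical stand-in for a non-trivial (trainable) estimator.
public sealed class SketchEstimator
{
    private readonly bool _shuffle;
    private readonly int _seed;

    public SketchEstimator(bool shuffle, int seed = 0)
    {
        _shuffle = shuffle;
        _seed = seed;
    }

    public SketchTransformer Fit(IReadOnlyList<double> input)
    {
        // Shuffle a separate training view; 'input' itself is not reassigned,
        // so nothing downstream ever sees the shuffled data.
        IEnumerable<double> trainData = input;
        if (_shuffle)
        {
            var rng = new Random(_seed);
            trainData = input.OrderBy(_ => rng.Next()).ToList();
        }

        // Toy order-dependent "training": a running average updated per row.
        double weight = 0.0;
        foreach (var x in trainData)
            weight += 0.5 * (x - weight);

        return new SketchTransformer(weight);
    }
}
```

With this shape the caller only ever sees `Fit` and the returned transformer, so the training constructor does not need to be public, which corresponds to option 2 above.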

Zruty0 · 2018-10-11T20:30:52Z

src/Microsoft.ML.TensorFlow/TensorflowTransform.cs

@@ -290,7 +284,7 @@ public static IDataTransform Create(IHostEnvironment env, Arguments args, IDataV
            return new TensorFlowTransform(env, args, input).MakeDataTransform(input);
        }

-        internal TensorFlowTransform(IHostEnvironment env, Arguments args, IDataView input)
+        public TensorFlowTransform(IHostEnvironment env, Arguments args, IDataView input)


public [](start = 8, length = 6)

This is a mistake. Training constructor should not be public. #Closed

Zruty0 · 2018-10-12T00:13:36Z

src/Microsoft.ML.TensorFlow/TensorflowTransform.cs

                env.CheckValue(input, nameof(input));

                CheckTrainingParameters(args);

+                if(args.Shuffle)


Shuffle [](start = 24, length = 7)

How is this bypassing the row mapper problem? #Closed

This is happening at training time. The row mapper problem is happening at CreatePredictionEngine time. Please see #1106 for more info.

In reply to: 224640317 [](ancestors = 224640317)

Ivanidzo4ka · 2018-10-23T21:48:55Z

src/Microsoft.ML.TensorFlow/TensorflowTransform.cs

-        public TensorFlowEstimator(IHostEnvironment env, string model, string[] inputs, string[] outputs)
-           : this(env, new TensorFlowTransform(env, TensorFlowUtils.GetSession(env, model), inputs, outputs, TensorFlowUtils.IsSavedModel(env, model) ? model : null, false))
+        private readonly IHost _host;
+        private readonly TensorFlowTransform.Arguments _args;


TensorFlowTransform.Arguments _args [](start = 25, length = 35)

@Zruty0, are you OK with that? Or are you OK with checking this in and cleaning it up afterwards?

Let me know if this needs to be done in a different way. I will update it.

In reply to: 227577377 [](ancestors = 227577377)

zeahmed added 2 commits October 9, 2018 16:26

Removed shuffling as part of TensorFlowTransform.

9c8a3f0

Updated changes that were lost due to merge.

3c16a9a

zeahmed requested review from Ivanidzo4ka, Zruty0, abgoswam and yaeldekel October 10, 2018 00:18

Zruty0 approved these changes Oct 11, 2018

Zruty0 reviewed Oct 11, 2018

Addressed reviewers' comments.

8e8d2cd

Zruty0 reviewed Oct 11, 2018

Zruty0 suggested changes Oct 11, 2018

zeahmed added 2 commits October 11, 2018 15:26

Reverting to applying shuffle inside TFTransform.

7f40580

Resolved conflicts.

d539865

Zruty0 reviewed Oct 12, 2018

Turned TensorFlowEstimator to non-trivial trainable estimator.

9c70e74

zeahmed changed the title ~~Removed shuffling from TensorFlowTransform.~~ Turned TensorFlowEstimator into non-trivial estimator and removed shuffling as part of TensorFlowTransform. Oct 12, 2018

shauheen assigned zeahmed and abgoswam Oct 18, 2018

Zruty0 approved these changes Oct 23, 2018

Ivanidzo4ka reviewed Oct 23, 2018

Ivanidzo4ka approved these changes Oct 25, 2018

zeahmed merged commit c726f7f into dotnet:master Oct 25, 2018

ghost locked as resolved and limited conversation to collaborators Mar 28, 2022
