swapping the order or arguments on the constructors of the ConversionExtensionsCatalog. Internalizing the constructors. #2118

sfilipi · 2019-01-11T00:30:32Z

Working towards resolving #2064 by swapping the order or arguments on the constructors of the ConversionExtensions Catalog. Internalizing the constructors.

sfilipi · 2019-01-11T00:32:29Z

test/Microsoft.ML.Tests/Scenarios/TensorflowTests.cs

@@ -39,7 +39,7 @@ public void TensorFlowTransforCifarEndToEndTest()
                    .Append(new ImagePixelExtractingEstimator(mlContext, "ImageCropped", "Input", interleave: true))
                    .Append(new TensorFlowEstimator(mlContext, model_location, new[] { "Input" }, new[] { "Output" }))
                    .Append(new ColumnConcatenatingEstimator(mlContext, "Features", "Output"))
-                    .Append(new ValueToKeyMappingEstimator(mlContext, "Label"))
+                    .Append(new ValueToKeyMappingEstimator(mlContext, "Label", "Label"))


, "Label" [](start = 77, length = 9)

side effect of this change is that we cannot have the output column default to null, and therefore substitute it with the input column when not provided... #Resolved

Let's reverse that. We should retain the behavior and attitude that existed before the fiasco began, of the "name" (what was misidentified as "output") always being specified, and in the case where there is a single "source" (what was misidentified as "input"), it is taken as being the same as the "name."

In reply to: 246970621 [](ancestors = 246970621)

This was marked as resolved, but I still see that we've retained this change.

In reply to: 246974415 [](ancestors = 246974415,246970621)

Ivanidzo4ka · 2019-01-11T00:35:22Z

src/Microsoft.ML.Data/Transforms/Hashing.cs

        /// <param name="hashBits">Number of bits to hash into. Must be between 1 and 31, inclusive.</param>
        /// <param name="invertHash">During hashing we constuct mappings between original values and the produced hash values.
        /// Text representation of original values are stored in the slot names of the  metadata for the new column.Hashing, as such, can map many initial values to one.
        /// <paramref name="invertHash"/> specifies the upper bound of the number of distinct input values mapping to a hash that should be retained.
        /// <value>0</value> does not retain any input values. <value>-1</value> retains all input values mapping to each hash.</param>
-        public HashingEstimator(IHostEnvironment env, string inputColumn, string outputColumn = null,
+        internal HashingEstimator(IHostEnvironment env, string inputColumn, string outputColumn,


internal [](start = 8, length = 8)

why estimator constructor becomes internal? #Resolved

and shouldn't you swap output and input?

In reply to: 246971081 [](ancestors = 246971081)

Constructors become internal, so that users instantiate them only through the MlContext. See #2100

In reply to: 246971081 [](ancestors = 246971081)

Ivanidzo4ka · 2019-01-11T00:40:06Z

src/Microsoft.ML.Data/Transforms/ConversionsExtensionsCatalog.cs

@@ -92,24 +92,24 @@ public static KeyToValueMappingEstimator MapKeyToValue(this TransformsCatalog.Co
        /// <param name="outputColumn">The name of the output column.</param>
        /// <param name="bag">Whether bagging is used for the conversion. </param>
        public static KeyToVectorMappingEstimator MapKeyToVector(this TransformsCatalog.ConversionTransforms catalog,
-            string inputColumn, string outputColumn = null, bool bag = KeyToVectorMappingEstimator.Defaults.Bag)
-            => new KeyToVectorMappingEstimator(CatalogUtils.GetEnvironment(catalog), inputColumn, outputColumn, bag);
+            string outputColumn, string inputColumn, bool bag = KeyToVectorMappingEstimator.Defaults.Bag)


inputColumn [](start = 40, length = 11)

I'm not a Tom, and I can't speak from his name, but I don't remember him objecting on idea of one of column be optional, and I thinks it's perfectly fine to have second column nullable and replace it with first one. #Resolved

Yes, in fact that was our prior behavior.

In reply to: 246971850 [](ancestors = 246971850)

Ivanidzo4ka · 2019-01-11T00:48:10Z

        => new ValueToKeyMappingEstimator(CatalogUtils.GetEnvironment(catalog), columns, file, termsColumn, loaderFactory);

I don't think this constructor and constructor for estimator which accepts term column and loader should exist. (you can just remove term column and loader)
First because we don't want to expose IDataLoader interface and IComponentFactory to user.
Second because we already have ValueMapping estimator which allow you provide custom mappings and just do mappings from file is subset of that functionality. #Pending

Refers to: src/Microsoft.ML.Data/Transforms/ConversionsExtensionsCatalog.cs:127 in c6961d9. [](commit_id = c6961d9, deletion_comment = False)

TomFinley · 2019-01-11T00:54:22Z

src/Microsoft.ML.Data/Transforms/KeyToVector.cs

            : this(env, new KeyToVectorMappingTransformer(env, columns))
        {
        }

-        public KeyToVectorMappingEstimator(IHostEnvironment env, string inputColumn, string outputColumn = null, bool bag = Defaults.Bag)
-            : this(env, new KeyToVectorMappingTransformer(env, new KeyToVectorMappingTransformer.ColumnInfo(inputColumn, outputColumn ?? inputColumn, bag)))
+        internal KeyToVectorMappingEstimator(IHostEnvironment env, string outputColumn, string inputColumn, bool bag = Defaults.Bag)


outputColumn, string inputColumn [](start = 74, length = 32)

outputColumn, string inputColumn [](start = 74, length = 32)

Thank you @sfilipi! It was in a note somewhat down on the issue, but in a reply to @@justinormont where we were talking about consistency with existing argument names, I also observed that the names "inputs" and "outputs" are incorrect (since the inputs to outputs to estimators/transforms are not columns, but IDataViews), and should retain something closer the existing names we have for this name and source. Indeed, I think the bad naming was part of the reason some people got so badly confused in the first place. #Resolved

But these ones are InputColumn and OutputColumn and estimator operate over columns in IDataView. What is wrong with them? And why name and source and not destination and source? name is such a broad term to use it.

In reply to: 246974093 [](ancestors = 246974093)

I hope i am not annoying and further fueling the conversation, but sharing my personal opinion:

1- I agree that the order of arguments should be swapped, and have destination, than source.
2- I find the name "name" for the output column confusing too. Yes it does make sense in the scenario, but you have to blink about 4 times before it does so; just because, as Ivan says, it is such a generic thing.
The names InputColumn, and OutputColumn make a lot more sense to me too, since they are actually column names..

The end!

In reply to: 246974845 [](ancestors = 246974845,246974093)

The term name and source is a well established name in this codebase. The concept of the input and output for a transform is already being established as being the dataviews themselves. Transforms do not take columns as inputs and outputs, they take data views as inputs and outputs -- that's obvious from the interface specification. So: let's jsut agree, the dataviews are the inputs and outputs, full stop. In the case where a transform happens to be producing an additional column, we have parameterized the transform as producing a column with that "name." And when that parameterization also happens to pay attention to one column, we often call that a "source." Always.

It's interesting that you have a different interpretation, and I'd like to emphasize that I don't discount it, but it is distinct from how we've architected the old IDataTransform and I don't like the new interpretation for the reasons already discussed at length.

Going forward, I think this shows how dangerous it is to introduce a new interpretation without discussing that new interpretation with anyone, discussing its merits, etc. I'm sorry that was done, but not sorry that we're fixing it.

In reply to: 247204046 [](ancestors = 247204046,246974845,246974093)

Transforms do not take columns as inputs and outputs, they take data views as inputs and outputs

Estimators during it construction doesn't take any dataview, they take columns which expected to be in dataview. We don't touch dataview till we call Fit. We do schema propagation for GetOutputSchema on SchemaShape.Column,
Only thing which take data views as inputs as far as I know is AppendDataViewRows transform.

The term name and source is a well established name in this codebase.

Internally, yes, mostly as legacy for command line. Legacy doesn't mean it's a great and useful for another task (Public Api).

let's jsut agree, the dataviews are the inputs and outputs, full stop.

I'm as user, don't see input columns and output columns in estimators as input and output dataviews. I think @sfilipi has the same opinion. I can't even come up with estimator/transformer which would produce multiple dataviews (maybe evaluator with metrics as dataviews)

So far we have two people who think name and source is not a great idea (especially name)
We do it not because we want to make you mad, we do it because we also care about project.

without discussing that new interpretation with anyone

Can we actually have discussion regarding name and source vs inputColumn and outputColumn ? I would love to hear other people opinion, don't you?

Estimators during it construction doesn't take any dataview, they take columns which expected to be in dataview. We don't touch dataview till we call Fit.

If you read what I wrote, you'll see I was explicitly talking about transformers, but estimators are similar. They don't take inputs and outputs as columns. They take inputs as dataviews (as you say), and they the output of that is a transformer. This, again, already covered in #2064.

We do schema propagation for GetOutputSchema on SchemaShape.Column,

You're mistaken. Check the IEstimator interface again.

machinelearning/src/Microsoft.ML.Core/Data/IEstimator.cs

Line 301 in 6b9f589

SchemaShape GetOutputSchema(SchemaShape inputSchema);

As you see, again, inputs and outputs to this process are not columnar, they in fact are more global than this.

Internally, yes, mostly as legacy for command line. Legacy doesn't mean it's a great and useful for another task (Public Api).

Already covered in #2064... to summarize, you must justify inconsistencies. We should try to minimize inconsistencies. Just pointing out that sometimes we've judged it healthy that the two should be unaligned because they're in different settings is not a free pass to introduce whatever inconsistencies you want. Not only was this change unjustified, it has the flaws that I have identified.

Can we actually have discussion regarding name and source vs inputColumn and outputColumn ? I would love to hear other people opinion, don't you?

At some point, as again established in #2064, one of my responsibilities (which, just to emphasize, I do not enjoy) is to judge when discussions cease to become productive, so that we can actually move on with the work that must be done. While I'd love to have the luxury of enough free time to just endlessly circle around things like this, we can't. Let's move on.

In reply to: 248024251 [](ancestors = 248024251)

TomFinley · 2019-01-11T00:57:43Z

src/Microsoft.ML.Data/Transforms/ValueToKeyMappingEstimator.cs

-        public ValueToKeyMappingEstimator(IHostEnvironment env, string inputColumn, string outputColumn = null, int maxNumTerms = Defaults.MaxNumTerms, ValueToKeyMappingTransformer.SortOrder sort = Defaults.Sort) :
-           this(env, new [] { new ValueToKeyMappingTransformer.ColumnInfo(inputColumn, outputColumn ?? inputColumn, maxNumTerms, sort) })
+        internal ValueToKeyMappingEstimator(IHostEnvironment env, string outputColumn, string inputColumn, int maxNumTerms = Defaults.MaxNumTerms, ValueToKeyMappingTransformer.SortOrder sort = Defaults.Sort) :
+           this(env, new [] { new ValueToKeyMappingTransformer.ColumnInfo(inputColumn, outputColumn, maxNumTerms, sort) })


inputColumn, outputColumn [](start = 74, length = 25)

It would be well to make sure we're consistent everywhere, and also swap these. #Resolved

sfilipi · 2019-01-13T08:07:58Z

    public static KeyToValueMappingEstimator MapKeyToValue(this TransformsCatalog.ConversionTransforms catalog, params (string input, string output)[] columns)

I will fix this in the next PR here, because this goes all the way down to the OneToOneTransformerBase, and that change will require touching everything .. #Pending

Refers to: src/Microsoft.ML.Data/Transforms/ConversionsExtensionsCatalog.cs:75 in c6961d9. [](commit_id = c6961d9, deletion_comment = False)

sfilipi · 2019-01-13T08:20:48Z

        => new ValueToKeyMappingEstimator(CatalogUtils.GetEnvironment(catalog), columns, file, termsColumn, loaderFactory);

Left the ctor, because it is used through the codebase; it;s internal. Removed the reference here.

In reply to: 453322670 [](ancestors = 453322670)

Refers to: src/Microsoft.ML.Data/Transforms/ConversionsExtensionsCatalog.cs:127 in c6961d9. [](commit_id = c6961d9, deletion_comment = False)

TomFinley · 2019-01-15T21:54:12Z

src/Microsoft.ML.Data/Transforms/Hashing.cs

-            /// <param name="input">Name of input column.</param>
-            /// <param name="output">Name of the column resulting from the transformation of <paramref name="input"/>. Null means <paramref name="input"/> is replaced.</param>
+            /// <param name="source">Name of column to work on.</param>
+            /// <param name="name">Name of the column resulting from the transformation of <paramref name="source"/>. Null means <paramref name="source"/> is replaced.</param>


Null means is replaced [](start = 118, length = 48)

Other way around. null on source means this. Also FYI, while I'm pleased the see that the parameters themselves are in the right order, the params tags themselves are not reversed. #Resolved

TomFinley · 2019-01-15T21:54:25Z

src/Microsoft.ML.Data/Transforms/Hashing.cs

-            public readonly string Input;
-            public readonly string Output;
+            public readonly string Source;
+            public readonly string Name;
            public readonly int HashBits;


While we're at it, might want to put these in the right order. #Resolved

TomFinley · 2019-01-15T21:56:12Z

src/Microsoft.ML.Data/Transforms/TypeConverting.cs

            public readonly DataKind OutputKind;
            public readonly KeyRange OutputKeyRange;

            /// <summary>
            /// Describes how the transformer handles one column pair.
            /// </summary>
-            /// <param name="input">Name of input column.</param>
-            /// <param name="output">Name of output column.</param>
+            /// <param name="source">Name of input column.</param>


Name of input column [](start = 37, length = 20)

Similar comment to other file about ordering of tags, description, etc. #Resolved

TomFinley · 2019-01-15T21:56:24Z

src/Microsoft.ML.Data/Transforms/TypeConverting.cs

        /// <param name="outputKind">The expected type of the converted column.</param>
        /// <param name="outputKeyRange">New key range if we work with key type.</param>
-        public TypeConvertingTransformer(IHostEnvironment env, string inputColumn, string outputColumn, DataKind outputKind, KeyRange outputKeyRange = null)
-            : this(env, new ColumnInfo(inputColumn, outputColumn, outputKind, outputKeyRange))
+        public TypeConvertingTransformer(IHostEnvironment env, string source, string name, DataKind outputKind, KeyRange outputKeyRange = null)


string source, string name [](start = 63, length = 26)

This is not in proper order. #Pending

I will tackle the transformers in the next pr, because i have to change all of them, they all derive from OneToOneTransformer.

Agree, it is confusing because I did partial work, on renaming the args.

In reply to: 248078355 [](ancestors = 248078355)

TomFinley · 2019-01-15T21:59:59Z

src/Microsoft.ML.Data/Transforms/ConversionsExtensionsCatalog.cs

@@ -141,7 +130,7 @@ public static KeyToValueMappingEstimator MapKeyToValue(this TransformsCatalog.Co
            this TransformsCatalog.ConversionTransforms catalog,
            IEnumerable<TInputType> keys,
            IEnumerable<TOutputType> values,
-            params (string source, string name)[] columns)
+            params (string inputColumn, string outputColumn)[] columns)


inputColumn, string outputColumn [](start = 27, length = 32)

Remember, name, source, not source, name. And definitely not input, output. :) #Pending

Also I think for conistency sake we want the column mapping to be at the head, even when we have to forgo the convenience of the params.

In reply to: 248079402 [](ancestors = 248079402)

I will do this on the next PR, because the tuple is passed directly from the estimator -> transfomer-> OneToOnetransformer base class.

I agree that it is confusing to review, because i did partial work with renaming the tuple items.

In reply to: 248079667 [](ancestors = 248079667,248079402)

TomFinley · 2019-01-15T22:01:50Z

src/Microsoft.ML.Data/Transforms/Hashing.cs

        /// <param name="hashBits">Number of bits to hash into. Must be between 1 and 31, inclusive.</param>
        /// <param name="invertHash">During hashing we constuct mappings between original values and the produced hash values.
        /// Text representation of original values are stored in the slot names of the  metadata for the new column.Hashing, as such, can map many initial values to one.
        /// <paramref name="invertHash"/> specifies the upper bound of the number of distinct input values mapping to a hash that should be retained.
        /// <value>0</value> does not retain any input values. <value>-1</value> retains all input values mapping to each hash.</param>
-        public HashingEstimator(IHostEnvironment env, string inputColumn, string outputColumn = null,
+        internal HashingEstimator(IHostEnvironment env, string outputColumn, string inputColumn = null,


outputColumn [](start = 63, length = 12)

Another remnant of the bad input/output names. Also another case where the XML comments should be made less awkward, as noted elsewhere. #Resolved

TomFinley · 2019-01-15T22:21:48Z

src/Microsoft.ML.Data/Transforms/KeyToVector.cs

            : this(env, new KeyToVectorMappingTransformer(env, columns))
        {
        }

-        public KeyToVectorMappingEstimator(IHostEnvironment env, string inputColumn, string outputColumn = null, bool bag = Defaults.Bag)
-            : this(env, new KeyToVectorMappingTransformer(env, new KeyToVectorMappingTransformer.ColumnInfo(inputColumn, outputColumn ?? inputColumn, bag)))
+        internal KeyToVectorMappingEstimator(IHostEnvironment env, string outputColumn, string inputColumn = null, bool bag = Defaults.Bag)


outputColumn, string inputColumn [](start = 74, length = 32)

Another name, source... #Resolved

TomFinley · 2019-01-15T22:22:17Z

src/Microsoft.ML.Data/Transforms/TypeConverting.cs

-        public TypeConvertingEstimator(IHostEnvironment env,
-            string inputColumn, string outputColumn = null,
+        internal TypeConvertingEstimator(IHostEnvironment env,
+            string outputColumn, string inputColumn = null,


outputColumn, string inputColumn [](start = 19, length = 32)

Here's another. I'm just going to stop commenting on these, maybe self-review, double check them? #Resolved

TomFinley · 2019-01-15T22:22:40Z

src/Microsoft.ML.Data/Transforms/ValueToKeyMappingEstimator.cs

-        /// <param name="inputColumn">Name of the column to be transformed.</param>
-        /// <param name="outputColumn">Name of the output column. If this is null '<paramref name="inputColumn"/>' will be used.</param>
+        /// <param name="source">Name of the column to be transformed.</param>
+        /// <param name="name">Name of the column produced.</param>


/// Name of the column produced. [](start = 8, length = 59)

Reverse <param tags. #Resolved

TomFinley · 2019-01-15T22:26:54Z

src/Microsoft.ML.Data/Transforms/ValueToKeyMappingEstimator.cs

-        /// <param name="inputColumn">Name of the column to be transformed.</param>
-        /// <param name="outputColumn">Name of the output column. If this is null '<paramref name="inputColumn"/>' will be used.</param>
+        /// <param name="source">Name of the column to be transformed.</param>
+        /// <param name="name">Name of the column produced.</param>


Name of the column produced [](start = 31, length = 27)

To my eyes, this phrasing is awkward. What was wrong with the existing descriptors we already used for this in existing tooling?

machinelearning/src/Microsoft.ML.Data/Transforms/ColumnBindingsBase.cs

Lines 16 to 22 in 6b9f589

public abstract class SourceNameColumnBase

{

[Argument(ArgumentType.AtMostOnce, HelpText = "Name of the new column", ShortName = "name")]

public string Name;

[Argument(ArgumentType.AtMostOnce, HelpText = "Name of the source column", ShortName = "src")]

public string Source;

Maybe use that, with appropriate clarifications about what happens when source is null, and so on. #Resolved

This BTW is a general comment, I see lots of inconsistency in our documentation. Including descriptions of these columns as somehow being "inputs" and "outputs" which, as we've seen direct examples of, leads people to think about these structures in the wrong way.

In reply to: 248087222 [](ancestors = 248087222)

TomFinley · 2019-01-15T22:30:39Z

test/Microsoft.ML.Benchmarks/RffTransform.cs

@@ -43,7 +43,7 @@ public void CV_Multiclass_Digits_RffTransform_OVAAveragedPerceptron()
            var pipeline = mlContext.Transforms.Projection.CreateRandomFourierFeatures("Features", "FeaturesRFF")
            .AppendCacheCheckpoint(mlContext)
            .Append(mlContext.Transforms.Concatenate("Features", "FeaturesRFF"))
-            .Append(new ValueToKeyMappingEstimator(mlContext, "Label"))
+            .Append(new ValueToKeyMappingEstimator(mlContext, "Label", "Label"))


, "Label" [](start = 69, length = 9)

Now that the source column has a default value again, from what I see, we can get rid of this, right? #Resolved

TomFinley · 2019-01-15T22:31:01Z

test/Microsoft.ML.Tests/Scenarios/Api/Estimators/Metacomponents.cs

@@ -27,7 +27,7 @@ public void Metacomponents()
            var sdcaTrainer = ml.BinaryClassification.Trainers.StochasticDualCoordinateAscent("Label", "Features", advancedSettings: (s) => { s.MaxIterations = 100; s.Shuffle = true; s.NumThreads = 1; });

            var pipeline = new ColumnConcatenatingEstimator (ml, "Features", "SepalLength", "SepalWidth", "PetalLength", "PetalWidth")
-                .Append(new ValueToKeyMappingEstimator(ml, "Label"), TransformerScope.TrainTest)
+                .Append(new ValueToKeyMappingEstimator(ml, "Label", "Label"), TransformerScope.TrainTest)


, "Label" [](start = 66, length = 9)

Similar here. Unneeded, or should be. #Resolved

TomFinley · 2019-01-15T22:31:17Z

test/Microsoft.ML.Tests/Scenarios/Api/Estimators/Extensibility.cs

@@ -39,7 +39,7 @@ void Extensibility()
            };
            var pipeline = new ColumnConcatenatingEstimator (ml, "Features", "SepalLength", "SepalWidth", "PetalLength", "PetalWidth")
                .Append(new CustomMappingEstimator<IrisData, IrisData>(ml, action, null), TransformerScope.TrainTest)
-                .Append(new ValueToKeyMappingEstimator(ml, "Label"), TransformerScope.TrainTest)
+                .Append(new ValueToKeyMappingEstimator(ml, "Label", "Label"), TransformerScope.TrainTest)


, "Label" [](start = 66, length = 9)

Same, unneeded. #Resolved

TomFinley · 2019-01-15T22:31:26Z

test/Microsoft.ML.Tests/Scenarios/Api/Estimators/DecomposableTrainAndPredict.cs

@@ -30,7 +30,7 @@ void DecomposableTrainAndPredict()
            var data = ml.Data.ReadFromTextFile<IrisData>(dataPath, separatorChar: ',');

            var pipeline = new ColumnConcatenatingEstimator (ml, "Features", "SepalLength", "SepalWidth", "PetalLength", "PetalWidth")
-                .Append(new ValueToKeyMappingEstimator(ml, "Label"), TransformerScope.TrainTest)
+                .Append(new ValueToKeyMappingEstimator(ml, "Label", "Label"), TransformerScope.TrainTest)


, "Label" [](start = 66, length = 9)

Same, unneeded. #Resolved

TomFinley · 2019-01-15T22:32:42Z

        public ColumnInfo(string input, string output,

This one was missed. #Pending

Refers to: src/Microsoft.ML.Transforms/OneHotHashEncoding.cs:230 in 520cb79. [](commit_id = 520cb79, deletion_comment = False)

TomFinley · 2019-01-15T22:35:34Z

            var binaryCols = new List<(string input, string output)>();

While less essential since this is a field in an method that is not even privately visible, we've seen the mischief caused by this conflation of "input" and "output" with things that are neither inputs nor otuputs, and the ordering has all sorts of confusing far reaching consequences by confusing some people. Consider making this consistent. #Pending

Refers to: src/Microsoft.ML.Transforms/OneHotHashEncoding.cs:273 in 520cb79. [](commit_id = 520cb79, deletion_comment = False)

TomFinley · 2019-01-15T22:36:07Z

            var binaryCols = new List<(string input, string output)>();

Obviously far less essential than getting the public surface area right.

In reply to: 454578490 [](ancestors = 454578490)

Refers to: src/Microsoft.ML.Transforms/OneHotHashEncoding.cs:273 in 520cb79. [](commit_id = 520cb79, deletion_comment = False)

TomFinley · 2019-01-15T22:36:46Z

        string outputColumn = null, OneHotEncodingTransformer.OutputKind outputKind = Defaults.OutKind)

Another one missed. #Pending

Refers to: src/Microsoft.ML.Transforms/OneHotEncoding.cs:218 in 520cb79. [](commit_id = 520cb79, deletion_comment = False)

TomFinley · 2019-01-15T22:39:04Z

            IReadOnlyDictionary<PipelineColumn, string> outputNames,

Hmmm. I wonder if I indirectly bear some fault for the confusion, since in my static API, I do consider the operations to be on columns, as opposed to IEstimator/ITransformer which considers operations to be on dataviews/schemas and whatnot. Not something to change, just something that amused me. #Closed

Refers to: src/Microsoft.ML.StaticPipe/TransformsStatic.cs:578 in 520cb79. [](commit_id = 520cb79, deletion_comment = False)

TomFinley · 2019-01-15T22:41:47Z

src/Microsoft.ML.Data/Transforms/KeyToVector.cs

-            public readonly string Input;
-            public readonly string Output;
+            public readonly string Source;
+            public readonly string Name;


Name first. #Resolved

TomFinley · 2019-01-15T22:42:19Z

src/Microsoft.ML.Data/Transforms/KeyToVector.cs

@@ -114,10 +114,10 @@ public ColumnInfo(string input, string output = null, bool bag = KeyToVectorMapp
        public IReadOnlyCollection<ColumnInfo> Columns => _columns.AsReadOnly();
        private readonly ColumnInfo[] _columns;

-        private static (string input, string output)[] GetColumnPairs(ColumnInfo[] columns)
+        private static (string source, string name)[] GetColumnPairs(ColumnInfo[] columns)


source, string name [](start = 31, length = 19)

Probably best to make consistent as we're trying to do elsewhere, even if it is private. #Resolved

I will address it in my next PR, because they get used in the Transformer, which passes it to the base class OneToOnetransformer. Once that changes everything needs to change.

In reply to: 248091617 [](ancestors = 248091617)

TomFinley · 2019-01-15T22:56:03Z

docs/samples/Microsoft.ML.Samples/Dynamic/KeyToValue_Term.cs

@@ -31,16 +31,16 @@ public static void KeyToValue_Term()
            // making use of default settings.
            string defaultColumnName = "DefaultKeys";
            // REVIEW create through the catalog extension
-            var default_pipeline = new WordTokenizingEstimator(ml, "Review")
-                .Append(new ValueToKeyMappingEstimator(ml, "Review", defaultColumnName));
+            var default_pipeline = mlContext.Transforms.Text.TokenizeWords("Review")


default_pipeline [](start = 16, length = 16)

Ugh. Our samples have been written with Python variable naming conventions, not C# ones? I'll have to file an issue about this... #2155. #WontFix

TomFinley · 2019-01-15T22:56:35Z

test/Microsoft.ML.Tests/TrainerEstimators/MetalinearEstimators.cs

@@ -76,7 +76,7 @@ public void MetacomponentsFeaturesRenamed()

            var sdcaTrainer = new SdcaBinaryTrainer(Env, "Label", "Vars", advancedSettings: (s) => { s.MaxIterations = 100; s.Shuffle = true; s.NumThreads = 1; });
            var pipeline = new ColumnConcatenatingEstimator(Env, "Vars", "SepalLength", "SepalWidth", "PetalLength", "PetalWidth")
-                .Append(new ValueToKeyMappingEstimator(Env, "Label"), TransformerScope.TrainTest)
+                .Append(new ValueToKeyMappingEstimator(Env, "Label", "Label"), TransformerScope.TrainTest)


, "Label" [](start = 67, length = 9)

Another unnecessary. #Resolved

TomFinley · 2019-01-15T22:56:47Z

test/Microsoft.ML.Tests/TrainerEstimators/TrainerEstimators.cs

@@ -193,7 +193,7 @@ private TextLoader.Arguments GetIrisLoaderArgs()
                        }
            }).Read(GetDataPath(IrisDataPath));

-            var pipeline = new ValueToKeyMappingEstimator(Env, "Label");
+            var pipeline = new ValueToKeyMappingEstimator(Env, "Label", "Label");


, "Label" [](start = 70, length = 9)

Another unnecessary. #Resolved

TomFinley · 2019-01-15T22:57:47Z

test/Microsoft.ML.Tests/Transformers/TextFeaturizerTests.cs

@@ -249,7 +249,7 @@ public void NgramWorkout()
                .Read(sentimentDataPath);

            var est = new WordTokenizingEstimator(Env, "text", "text")
-                .Append(new ValueToKeyMappingEstimator(Env, "text", "terms"))
+                .Append(new ValueToKeyMappingEstimator(Env, "terms", "text"))
                .Append(new NgramExtractingEstimator(Env, "terms", "ngrams"))


NgramExtractingEstimator [](start = 28, length = 24)

Hmmm. This didn't get fixed? #Pending

I am not converting the NGramExtracting Estimator on this PR. It will come in the next PR.

In reply to: 248095484 [](ancestors = 248095484)

TomFinley · 2019-01-15T22:57:53Z

test/Microsoft.ML.Tests/Transformers/TextFeaturizerTests.cs

@@ -249,7 +249,7 @@ public void NgramWorkout()
                .Read(sentimentDataPath);

            var est = new WordTokenizingEstimator(Env, "text", "text")
-                .Append(new ValueToKeyMappingEstimator(Env, "text", "terms"))
+                .Append(new ValueToKeyMappingEstimator(Env, "terms", "text"))
                .Append(new NgramExtractingEstimator(Env, "terms", "ngrams"))
                .Append(new NgramHashingEstimator(Env, "terms", "ngramshash"));


NgramHashingEstimator [](start = 28, length = 21)

Nor this? #Pending

I am not converting the NGramHashing Estimator on this PR. It will come in the next PR.

In reply to: 248095509 [](ancestors = 248095509)

TomFinley · 2019-01-15T22:58:17Z

        public ColumnInfo(string input, string output = null,

Another. #Pending

Refers to: src/Microsoft.ML.Transforms/OneHotEncoding.cs:192 in 520cb79. [](commit_id = 520cb79, deletion_comment = False)

sfilipi · 2019-01-16T17:11:46Z

        public ColumnInfo(string input, string output = null,

I did write a comment that i am not swapping for the Transformers on this PR. the column info is shared between the estimators and the transformers.

The reason I am not swapping for the transformers, is because they all pass the columns to base - OneToOneTransformer; and if you swap them in there, you have to do all the transforms. This was meant to be a small PR to just test understanding of the changes, reach agreement. I am tackling all of them in the next PR.

In reply to: 454584468 [](ancestors = 454584468)

Refers to: src/Microsoft.ML.Transforms/OneHotEncoding.cs:192 in 520cb79. [](commit_id = 520cb79, deletion_comment = False)

sfilipi · 2019-01-16T17:12:47Z

        string outputColumn = null, OneHotEncodingTransformer.OutputKind outputKind = Defaults.OutKind)

I am not doing all the estimators in this PR; just the one for the ConversionsExtensionsCatalog. The reason this file is here, is because of the changes in KeyToVector.

In reply to: 454578807 [](ancestors = 454578807)

Refers to: src/Microsoft.ML.Transforms/OneHotEncoding.cs:218 in 520cb79. [](commit_id = 520cb79, deletion_comment = False)

sfilipi · 2019-01-16T17:14:06Z

        public ColumnInfo(string input, string output,

I am not doing the OneHotHash in the PR. The changes below are for Hashing.

In reply to: 454577719 [](ancestors = 454577719)

Refers to: src/Microsoft.ML.Transforms/OneHotHashEncoding.cs:230 in 520cb79. [](commit_id = 520cb79, deletion_comment = False)

…xtensionsCatalog. Internalizing the constructors.

sfilipi · 2019-01-16T18:27:06Z

            var binaryCols = new List<(string input, string output)>();

I will do it in the next PR, where the order gets swapped on the OneHotEncodingTransformer.

This PR has not addressed that component.

In reply to: 454578613 [](ancestors = 454578613,454578490)

Refers to: src/Microsoft.ML.Transforms/OneHotHashEncoding.cs:273 in 520cb79. [](commit_id = 520cb79, deletion_comment = False)

sfilipi · 2019-01-18T06:25:06Z

@glebuk, @TomFinley, @Ivanidzo4ka would you have time for a second look? Thanks.

sfilipi · 2019-01-25T01:09:08Z

Closing this since the work got also done in: #2239

sfilipi requested review from Ivanidzo4ka, TomFinley and glebuk January 11, 2019 00:30

sfilipi added the API Issues pertaining the friendly API label Jan 11, 2019

sfilipi added this to the 0119 milestone Jan 11, 2019

sfilipi self-assigned this Jan 11, 2019

sfilipi commented Jan 11, 2019

View reviewed changes

Ivanidzo4ka reviewed Jan 11, 2019

View reviewed changes

TomFinley reviewed Jan 11, 2019

View reviewed changes

sfilipi force-pushed the trasformerEstimatorInputOutputSwap branch from c6961d9 to 0064b05 Compare January 13, 2019 08:24

TomFinley reviewed Jan 15, 2019

View reviewed changes

sfilipi added 5 commits January 16, 2019 09:16

swappng the order or arguments on the constructors of the ConversionE…

ab70ee0

…xtensionsCatalog. Internalizing the constructors.

Addressing PR comments

07fba20

Fixing the Ngram workout

88ff284

Fixes identified by tests

8a3cd83

renamed outputColumn -> name and inputColumn -> source

e27b453

Addressing the PR comments.

60ab6cd

sfilipi force-pushed the trasformerEstimatorInputOutputSwap branch from 520cb79 to 60ab6cd Compare January 16, 2019 18:28

sfilipi closed this Jan 25, 2019

sfilipi deleted the trasformerEstimatorInputOutputSwap branch January 25, 2019 01:09

ghost locked as resolved and limited conversation to collaborators Mar 25, 2022

	public abstract class SourceNameColumnBase
	{
	[Argument(ArgumentType.AtMostOnce, HelpText = "Name of the new column", ShortName = "name")]
	public string Name;

	[Argument(ArgumentType.AtMostOnce, HelpText = "Name of the source column", ShortName = "src")]
	public string Source;

swapping the order or arguments on the constructors of the ConversionExtensionsCatalog. Internalizing the constructors. #2118

swapping the order or arguments on the constructors of the ConversionExtensionsCatalog. Internalizing the constructors. #2118

Conversation

sfilipi commented Jan 11, 2019

sfilipi Jan 11, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Ivanidzo4ka Jan 11, 2019 • edited by sfilipi Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Ivanidzo4ka Jan 11, 2019 • edited by sfilipi Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Ivanidzo4ka commented Jan 11, 2019 • edited by sfilipi Loading

TomFinley Jan 11, 2019 • edited by sfilipi Loading

Choose a reason for hiding this comment

Ivanidzo4ka Jan 11, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

TomFinley Jan 11, 2019 • edited by sfilipi Loading

Choose a reason for hiding this comment

sfilipi commented Jan 13, 2019 • edited Loading

sfilipi commented Jan 13, 2019

TomFinley Jan 15, 2019 • edited by sfilipi Loading

Choose a reason for hiding this comment

TomFinley Jan 15, 2019 • edited by sfilipi Loading

Choose a reason for hiding this comment

TomFinley Jan 15, 2019 • edited by sfilipi Loading

Choose a reason for hiding this comment

TomFinley Jan 15, 2019 • edited by sfilipi Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

TomFinley Jan 15, 2019 • edited by sfilipi Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

TomFinley Jan 15, 2019 • edited by sfilipi Loading

Choose a reason for hiding this comment

TomFinley Jan 15, 2019 • edited by sfilipi Loading

Choose a reason for hiding this comment

TomFinley Jan 15, 2019 • edited by sfilipi Loading

Choose a reason for hiding this comment

TomFinley Jan 15, 2019 • edited by sfilipi Loading

Choose a reason for hiding this comment

TomFinley Jan 15, 2019 • edited by sfilipi Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

TomFinley Jan 15, 2019 • edited by sfilipi Loading

Choose a reason for hiding this comment

TomFinley Jan 15, 2019 • edited by sfilipi Loading

Choose a reason for hiding this comment

TomFinley Jan 15, 2019 • edited by sfilipi Loading

Choose a reason for hiding this comment

TomFinley Jan 15, 2019 • edited by sfilipi Loading

Choose a reason for hiding this comment

TomFinley commented Jan 15, 2019 • edited by sfilipi Loading

TomFinley commented Jan 15, 2019 • edited by sfilipi Loading

TomFinley commented Jan 15, 2019

TomFinley commented Jan 15, 2019 • edited by sfilipi Loading

TomFinley commented Jan 15, 2019 • edited Loading

TomFinley Jan 15, 2019 • edited by sfilipi Loading

Choose a reason for hiding this comment

TomFinley Jan 15, 2019 • edited by sfilipi Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

TomFinley Jan 15, 2019 • edited Loading

Choose a reason for hiding this comment

TomFinley Jan 15, 2019 • edited by sfilipi Loading

Choose a reason for hiding this comment

TomFinley Jan 15, 2019 • edited by sfilipi Loading

Choose a reason for hiding this comment

TomFinley Jan 15, 2019 • edited by sfilipi Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

TomFinley Jan 15, 2019 • edited by sfilipi Loading

sfilipi Jan 11, 2019 •

edited

Loading

Ivanidzo4ka Jan 11, 2019 •

edited by sfilipi

Loading

Ivanidzo4ka Jan 11, 2019 •

edited by sfilipi

Loading

Ivanidzo4ka commented Jan 11, 2019 •

edited by sfilipi

Loading

TomFinley Jan 11, 2019 •

edited by sfilipi

Loading

Ivanidzo4ka Jan 11, 2019 •

edited

Loading

TomFinley Jan 11, 2019 •

edited by sfilipi

Loading

sfilipi commented Jan 13, 2019 •

edited

Loading

TomFinley Jan 15, 2019 •

edited by sfilipi

Loading

TomFinley Jan 15, 2019 •

edited by sfilipi

Loading

TomFinley Jan 15, 2019 •

edited by sfilipi

Loading

TomFinley Jan 15, 2019 •

edited by sfilipi

Loading

TomFinley Jan 15, 2019 •

edited by sfilipi

Loading

TomFinley Jan 15, 2019 •

edited by sfilipi

Loading

TomFinley Jan 15, 2019 •

edited by sfilipi

Loading

TomFinley Jan 15, 2019 •

edited by sfilipi

Loading

TomFinley Jan 15, 2019 •

edited by sfilipi

Loading

TomFinley Jan 15, 2019 •

edited by sfilipi

Loading

TomFinley Jan 15, 2019 •

edited by sfilipi

Loading

TomFinley Jan 15, 2019 •

edited by sfilipi

Loading

TomFinley Jan 15, 2019 •

edited by sfilipi

Loading

TomFinley Jan 15, 2019 •

edited by sfilipi

Loading

TomFinley commented Jan 15, 2019 •

edited by sfilipi

Loading

TomFinley commented Jan 15, 2019 •

edited by sfilipi

Loading

TomFinley commented Jan 15, 2019 •

edited by sfilipi

Loading

TomFinley commented Jan 15, 2019 •

edited

Loading

TomFinley Jan 15, 2019 •

edited by sfilipi

Loading

TomFinley Jan 15, 2019 •

edited by sfilipi

Loading

TomFinley Jan 15, 2019 •

edited

Loading

TomFinley Jan 15, 2019 •

edited by sfilipi

Loading

TomFinley Jan 15, 2019 •

edited by sfilipi

Loading

TomFinley Jan 15, 2019 •

edited by sfilipi

Loading

TomFinley Jan 15, 2019 •

edited by sfilipi

Loading

TomFinley commented Jan 15, 2019 •

edited by sfilipi

Loading

sfilipi commented Jan 18, 2019 •

edited

Loading