-
Notifications
You must be signed in to change notification settings - Fork 1.9k
swapping the order or arguments on the constructors of the ConversionExtensionsCatalog. Internalizing the constructors. #2118
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -39,7 +39,7 @@ public void TensorFlowTransforCifarEndToEndTest() | |||
.Append(new ImagePixelExtractingEstimator(mlContext, "ImageCropped", "Input", interleave: true)) | |||
.Append(new TensorFlowEstimator(mlContext, model_location, new[] { "Input" }, new[] { "Output" })) | |||
.Append(new ColumnConcatenatingEstimator(mlContext, "Features", "Output")) | |||
.Append(new ValueToKeyMappingEstimator(mlContext, "Label")) | |||
.Append(new ValueToKeyMappingEstimator(mlContext, "Label", "Label")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
, "Label" [](start = 77, length = 9)
side effect of this change is that we cannot have the output column default to null, and therefore substitute it with the input column when not provided... #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's reverse that. We should retain the behavior and attitude that existed before the fiasco began, of the "name" (what was misidentified as "output") always being specified, and in the case where there is a single "source" (what was misidentified as "input"), it is taken as being the same as the "name."
In reply to: 246970621 [](ancestors = 246970621)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was marked as resolved, but I still see that we've retained this change.
In reply to: 246974415 [](ancestors = 246974415,246970621)
/// <param name="hashBits">Number of bits to hash into. Must be between 1 and 31, inclusive.</param> | ||
/// <param name="invertHash">During hashing we constuct mappings between original values and the produced hash values. | ||
/// Text representation of original values are stored in the slot names of the metadata for the new column.Hashing, as such, can map many initial values to one. | ||
/// <paramref name="invertHash"/> specifies the upper bound of the number of distinct input values mapping to a hash that should be retained. | ||
/// <value>0</value> does not retain any input values. <value>-1</value> retains all input values mapping to each hash.</param> | ||
public HashingEstimator(IHostEnvironment env, string inputColumn, string outputColumn = null, | ||
internal HashingEstimator(IHostEnvironment env, string inputColumn, string outputColumn, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
internal [](start = 8, length = 8)
why estimator constructor becomes internal? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -92,24 +92,24 @@ public static KeyToValueMappingEstimator MapKeyToValue(this TransformsCatalog.Co | |||
/// <param name="outputColumn">The name of the output column.</param> | |||
/// <param name="bag">Whether bagging is used for the conversion. </param> | |||
public static KeyToVectorMappingEstimator MapKeyToVector(this TransformsCatalog.ConversionTransforms catalog, | |||
string inputColumn, string outputColumn = null, bool bag = KeyToVectorMappingEstimator.Defaults.Bag) | |||
=> new KeyToVectorMappingEstimator(CatalogUtils.GetEnvironment(catalog), inputColumn, outputColumn, bag); | |||
string outputColumn, string inputColumn, bool bag = KeyToVectorMappingEstimator.Defaults.Bag) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
inputColumn [](start = 40, length = 11)
I'm not a Tom, and I can't speak from his name, but I don't remember him objecting on idea of one of column be optional, and I thinks it's perfectly fine to have second column nullable and replace it with first one. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this constructor and constructor for estimator which accepts term column and loader should exist. (you can just remove term column and loader) Refers to: src/Microsoft.ML.Data/Transforms/ConversionsExtensionsCatalog.cs:127 in c6961d9. [](commit_id = c6961d9, deletion_comment = False) |
: this(env, new KeyToVectorMappingTransformer(env, columns)) | ||
{ | ||
} | ||
|
||
public KeyToVectorMappingEstimator(IHostEnvironment env, string inputColumn, string outputColumn = null, bool bag = Defaults.Bag) | ||
: this(env, new KeyToVectorMappingTransformer(env, new KeyToVectorMappingTransformer.ColumnInfo(inputColumn, outputColumn ?? inputColumn, bag))) | ||
internal KeyToVectorMappingEstimator(IHostEnvironment env, string outputColumn, string inputColumn, bool bag = Defaults.Bag) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
outputColumn, string inputColumn [](start = 74, length = 32)
outputColumn, string inputColumn [](start = 74, length = 32)
Thank you @sfilipi! It was in a note somewhat down on the issue, but in a reply to @@justinormont where we were talking about consistency with existing argument names, I also observed that the names "inputs" and "outputs" are incorrect (since the inputs to outputs to estimators/transforms are not columns, but IDataView
s), and should retain something closer the existing names we have for this name
and source
. Indeed, I think the bad naming was part of the reason some people got so badly confused in the first place. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But these ones are InputColumn
and OutputColumn
and estimator operate over columns in IDataView
. What is wrong with them? And why name
and source
and not destination
and source
? name
is such a broad term to use it.
In reply to: 246974093 [](ancestors = 246974093)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hope i am not annoying and further fueling the conversation, but sharing my personal opinion:
1- I agree that the order of arguments should be swapped, and have destination, than source.
2- I find the name "name" for the output column confusing too. Yes it does make sense in the scenario, but you have to blink about 4 times before it does so; just because, as Ivan says, it is such a generic thing.
The names InputColumn, and OutputColumn make a lot more sense to me too, since they are actually column names..
The end!
In reply to: 246974845 [](ancestors = 246974845,246974093)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The term name
and source
is a well established name in this codebase. The concept of the input and output for a transform is already being established as being the dataviews themselves. Transforms do not take columns as inputs and outputs, they take data views as inputs and outputs -- that's obvious from the interface specification. So: let's jsut agree, the dataviews are the inputs and outputs, full stop. In the case where a transform happens to be producing an additional column, we have parameterized the transform as producing a column with that "name." And when that parameterization also happens to pay attention to one column, we often call that a "source." Always.
It's interesting that you have a different interpretation, and I'd like to emphasize that I don't discount it, but it is distinct from how we've architected the old IDataTransform
and I don't like the new interpretation for the reasons already discussed at length.
Going forward, I think this shows how dangerous it is to introduce a new interpretation without discussing that new interpretation with anyone, discussing its merits, etc. I'm sorry that was done, but not sorry that we're fixing it.
In reply to: 247204046 [](ancestors = 247204046,246974845,246974093)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Transforms do not take columns as inputs and outputs, they take data views as inputs and outputs
Estimators during it construction doesn't take any dataview, they take columns which expected to be in dataview. We don't touch dataview till we call Fit. We do schema propagation for GetOutputSchema
on SchemaShape.Column
,
Only thing which take data views as inputs as far as I know is AppendDataViewRows
transform.
The term name and source is a well established name in this codebase.
Internally, yes, mostly as legacy for command line. Legacy doesn't mean it's a great and useful for another task (Public Api).
let's jsut agree, the dataviews are the inputs and outputs, full stop.
I'm as user, don't see input columns and output columns in estimators as input and output dataviews. I think @sfilipi has the same opinion. I can't even come up with estimator/transformer which would produce multiple dataviews (maybe evaluator with metrics as dataviews)
So far we have two people who think name
and source
is not a great idea (especially name
)
We do it not because we want to make you mad, we do it because we also care about project.
without discussing that new interpretation with anyone
Can we actually have discussion regarding name
and source
vs inputColumn
and outputColumn
? I would love to hear other people opinion, don't you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Estimators during it construction doesn't take any dataview, they take columns which expected to be in dataview. We don't touch dataview till we call Fit.
If you read what I wrote, you'll see I was explicitly talking about transformers, but estimators are similar. They don't take inputs and outputs as columns. They take inputs as dataviews (as you say), and they the output of that is a transformer. This, again, already covered in #2064.
We do schema propagation for GetOutputSchema on SchemaShape.Column,
You're mistaken. Check the IEstimator
interface again.
SchemaShape GetOutputSchema(SchemaShape inputSchema); |
As you see, again, inputs and outputs to this process are not columnar, they in fact are more global than this.
Internally, yes, mostly as legacy for command line. Legacy doesn't mean it's a great and useful for another task (Public Api).
Already covered in #2064... to summarize, you must justify inconsistencies. We should try to minimize inconsistencies. Just pointing out that sometimes we've judged it healthy that the two should be unaligned because they're in different settings is not a free pass to introduce whatever inconsistencies you want. Not only was this change unjustified, it has the flaws that I have identified.
Can we actually have discussion regarding name and source vs inputColumn and outputColumn ? I would love to hear other people opinion, don't you?
At some point, as again established in #2064, one of my responsibilities (which, just to emphasize, I do not enjoy) is to judge when discussions cease to become productive, so that we can actually move on with the work that must be done. While I'd love to have the luxury of enough free time to just endlessly circle around things like this, we can't. Let's move on.
In reply to: 248024251 [](ancestors = 248024251)
public ValueToKeyMappingEstimator(IHostEnvironment env, string inputColumn, string outputColumn = null, int maxNumTerms = Defaults.MaxNumTerms, ValueToKeyMappingTransformer.SortOrder sort = Defaults.Sort) : | ||
this(env, new [] { new ValueToKeyMappingTransformer.ColumnInfo(inputColumn, outputColumn ?? inputColumn, maxNumTerms, sort) }) | ||
internal ValueToKeyMappingEstimator(IHostEnvironment env, string outputColumn, string inputColumn, int maxNumTerms = Defaults.MaxNumTerms, ValueToKeyMappingTransformer.SortOrder sort = Defaults.Sort) : | ||
this(env, new [] { new ValueToKeyMappingTransformer.ColumnInfo(inputColumn, outputColumn, maxNumTerms, sort) }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
inputColumn, outputColumn [](start = 74, length = 25)
It would be well to make sure we're consistent everywhere, and also swap these. #Resolved
I will fix this in the next PR here, because this goes all the way down to the OneToOneTransformerBase, and that change will require touching everything .. #Pending Refers to: src/Microsoft.ML.Data/Transforms/ConversionsExtensionsCatalog.cs:75 in c6961d9. [](commit_id = c6961d9, deletion_comment = False) |
Left the ctor, because it is used through the codebase; it;s internal. Removed the reference here. In reply to: 453322670 [](ancestors = 453322670) Refers to: src/Microsoft.ML.Data/Transforms/ConversionsExtensionsCatalog.cs:127 in c6961d9. [](commit_id = c6961d9, deletion_comment = False) |
c6961d9
to
0064b05
Compare
/// <param name="input">Name of input column.</param> | ||
/// <param name="output">Name of the column resulting from the transformation of <paramref name="input"/>. Null means <paramref name="input"/> is replaced.</param> | ||
/// <param name="source">Name of column to work on.</param> | ||
/// <param name="name">Name of the column resulting from the transformation of <paramref name="source"/>. Null means <paramref name="source"/> is replaced.</param> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Null means is replaced [](start = 118, length = 48)
Other way around. null
on source means this. Also FYI, while I'm pleased the see that the parameters themselves are in the right order, the params
tags themselves are not reversed. #Resolved
public readonly string Input; | ||
public readonly string Output; | ||
public readonly string Source; | ||
public readonly string Name; | ||
public readonly int HashBits; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While we're at it, might want to put these in the right order. #Resolved
public readonly DataKind OutputKind; | ||
public readonly KeyRange OutputKeyRange; | ||
|
||
/// <summary> | ||
/// Describes how the transformer handles one column pair. | ||
/// </summary> | ||
/// <param name="input">Name of input column.</param> | ||
/// <param name="output">Name of output column.</param> | ||
/// <param name="source">Name of input column.</param> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Name of input column [](start = 37, length = 20)
Similar comment to other file about ordering of tags, description, etc. #Resolved
/// <param name="outputKind">The expected type of the converted column.</param> | ||
/// <param name="outputKeyRange">New key range if we work with key type.</param> | ||
public TypeConvertingTransformer(IHostEnvironment env, string inputColumn, string outputColumn, DataKind outputKind, KeyRange outputKeyRange = null) | ||
: this(env, new ColumnInfo(inputColumn, outputColumn, outputKind, outputKeyRange)) | ||
public TypeConvertingTransformer(IHostEnvironment env, string source, string name, DataKind outputKind, KeyRange outputKeyRange = null) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
string source, string name [](start = 63, length = 26)
This is not in proper order. #Pending
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will tackle the transformers in the next pr, because i have to change all of them, they all derive from OneToOneTransformer.
Agree, it is confusing because I did partial work, on renaming the args.
In reply to: 248078355 [](ancestors = 248078355)
@@ -141,7 +130,7 @@ public static KeyToValueMappingEstimator MapKeyToValue(this TransformsCatalog.Co | |||
this TransformsCatalog.ConversionTransforms catalog, | |||
IEnumerable<TInputType> keys, | |||
IEnumerable<TOutputType> values, | |||
params (string source, string name)[] columns) | |||
params (string inputColumn, string outputColumn)[] columns) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
inputColumn, string outputColumn [](start = 27, length = 32)
Remember, name
, source
, not source
, name
. And definitely not input
, output
. :) #Pending
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also I think for conistency sake we want the column mapping to be at the head, even when we have to forgo the convenience of the params
.
In reply to: 248079402 [](ancestors = 248079402)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will do this on the next PR, because the tuple is passed directly from the estimator -> transfomer-> OneToOnetransformer base class.
I agree that it is confusing to review, because i did partial work with renaming the tuple items.
In reply to: 248079667 [](ancestors = 248079667,248079402)
/// <param name="hashBits">Number of bits to hash into. Must be between 1 and 31, inclusive.</param> | ||
/// <param name="invertHash">During hashing we constuct mappings between original values and the produced hash values. | ||
/// Text representation of original values are stored in the slot names of the metadata for the new column.Hashing, as such, can map many initial values to one. | ||
/// <paramref name="invertHash"/> specifies the upper bound of the number of distinct input values mapping to a hash that should be retained. | ||
/// <value>0</value> does not retain any input values. <value>-1</value> retains all input values mapping to each hash.</param> | ||
public HashingEstimator(IHostEnvironment env, string inputColumn, string outputColumn = null, | ||
internal HashingEstimator(IHostEnvironment env, string outputColumn, string inputColumn = null, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
outputColumn [](start = 63, length = 12)
Another remnant of the bad input/output names. Also another case where the XML comments should be made less awkward, as noted elsewhere. #Resolved
: this(env, new KeyToVectorMappingTransformer(env, columns)) | ||
{ | ||
} | ||
|
||
public KeyToVectorMappingEstimator(IHostEnvironment env, string inputColumn, string outputColumn = null, bool bag = Defaults.Bag) | ||
: this(env, new KeyToVectorMappingTransformer(env, new KeyToVectorMappingTransformer.ColumnInfo(inputColumn, outputColumn ?? inputColumn, bag))) | ||
internal KeyToVectorMappingEstimator(IHostEnvironment env, string outputColumn, string inputColumn = null, bool bag = Defaults.Bag) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
outputColumn, string inputColumn [](start = 74, length = 32)
Another name
, source
... #Resolved
public TypeConvertingEstimator(IHostEnvironment env, | ||
string inputColumn, string outputColumn = null, | ||
internal TypeConvertingEstimator(IHostEnvironment env, | ||
string outputColumn, string inputColumn = null, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
outputColumn, string inputColumn [](start = 19, length = 32)
Here's another. I'm just going to stop commenting on these, maybe self-review, double check them? #Resolved
/// <param name="inputColumn">Name of the column to be transformed.</param> | ||
/// <param name="outputColumn">Name of the output column. If this is null '<paramref name="inputColumn"/>' will be used.</param> | ||
/// <param name="source">Name of the column to be transformed.</param> | ||
/// <param name="name">Name of the column produced.</param> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Name of the column produced. [](start = 8, length = 59)
Reverse <param
tags. #Resolved
/// <param name="inputColumn">Name of the column to be transformed.</param> | ||
/// <param name="outputColumn">Name of the output column. If this is null '<paramref name="inputColumn"/>' will be used.</param> | ||
/// <param name="source">Name of the column to be transformed.</param> | ||
/// <param name="name">Name of the column produced.</param> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Name of the column produced [](start = 31, length = 27)
To my eyes, this phrasing is awkward. What was wrong with the existing descriptors we already used for this in existing tooling?
public abstract class SourceNameColumnBase | |
{ | |
[Argument(ArgumentType.AtMostOnce, HelpText = "Name of the new column", ShortName = "name")] | |
public string Name; | |
[Argument(ArgumentType.AtMostOnce, HelpText = "Name of the source column", ShortName = "src")] | |
public string Source; |
Maybe use that, with appropriate clarifications about what happens when source
is null
, and so on. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This BTW is a general comment, I see lots of inconsistency in our documentation. Including descriptions of these columns as somehow being "inputs" and "outputs" which, as we've seen direct examples of, leads people to think about these structures in the wrong way.
In reply to: 248087222 [](ancestors = 248087222)
@@ -43,7 +43,7 @@ public void CV_Multiclass_Digits_RffTransform_OVAAveragedPerceptron() | |||
var pipeline = mlContext.Transforms.Projection.CreateRandomFourierFeatures("Features", "FeaturesRFF") | |||
.AppendCacheCheckpoint(mlContext) | |||
.Append(mlContext.Transforms.Concatenate("Features", "FeaturesRFF")) | |||
.Append(new ValueToKeyMappingEstimator(mlContext, "Label")) | |||
.Append(new ValueToKeyMappingEstimator(mlContext, "Label", "Label")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
, "Label" [](start = 69, length = 9)
Now that the source
column has a default value again, from what I see, we can get rid of this, right? #Resolved
@@ -27,7 +27,7 @@ public void Metacomponents() | |||
var sdcaTrainer = ml.BinaryClassification.Trainers.StochasticDualCoordinateAscent("Label", "Features", advancedSettings: (s) => { s.MaxIterations = 100; s.Shuffle = true; s.NumThreads = 1; }); | |||
|
|||
var pipeline = new ColumnConcatenatingEstimator (ml, "Features", "SepalLength", "SepalWidth", "PetalLength", "PetalWidth") | |||
.Append(new ValueToKeyMappingEstimator(ml, "Label"), TransformerScope.TrainTest) | |||
.Append(new ValueToKeyMappingEstimator(ml, "Label", "Label"), TransformerScope.TrainTest) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
, "Label" [](start = 66, length = 9)
Similar here. Unneeded, or should be. #Resolved
@@ -39,7 +39,7 @@ void Extensibility() | |||
}; | |||
var pipeline = new ColumnConcatenatingEstimator (ml, "Features", "SepalLength", "SepalWidth", "PetalLength", "PetalWidth") | |||
.Append(new CustomMappingEstimator<IrisData, IrisData>(ml, action, null), TransformerScope.TrainTest) | |||
.Append(new ValueToKeyMappingEstimator(ml, "Label"), TransformerScope.TrainTest) | |||
.Append(new ValueToKeyMappingEstimator(ml, "Label", "Label"), TransformerScope.TrainTest) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
, "Label" [](start = 66, length = 9)
Same, unneeded. #Resolved
@@ -30,7 +30,7 @@ void DecomposableTrainAndPredict() | |||
var data = ml.Data.ReadFromTextFile<IrisData>(dataPath, separatorChar: ','); | |||
|
|||
var pipeline = new ColumnConcatenatingEstimator (ml, "Features", "SepalLength", "SepalWidth", "PetalLength", "PetalWidth") | |||
.Append(new ValueToKeyMappingEstimator(ml, "Label"), TransformerScope.TrainTest) | |||
.Append(new ValueToKeyMappingEstimator(ml, "Label", "Label"), TransformerScope.TrainTest) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
, "Label" [](start = 66, length = 9)
Same, unneeded. #Resolved
While less essential since this is a field in an method that is not even privately visible, we've seen the mischief caused by this conflation of "input" and "output" with things that are neither inputs nor otuputs, and the ordering has all sorts of confusing far reaching consequences by confusing some people. Consider making this consistent. #Pending Refers to: src/Microsoft.ML.Transforms/OneHotHashEncoding.cs:273 in 520cb79. [](commit_id = 520cb79, deletion_comment = False) |
Obviously far less essential than getting the public surface area right. In reply to: 454578490 [](ancestors = 454578490) Refers to: src/Microsoft.ML.Transforms/OneHotHashEncoding.cs:273 in 520cb79. [](commit_id = 520cb79, deletion_comment = False) |
Hmmm. I wonder if I indirectly bear some fault for the confusion, since in my static API, I do consider the operations to be on columns, as opposed to Refers to: src/Microsoft.ML.StaticPipe/TransformsStatic.cs:578 in 520cb79. [](commit_id = 520cb79, deletion_comment = False) |
public readonly string Input; | ||
public readonly string Output; | ||
public readonly string Source; | ||
public readonly string Name; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Name first. #Resolved
@@ -114,10 +114,10 @@ public ColumnInfo(string input, string output = null, bool bag = KeyToVectorMapp | |||
public IReadOnlyCollection<ColumnInfo> Columns => _columns.AsReadOnly(); | |||
private readonly ColumnInfo[] _columns; | |||
|
|||
private static (string input, string output)[] GetColumnPairs(ColumnInfo[] columns) | |||
private static (string source, string name)[] GetColumnPairs(ColumnInfo[] columns) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
source, string name [](start = 31, length = 19)
Probably best to make consistent as we're trying to do elsewhere, even if it is private. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will address it in my next PR, because they get used in the Transformer, which passes it to the base class OneToOnetransformer. Once that changes everything needs to change.
In reply to: 248091617 [](ancestors = 248091617)
@@ -31,16 +31,16 @@ public static void KeyToValue_Term() | |||
// making use of default settings. | |||
string defaultColumnName = "DefaultKeys"; | |||
// REVIEW create through the catalog extension | |||
var default_pipeline = new WordTokenizingEstimator(ml, "Review") | |||
.Append(new ValueToKeyMappingEstimator(ml, "Review", defaultColumnName)); | |||
var default_pipeline = mlContext.Transforms.Text.TokenizeWords("Review") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
default_pipeline [](start = 16, length = 16)
Ugh. Our samples have been written with Python variable naming conventions, not C# ones? I'll have to file an issue about this... #2155. #WontFix
@@ -76,7 +76,7 @@ public void MetacomponentsFeaturesRenamed() | |||
|
|||
var sdcaTrainer = new SdcaBinaryTrainer(Env, "Label", "Vars", advancedSettings: (s) => { s.MaxIterations = 100; s.Shuffle = true; s.NumThreads = 1; }); | |||
var pipeline = new ColumnConcatenatingEstimator(Env, "Vars", "SepalLength", "SepalWidth", "PetalLength", "PetalWidth") | |||
.Append(new ValueToKeyMappingEstimator(Env, "Label"), TransformerScope.TrainTest) | |||
.Append(new ValueToKeyMappingEstimator(Env, "Label", "Label"), TransformerScope.TrainTest) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
, "Label" [](start = 67, length = 9)
Another unnecessary. #Resolved
@@ -193,7 +193,7 @@ private TextLoader.Arguments GetIrisLoaderArgs() | |||
} | |||
}).Read(GetDataPath(IrisDataPath)); | |||
|
|||
var pipeline = new ValueToKeyMappingEstimator(Env, "Label"); | |||
var pipeline = new ValueToKeyMappingEstimator(Env, "Label", "Label"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
, "Label" [](start = 70, length = 9)
Another unnecessary. #Resolved
@@ -249,7 +249,7 @@ public void NgramWorkout() | |||
.Read(sentimentDataPath); | |||
|
|||
var est = new WordTokenizingEstimator(Env, "text", "text") | |||
.Append(new ValueToKeyMappingEstimator(Env, "text", "terms")) | |||
.Append(new ValueToKeyMappingEstimator(Env, "terms", "text")) | |||
.Append(new NgramExtractingEstimator(Env, "terms", "ngrams")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NgramExtractingEstimator [](start = 28, length = 24)
Hmmm. This didn't get fixed? #Pending
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not converting the NGramExtracting Estimator on this PR. It will come in the next PR.
In reply to: 248095484 [](ancestors = 248095484)
@@ -249,7 +249,7 @@ public void NgramWorkout() | |||
.Read(sentimentDataPath); | |||
|
|||
var est = new WordTokenizingEstimator(Env, "text", "text") | |||
.Append(new ValueToKeyMappingEstimator(Env, "text", "terms")) | |||
.Append(new ValueToKeyMappingEstimator(Env, "terms", "text")) | |||
.Append(new NgramExtractingEstimator(Env, "terms", "ngrams")) | |||
.Append(new NgramHashingEstimator(Env, "terms", "ngramshash")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NgramHashingEstimator [](start = 28, length = 21)
Nor this? #Pending
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not converting the NGramHashing Estimator on this PR. It will come in the next PR.
In reply to: 248095509 [](ancestors = 248095509)
I did write a comment that i am not swapping for the Transformers on this PR. the column info is shared between the estimators and the transformers. The reason I am not swapping for the transformers, is because they all pass the columns to base - OneToOneTransformer; and if you swap them in there, you have to do all the transforms. This was meant to be a small PR to just test understanding of the changes, reach agreement. I am tackling all of them in the next PR. In reply to: 454584468 [](ancestors = 454584468) Refers to: src/Microsoft.ML.Transforms/OneHotEncoding.cs:192 in 520cb79. [](commit_id = 520cb79, deletion_comment = False) |
I am not doing all the estimators in this PR; just the one for the ConversionsExtensionsCatalog. The reason this file is here, is because of the changes in KeyToVector. In reply to: 454578807 [](ancestors = 454578807) Refers to: src/Microsoft.ML.Transforms/OneHotEncoding.cs:218 in 520cb79. [](commit_id = 520cb79, deletion_comment = False) |
…xtensionsCatalog. Internalizing the constructors.
I will do it in the next PR, where the order gets swapped on the OneHotEncodingTransformer. This PR has not addressed that component. In reply to: 454578613 [](ancestors = 454578613,454578490) Refers to: src/Microsoft.ML.Transforms/OneHotHashEncoding.cs:273 in 520cb79. [](commit_id = 520cb79, deletion_comment = False) |
520cb79
to
60ab6cd
Compare
@glebuk, @TomFinley, @Ivanidzo4ka would you have time for a second look? Thanks. |
Closing this since the work got also done in: #2239 |
Working towards resolving #2064 by swapping the order or arguments on the constructors of the ConversionExtensions Catalog. Internalizing the constructors.