-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Initial replacement of SubComponent with IComponentFactory #622
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
cf7207f
0dec10c
a0a5da2
d702353
05141a8
5fa843e
12f1798
69c55c9
4cd6a29
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Large diffs are not rendered by default.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -11,6 +11,7 @@ | |
using Microsoft.ML.Runtime.Command; | ||
using Microsoft.ML.Runtime.CommandLine; | ||
using Microsoft.ML.Runtime.Data; | ||
using Microsoft.ML.Runtime.EntryPoints; | ||
using Microsoft.ML.Runtime.Internal.Calibration; | ||
using Microsoft.ML.Runtime.Internal.Utilities; | ||
|
||
|
@@ -69,8 +70,8 @@ public sealed class Arguments : DataCommand.ArgumentsBase | |
[Argument(ArgumentType.LastOccurenceWins, HelpText = "Whether we should cache input training data", ShortName = "cache")] | ||
public bool? CacheData; | ||
|
||
[Argument(ArgumentType.Multiple, HelpText = "Transforms to apply prior to splitting the data into folds", ShortName = "prexf")] | ||
public KeyValuePair<string, SubComponent<IDataTransform, SignatureDataTransform>>[] PreTransform; | ||
[Argument(ArgumentType.Multiple, HelpText = "Transforms to apply prior to splitting the data into folds", ShortName = "prexf", SignatureType = typeof(SignatureDataTransform))] | ||
public KeyValuePair<string, IComponentFactory<IDataView, IDataTransform>>[] PreTransform; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is not necessarily how I'd prefer to do these things, for two reasons. Firstly, while Also: our goal would be to also get rid of this little "signature" concept. Signature is used for two major purposes, which I list here and describe how I think they could be better accomplished.
I would have an actual subinterface type (that is, There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I 100% agree. But it is too baked into the system right now to remove it in a single commit. Moving to
I'd need to think about this a bit more. My initial reaction is that this would cause the following disadvantages:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. /cc @Zruty0 - any thoughts here? According to history you were the original author #Resolved There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I agree with @[email protected] 's reasoning. I think we could check it in as is. In reply to: 207046122 [](ancestors = 207046122) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||
|
||
[Argument(ArgumentType.AtMostOnce, IsInputFileName = true, HelpText = "The validation data file", ShortName = "valid")] | ||
public string ValidationFile; | ||
|
@@ -159,16 +160,18 @@ private void RunCore(IChannel ch, string cmd) | |
string name = TrainUtils.MatchNameOrDefaultOrNull(ch, loader.Schema, nameof(Args.NameColumn), Args.NameColumn, DefaultColumnNames.Name); | ||
if (name == null) | ||
{ | ||
var args = new GenerateNumberTransform.Arguments(); | ||
args.Column = new[] { new GenerateNumberTransform.Column() { Name = DefaultColumnNames.Name }, }; | ||
args.UseCounter = true; | ||
var options = CmdParser.GetSettings(ch, args, new GenerateNumberTransform.Arguments()); | ||
preXf = preXf.Concat( | ||
new[] | ||
{ | ||
new KeyValuePair<string, SubComponent<IDataTransform, SignatureDataTransform>>( | ||
"", new SubComponent<IDataTransform, SignatureDataTransform>( | ||
GenerateNumberTransform.LoadName, options)) | ||
new KeyValuePair<string, IComponentFactory<IDataView, IDataTransform>>( | ||
"", new SimpleComponentFactory<IDataView, IDataTransform>( | ||
(env, input) => | ||
{ | ||
var args = new GenerateNumberTransform.Arguments(); | ||
args.Column = new[] { new GenerateNumberTransform.Column() { Name = DefaultColumnNames.Name }, }; | ||
args.UseCounter = true; | ||
return new GenerateNumberTransform(env, args, input); | ||
})) | ||
}).ToArray(); | ||
} | ||
} | ||
|
@@ -263,7 +266,7 @@ private RoleMappedData ApplyAllTransformsToData(IHostEnvironment env, IChannel c | |
private RoleMappedData CreateRoleMappedData(IHostEnvironment env, IChannel ch, IDataView data, ITrainer trainer) | ||
{ | ||
foreach (var kvp in Args.Transform) | ||
data = kvp.Value.CreateInstance(env, data); | ||
data = kvp.Value.CreateComponent(env, data); | ||
|
||
var schema = data.Schema; | ||
string label = TrainUtils.MatchNameOrDefaultOrNull(ch, schema, nameof(Args.LabelColumn), Args.LabelColumn, DefaultColumnNames.Label); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it have to be here? Let me see why it would be needed #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ooh, I see now. You need to tell the arg parser what's the signature type to invoke the ctor properly.
Hmm, this is unfortunate. Are you sure there's no better way?
In reply to: 208054268 [](ancestors = 208054268)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't found one yet...
We could have some sort of mapping from
IComponentFactory<...>
toSignatureType
. But that would seem fragile to me.My other thinking (that seems to be aligned with @TomFinley's above) is that we should be moving away from
SignatureTypes
. This at least puts the Signature type into the attribute, and out of the API directly. Hopefully, eventually we can remove signature types all together. #PendingThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, the fact that it's prominent in the attribute makes the whole approach less desirable. But I think it's an OK intermediate step
In reply to: 208058930 [](ancestors = 208058930)