-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Replace all ML.Transforms SubComponent usages with IComponentFactory. #700
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Creating a TextLoader and specifying its arguments manually is too verbose. We should add a few constructor overloads to make it easier.
@@ -9,6 +9,7 @@ | |||
using Microsoft.ML.Runtime.Data; | |||
using Microsoft.ML.Runtime.Internal.Utilities; | |||
using Microsoft.ML.Runtime.Internal.Internallearn; | |||
using Microsoft.ML.Runtime.EntryPoints; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorting #Resolved
[Argument(ArgumentType.Multiple, HelpText = "Filter", ShortName = "f", SortOrder = 1, SignatureType = typeof(SignatureFeatureScorerTrainer))] | ||
public IComponentFactory<ITrainer<IPredictorWithFeatureWeights<Single>>> Filter = | ||
ComponentFactoryUtils.CreateFromFunction(env => | ||
// ML.Transforms doesn't have a direct reference to ML.StandardLearners, so use ComponentCatalog to create the Filter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// ML.Transforms doesn't have a direct reference to ML.StandardLearners, [](start = 20, length = 72)
since it's a learner feature selection, maybe make sense to move it to learners project? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Argument(ArgumentType.Multiple, HelpText = "which kernel to use?", ShortName = "kernel")] | ||
public SubComponent<IFourierDistributionSampler, SignatureFourierDistributionSampler> MatrixGenerator = | ||
new SubComponent<IFourierDistributionSampler, SignatureFourierDistributionSampler>(GaussianFourierSampler.LoadName); | ||
[Argument(ArgumentType.Multiple, HelpText = "which kernel to use?", ShortName = "kernel", SignatureType = typeof(SignatureFourierDistributionSampler))] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
w [](start = 57, length = 1)
Not your fault, but since you touch it, can we capitalize it? Here and in for UseSin field? #Resolved
sub = args.MatrixGenerator; | ||
var info = ComponentCatalog.GetLoadableClassInfo(sub); | ||
bool gaussian = info != null && info.Type == typeof(GaussianFourierSampler); | ||
var matrixGenerator = sub.CreateComponent(host, 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 [](start = 68, length = 1)
Is it current default value?
Can we somehow get actual default value from argument class?
It's really easy to forget change this line if we decide to update default value in matrix generator. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Separator = "tab", | ||
Column = new[] | ||
{ | ||
new TextLoader.Column("Stopwords", DataKind.TX, 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Stopwords" [](start = 54, length = 11)
Not in the scope of this change but I feel these names should be configurable from arguments instead of hardcoding where it might hide an existing column. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
else | ||
{ | ||
loader = new TransposeLoader(host, new TransposeLoader.Arguments(), fileSource); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
loader = isBinary ? new BinaryLoader(host, new BinaryLoader.Arguments(), fileSource) : new TransposeLoader(host, new TransposeLoader.Arguments(), fileSource)
That is generally the pattern in the code. #WontFix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately that code doesn't compile, and I'm not sure why.
Severity Code Description Project File Line Suppression State
Error CS0173 Type of conditional expression cannot be determined because there is no implicit conversion between 'Microsoft.ML.Runtime.Data.IO.BinaryLoader' and 'Microsoft.ML.Runtime.Data.IO.TransposeLoader' Microsoft.ML.Transforms F:\git\machinelearning\src\Microsoft.ML.Transforms\TermLookupTransform.cs 549 Active
In reply to: 212153383 [](ancestors = 212153383)
new TextLoader.Column("SepalWidth", DataKind.R4, 1), | ||
new TextLoader.Column("PetalLength", DataKind.R4, 2), | ||
new TextLoader.Column("PetalWidth",DataKind.R4, 3), | ||
new TextLoader.Column("Label", DataKind.Text, 4) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks cleaner! #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Working towards #585
Please ignore the first commit, it is being proposed separately as #698. I am using these new constructors in this PR, so I needed to include it. If you have any comments regarding it directly, please put them on #698.
There are 2 "hacks" in this PR that I'm not proud of:
LearnerFeatureSelectionTransform
depends onSDCA
, butML.Transforms
doesn't have a reference toML.StandardLearners
. I wasn't sure how to proceed here (add the dependency, or move some code around). For now I continued to use Dependency Injection to create the component. Please give me your opinion on what the best approach forward would be.RffTransform
has an undesirable coupling to which kind of MatrixGenerator it is using (gaussian or not). Previously, it was using the ComponentCatalog to determine which type of MatrixGenerator it was working with before actually creating it. However, I can no longer do that without actually creating the generator, so I needed to make a "dummy" instance. I spoke with @yaeldekel, and we decided this was "OK" for now, since it typically only used with a small number of columns (i.e. 1). I've logged RffTransform has undesirable coupling to its MatrixGenerator #699 for this.