Skip to content

Replace all ML.Transforms SubComponent usages with IComponentFactory. #700

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Aug 24, 2018

Conversation

eerhardt
Copy link
Member

Working towards #585

Please ignore the first commit, it is being proposed separately as #698. I am using these new constructors in this PR, so I needed to include it. If you have any comments regarding it directly, please put them on #698.

There are 2 "hacks" in this PR that I'm not proud of:

  1. LearnerFeatureSelectionTransform depends on SDCA, but ML.Transforms doesn't have a reference to ML.StandardLearners. I wasn't sure how to proceed here (add the dependency, or move some code around). For now I continued to use Dependency Injection to create the component. Please give me your opinion on what the best approach forward would be.

  2. RffTransform has an undesirable coupling to which kind of MatrixGenerator it is using (gaussian or not). Previously, it was using the ComponentCatalog to determine which type of MatrixGenerator it was working with before actually creating it. However, I can no longer do that without actually creating the generator, so I needed to make a "dummy" instance. I spoke with @yaeldekel, and we decided this was "OK" for now, since it typically only used with a small number of columns (i.e. 1). I've logged RffTransform has undesirable coupling to its MatrixGenerator #699 for this.

Creating a TextLoader and specifying its arguments manually is too verbose. We should add a few constructor overloads to make it easier.
@@ -9,6 +9,7 @@
using Microsoft.ML.Runtime.Data;
using Microsoft.ML.Runtime.Internal.Utilities;
using Microsoft.ML.Runtime.Internal.Internallearn;
using Microsoft.ML.Runtime.EntryPoints;
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Aug 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorting #Resolved

[Argument(ArgumentType.Multiple, HelpText = "Filter", ShortName = "f", SortOrder = 1, SignatureType = typeof(SignatureFeatureScorerTrainer))]
public IComponentFactory<ITrainer<IPredictorWithFeatureWeights<Single>>> Filter =
ComponentFactoryUtils.CreateFromFunction(env =>
// ML.Transforms doesn't have a direct reference to ML.StandardLearners, so use ComponentCatalog to create the Filter
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Aug 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// ML.Transforms doesn't have a direct reference to ML.StandardLearners, [](start = 20, length = 72)

since it's a learner feature selection, maybe make sense to move it to learners project? #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've opened #726 for this discussion. We can decide what to do in that issue. For now, keeping it to use DI (like it was before) is a decent approach.


In reply to: 211692955 [](ancestors = 211692955)

[Argument(ArgumentType.Multiple, HelpText = "which kernel to use?", ShortName = "kernel")]
public SubComponent<IFourierDistributionSampler, SignatureFourierDistributionSampler> MatrixGenerator =
new SubComponent<IFourierDistributionSampler, SignatureFourierDistributionSampler>(GaussianFourierSampler.LoadName);
[Argument(ArgumentType.Multiple, HelpText = "which kernel to use?", ShortName = "kernel", SignatureType = typeof(SignatureFourierDistributionSampler))]
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Aug 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

w [](start = 57, length = 1)

Not your fault, but since you touch it, can we capitalize it? Here and in for UseSin field? #Resolved

sub = args.MatrixGenerator;
var info = ComponentCatalog.GetLoadableClassInfo(sub);
bool gaussian = info != null && info.Type == typeof(GaussianFourierSampler);
var matrixGenerator = sub.CreateComponent(host, 1);
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Aug 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 [](start = 68, length = 1)

Is it current default value?
Can we somehow get actual default value from argument class?
It's really easy to forget change this line if we decide to update default value in matrix generator. #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a "dummy" instance in order to get the type of matrix generator we are using. See #699.

I'll add a comment.


In reply to: 211694368 [](ancestors = 211694368)

Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

Separator = "tab",
Column = new[]
{
new TextLoader.Column("Stopwords", DataKind.TX, 0)
Copy link
Member

@codemzs codemzs Aug 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Stopwords" [](start = 54, length = 11)

Not in the scope of this change but I feel these names should be configurable from arguments instead of hardcoding where it might hide an existing column. #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've opened #725 for this.


In reply to: 212152727 [](ancestors = 212152727)

else
{
loader = new TransposeLoader(host, new TransposeLoader.Arguments(), fileSource);
}
Copy link
Member

@codemzs codemzs Aug 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

loader = isBinary ? new BinaryLoader(host, new BinaryLoader.Arguments(), fileSource) : new TransposeLoader(host, new TransposeLoader.Arguments(), fileSource)

That is generally the pattern in the code. #WontFix

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately that code doesn't compile, and I'm not sure why.

Severity Code Description Project File Line Suppression State
Error CS0173 Type of conditional expression cannot be determined because there is no implicit conversion between 'Microsoft.ML.Runtime.Data.IO.BinaryLoader' and 'Microsoft.ML.Runtime.Data.IO.TransposeLoader' Microsoft.ML.Transforms F:\git\machinelearning\src\Microsoft.ML.Transforms\TermLookupTransform.cs 549 Active

In reply to: 212153383 [](ancestors = 212153383)

new TextLoader.Column("SepalWidth", DataKind.R4, 1),
new TextLoader.Column("PetalLength", DataKind.R4, 2),
new TextLoader.Column("PetalWidth",DataKind.R4, 3),
new TextLoader.Column("Label", DataKind.Text, 4)
Copy link
Member

@codemzs codemzs Aug 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks cleaner! #Resolved

Copy link
Member

@codemzs codemzs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@eerhardt eerhardt merged commit 841ba78 into dotnet:master Aug 24, 2018
@eerhardt eerhardt deleted the SubComponentTransforms branch November 7, 2018 14:56
@ghost ghost locked as resolved and limited conversation to collaborators Mar 29, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants