Skip to content

Replace SubComponent with IComponentFactory in ML.Ensemble #681

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Aug 17, 2018

Conversation

eerhardt
Copy link
Member

The next round of SubComponent removal. Now the ML.Ensemble project is SubComponent free.

Working towards #585

Note: I had to move the Argument class's fields down to the concrete class because of the way we handle SignatureType. You can use a generic type in an attribute. However, this allowed me to remove the TSig type on the base classes, which is nice since the signature Type shouldn't be in the public API.

@@ -10,6 +10,7 @@
<ProjectReference Include="..\Microsoft.ML.Data\Microsoft.ML.Data.csproj" />
<ProjectReference Include="..\Microsoft.ML.StandardLearners\Microsoft.ML.StandardLearners.csproj" />
<ProjectReference Include="..\Microsoft.ML.Transforms\Microsoft.ML.Transforms.csproj" />
<ProjectReference Include="..\Microsoft.ML.FastTree\Microsoft.ML.FastTree.csproj" />
Copy link
Contributor

@TomFinley TomFinley Aug 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[](start = 4, length = 84)

We need this due to the usage of FastTree as the default traniner for some ensemble classes. This could be fine, but I wonder if using something from StandardLearners might make a bit more sense, if we envision that someday FastTree will be factored into a separate nuget package. (Maybe we don't want that though, in which case this dependency is probably fine.) #Resolved

Copy link
Member Author

@eerhardt eerhardt Aug 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One advantage to this change is that it brings to light these dependencies. Before we had a weakly-typed dependency using strings and DI. So if we ever did refactor FastTree to be separate, we could have easily broken this dependency.

I've logged #682 for this issue. #Resolved

Copy link
Contributor

@TomFinley TomFinley Aug 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @eerhardt ... yes I expect we'll see some more of this "hidden dependency" becoming exposed as we move away from direct usage of subcomponents and dependency injection. I recall a similar issue in #446 with the normalizers being defined in Microsoft.ML.Transforms and used in Microsoft.ML.Data, despite the former project depending on the latter. 😄 #Resolved

@@ -19,7 +21,7 @@ namespace Microsoft.ML.Runtime.Ensemble.OutputCombiners
{
using TScalarPredictor = IPredictorProducing<Single>;

public sealed class RegressionStacking : BaseScalarStacking<SignatureRegressorTrainer>, IRegressionOutputCombiner, ICanSaveModel
Copy link
Contributor

@TomFinley TomFinley Aug 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SignatureRegressorTrainer [](start = 64, length = 25)

What are we going to do about this sort of restriction? I had suggested previously using something like ITrainer<IRegressionPredictor> or something like this, but I'm not sure what you thought of that idea. #Resolved

Copy link
Member Author

@eerhardt eerhardt Aug 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which restriction are you referring to? The Signature Type restriction, where we can't use typeof(TSig) in an attribute?

My hope is that some day we remove signature types all together. I haven't thought of/discovered a good replacement for them yet. Maybe an approach similar to ComponentKind which is just a string "Kind".

However, that still wouldn't help in this case, because we need a different signature/kind/etc for each concrete class, but we want to define the argument on the base class. #Resolved

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant just restricting the DI and API usage to only consider actual, in this case, regression trainers, which is one of the functions of the signatures. Currently this is done in a fairly half-hearted fashion using this IPredictorProducing<float> and IPredictorProducing<VBuffer<float>>, but if we got rid of this and replaced it with, say, IRegressionPredictor and IBinaryClassificationPredictor and IMulticlassPredictor (or something like that), then we could just have this be c omponent factory producing ITrainer<IRegressionPredictor> in this case. I think MEF should work well with that restriction as well, from a DI perspective.


In reply to: 210463209 [](ancestors = 210463209)

{
get { return BasePredictorType; }
set { BasePredictorType = value; }
}
Copy link
Contributor

@TomFinley TomFinley Aug 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm. It seems like this boilerplate is necessarily only because ArgumentAttribute cannot be applied to properties, so we have to declare this backing field, which is then exposed through the property. If that is so, maybe we ought to just allow it to be applied to properties and change the command line parsing code to handle properties as well as fields, so we can avoid this awkwardness. #Resolved

Copy link
Member Author

@eerhardt eerhardt Aug 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we discussed offline, I changed this to internal, since Stacking doesn't allow external inheritors anyway. I also changed it to a method, since the base class only needs to retrieve the component factories. #Resolved

/// A class for creating a component with no extra parameters (other than an <see cref="IHostEnvironment"/>)
/// that simply wraps a delegate which creates the component.
/// </summary>
public class SimpleComponentFactory<TComponent> : IComponentFactory<TComponent>
Copy link
Contributor

@TomFinley TomFinley Aug 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

public class SimpleComponentFactory [](start = 4, length = 35)

This can be sealed I hope? #Closed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.


In reply to: 210433599 [](ancestors = 210433599)

/// </summary>
public class SimpleComponentFactory<TComponent> : IComponentFactory<TComponent>
{
private Func<IHostEnvironment, TComponent> _factory;
Copy link
Contributor

@TomFinley TomFinley Aug 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

private Func<IHostEnvironment, TComponent> _factory; [](start = 8, length = 52)

readonly? #Closed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.


In reply to: 210433611 [](ancestors = 210433611)

/// A class for creating a component with no extra parameters (other than an <see cref="IHostEnvironment"/>)
/// that simply wraps a delegate which creates the component.
/// </summary>
public class SimpleComponentFactory<TComponent> : IComponentFactory<TComponent>
Copy link
Contributor

@TomFinley TomFinley Aug 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

public class SimpleComponentFactory [](start = 4, length = 35)

I'm somewhat reluctant to expose more public types in addition to IComponentFactory -- maybe if we just had something simple like a static creation class that returned IComponentFactory, with things like:

public static class ComponentFactoryUtils {
    public static IComponentFactory<TComponent> CreateFromFunction(Func<IHostEnvironment, TComponent> factory) { ... }
    public static IComponentFactory<TArg1, TComponent> CreateFromFunction(Func<IHostEnvironment, TArg1, TComponent> factory) { ... }
    public static IComponentFactory<TArg1, TArg2, TComponent> CreateFromFunction(Func<IHostEnvironment, TArg1, TArg2, TComponent> factory) { ... }
}

Then we could hide the actual implementations of these.
#Closed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good idea. It also simplifies the call sites, in some cases, since the types can be inferred. Fixed.


In reply to: 210434551 [](ancestors = 210434551)

@TomFinley
Copy link
Contributor

TomFinley commented Aug 15, 2018

{

As a side effect to my earlier comment, we could then get rid of this funny little thing. #Closed


Refers to: src/Microsoft.ML.Core/EntryPoints/ComponentFactory.cs:65 in 84119d3. [](commit_id = 84119d3, deletion_comment = False)


public SimpleComponentFactory(Func<IHostEnvironment, TArg1, TComponent> factory)
/// <summary>
/// A utility class for creating IComponentFactory instances.
Copy link
Contributor

@TomFinley TomFinley Aug 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IComponentFactory [](start = 37, length = 17)

Might prefer this to be in <see tag or something. #Resolved

Copy link
Contributor

@TomFinley TomFinley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@@ -83,10 +83,9 @@ public virtual IList<FeatureSubsetModel<IPredictorProducing<TOutput>>> Prune(ILi
IDataScorerTransform scorePipe = ScoreUtils.GetScorer(model.Predictor, testData, Host, testData.Schema);
// REVIEW: Should we somehow allow the user to customize the evaluator?
// By what mechanism should we allow that?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comment need to be moved to few line below

@@ -13,6 +13,8 @@
using Microsoft.ML.Runtime.Ensemble.Selector;
using Microsoft.ML.Ensemble.EntryPoints;
using Microsoft.ML.Runtime.Internal.Internallearn;
using Microsoft.ML.Runtime.EntryPoints;
using Microsoft.ML.Runtime.Learners;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need some sorting

Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@eerhardt eerhardt merged commit 94401d5 into dotnet:master Aug 17, 2018
@eerhardt eerhardt deleted the EnsembleSubComponent branch August 17, 2018 22:03
@ghost ghost locked as resolved and limited conversation to collaborators Mar 29, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants