-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Replace SubComponent with IComponentFactory in ML.Ensemble #681
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -10,6 +10,7 @@ | |||
<ProjectReference Include="..\Microsoft.ML.Data\Microsoft.ML.Data.csproj" /> | |||
<ProjectReference Include="..\Microsoft.ML.StandardLearners\Microsoft.ML.StandardLearners.csproj" /> | |||
<ProjectReference Include="..\Microsoft.ML.Transforms\Microsoft.ML.Transforms.csproj" /> | |||
<ProjectReference Include="..\Microsoft.ML.FastTree\Microsoft.ML.FastTree.csproj" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[](start = 4, length = 84)
We need this due to the usage of FastTree as the default traniner for some ensemble classes. This could be fine, but I wonder if using something from StandardLearners
might make a bit more sense, if we envision that someday FastTree will be factored into a separate nuget package. (Maybe we don't want that though, in which case this dependency is probably fine.) #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One advantage to this change is that it brings to light these dependencies. Before we had a weakly-typed dependency using strings and DI. So if we ever did refactor FastTree to be separate, we could have easily broken this dependency.
I've logged #682 for this issue. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @eerhardt ... yes I expect we'll see some more of this "hidden dependency" becoming exposed as we move away from direct usage of subcomponents and dependency injection. I recall a similar issue in #446 with the normalizers being defined in Microsoft.ML.Transforms
and used in Microsoft.ML.Data
, despite the former project depending on the latter. 😄 #Resolved
@@ -19,7 +21,7 @@ namespace Microsoft.ML.Runtime.Ensemble.OutputCombiners | |||
{ | |||
using TScalarPredictor = IPredictorProducing<Single>; | |||
|
|||
public sealed class RegressionStacking : BaseScalarStacking<SignatureRegressorTrainer>, IRegressionOutputCombiner, ICanSaveModel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SignatureRegressorTrainer [](start = 64, length = 25)
What are we going to do about this sort of restriction? I had suggested previously using something like ITrainer<IRegressionPredictor>
or something like this, but I'm not sure what you thought of that idea. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which restriction are you referring to? The Signature Type restriction, where we can't use typeof(TSig)
in an attribute?
My hope is that some day we remove signature types all together. I haven't thought of/discovered a good replacement for them yet. Maybe an approach similar to ComponentKind which is just a string "Kind".
However, that still wouldn't help in this case, because we need a different signature/kind/etc for each concrete class, but we want to define the argument on the base class. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant just restricting the DI and API usage to only consider actual, in this case, regression trainers, which is one of the functions of the signatures. Currently this is done in a fairly half-hearted fashion using this IPredictorProducing<float>
and IPredictorProducing<VBuffer<float>>
, but if we got rid of this and replaced it with, say, IRegressionPredictor
and IBinaryClassificationPredictor
and IMulticlassPredictor
(or something like that), then we could just have this be c omponent factory producing ITrainer<IRegressionPredictor>
in this case. I think MEF should work well with that restriction as well, from a DI perspective.
In reply to: 210463209 [](ancestors = 210463209)
{ | ||
get { return BasePredictorType; } | ||
set { BasePredictorType = value; } | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm. It seems like this boilerplate is necessarily only because ArgumentAttribute
cannot be applied to properties, so we have to declare this backing field, which is then exposed through the property. If that is so, maybe we ought to just allow it to be applied to properties and change the command line parsing code to handle properties as well as fields, so we can avoid this awkwardness. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we discussed offline, I changed this to internal
, since Stacking doesn't allow external inheritors anyway. I also changed it to a method, since the base class only needs to retrieve the component factories. #Resolved
/// A class for creating a component with no extra parameters (other than an <see cref="IHostEnvironment"/>) | ||
/// that simply wraps a delegate which creates the component. | ||
/// </summary> | ||
public class SimpleComponentFactory<TComponent> : IComponentFactory<TComponent> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public class SimpleComponentFactory [](start = 4, length = 35)
This can be sealed I hope? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// </summary> | ||
public class SimpleComponentFactory<TComponent> : IComponentFactory<TComponent> | ||
{ | ||
private Func<IHostEnvironment, TComponent> _factory; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private Func<IHostEnvironment, TComponent> _factory; [](start = 8, length = 52)
readonly? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// A class for creating a component with no extra parameters (other than an <see cref="IHostEnvironment"/>) | ||
/// that simply wraps a delegate which creates the component. | ||
/// </summary> | ||
public class SimpleComponentFactory<TComponent> : IComponentFactory<TComponent> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public class SimpleComponentFactory [](start = 4, length = 35)
I'm somewhat reluctant to expose more public types in addition to IComponentFactory
-- maybe if we just had something simple like a static creation class that returned IComponentFactory
, with things like:
public static class ComponentFactoryUtils {
public static IComponentFactory<TComponent> CreateFromFunction(Func<IHostEnvironment, TComponent> factory) { ... }
public static IComponentFactory<TArg1, TComponent> CreateFromFunction(Func<IHostEnvironment, TArg1, TComponent> factory) { ... }
public static IComponentFactory<TArg1, TArg2, TComponent> CreateFromFunction(Func<IHostEnvironment, TArg1, TArg2, TComponent> factory) { ... }
}
Then we could hide the actual implementations of these.
#Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good idea. It also simplifies the call sites, in some cases, since the types can be inferred. Fixed.
In reply to: 210434551 [](ancestors = 210434551)
This reduces the public surface area.
|
||
public SimpleComponentFactory(Func<IHostEnvironment, TArg1, TComponent> factory) | ||
/// <summary> | ||
/// A utility class for creating IComponentFactory instances. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IComponentFactory [](start = 37, length = 17)
Might prefer this to be in <see
tag or something. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -83,10 +83,9 @@ public virtual IList<FeatureSubsetModel<IPredictorProducing<TOutput>>> Prune(ILi | |||
IDataScorerTransform scorePipe = ScoreUtils.GetScorer(model.Predictor, testData, Host, testData.Schema); | |||
// REVIEW: Should we somehow allow the user to customize the evaluator? | |||
// By what mechanism should we allow that? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this comment need to be moved to few line below
@@ -13,6 +13,8 @@ | |||
using Microsoft.ML.Runtime.Ensemble.Selector; | |||
using Microsoft.ML.Ensemble.EntryPoints; | |||
using Microsoft.ML.Runtime.Internal.Internallearn; | |||
using Microsoft.ML.Runtime.EntryPoints; | |||
using Microsoft.ML.Runtime.Learners; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need some sorting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The next round of SubComponent removal. Now the ML.Ensemble project is SubComponent free.
Working towards #585
Note: I had to move the Argument class's fields down to the concrete class because of the way we handle
SignatureType
. You can use a generic type in an attribute. However, this allowed me to remove theTSig
type on the base classes, which is nice since the signature Type shouldn't be in the public API.