Consider defaulting Ensemble Stacking to a trainer in StandardLearners #682
Another reason to do this is that FastTree is already an ensemble :) I see limited value in an ensemble of FastTrees.
Hi @eerhardt. It is certainly undesirable that this dependency exists. On the other hand, if we wanted to use something in standard learners, that probably means something like a linear learner, and ensemble learning of linear learners is practically a bit less helpful as a concept. My preference might be to simply not have a default at all: if someone wants to employ one of these meta-trainers, they have to tell us what underlying trainer they want to employ. It is a bit strange that this one decision, certainly by far the most consequential decision the user has to make when employing this method, has a default. But I know some people (not me) are allergic to the idea that a trainer would require configuration.

Hi @Zruty0, actually ensembles of FastTree models have historically been quite good models. There are two very distinct types of ensembling going on with this configuration: boosting (where the trees are directly dependent on each other), and ensembling based on different samplings of the data (where each learnt model is less directly dependent)... that is, basically the difference between the two. On the other hand, we have the "bagging" functionality in FastTree itself to enable this, though I feel like the implementation there has some problems (not least that, due to various restrictions, it doesn't actually do bagging, despite its name).
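Since the distinction comes up repeatedly in this thread, here is a minimal, self-contained sketch of the two kinds of ensembling described above. This is plain C# on a toy 1-D regression problem, not ML.NET code; all names (`Stump`, `FitStump`, the learning rate) are illustrative. Boosting fits each tree to the residuals of the trees before it, so the trees depend directly on each other; bagging-style ensembling trains each tree independently on a different bootstrap sampling of the data and averages them.

```csharp
// Toy sketch (not ML.NET code) contrasting boosting with bagging-style
// ensembling, using depth-1 regression "stumps" on 1-D data.
using System;
using System.Collections.Generic;
using System.Linq;

class EnsembleSketch
{
    // A stump: predict LeftValue if x < Threshold, else RightValue.
    record Stump(double Threshold, double LeftValue, double RightValue)
    {
        public double Predict(double x) => x < Threshold ? LeftValue : RightValue;
    }

    // Fit a stump by brute force over candidate thresholds (squared error).
    static Stump FitStump(double[] x, double[] y)
    {
        Stump best = null; double bestErr = double.MaxValue;
        foreach (double t in x.Distinct())
        {
            double left = y.Where((_, i) => x[i] < t).DefaultIfEmpty(0).Average();
            double right = y.Where((_, i) => x[i] >= t).DefaultIfEmpty(0).Average();
            double err = x.Select((xi, i) => Math.Pow(y[i] - (xi < t ? left : right), 2)).Sum();
            if (err < bestErr) { bestErr = err; best = new Stump(t, left, right); }
        }
        return best;
    }

    static void Main()
    {
        var rng = new Random(1);
        double[] x = Enumerable.Range(0, 50).Select(i => i / 5.0).ToArray();
        double[] y = x.Select(v => Math.Sin(v) + rng.NextDouble() * 0.2).ToArray();

        // Bagging: each stump is trained independently on a bootstrap sample;
        // the final prediction is the average. Models do not depend on each other.
        var bagged = Enumerable.Range(0, 20).Select(_ =>
        {
            int[] idx = Enumerable.Range(0, x.Length).Select(__ => rng.Next(x.Length)).ToArray();
            return FitStump(idx.Select(i => x[i]).ToArray(), idx.Select(i => y[i]).ToArray());
        }).ToArray();
        Func<double, double> baggedPredict = v => bagged.Average(s => s.Predict(v));

        // Boosting: each stump is fit to the *residuals* of the ensemble so far,
        // so every tree depends directly on the trees trained before it.
        var boosted = new List<Stump>();
        double[] residual = (double[])y.Clone();
        const double learningRate = 0.5;
        for (int m = 0; m < 20; m++)
        {
            var s = FitStump(x, residual);
            boosted.Add(s);
            for (int i = 0; i < residual.Length; i++)
                residual[i] -= learningRate * s.Predict(x[i]);
        }
        Func<double, double> boostedPredict = v => boosted.Sum(s => learningRate * s.Predict(v));

        Console.WriteLine($"bagged(2.0)  = {baggedPredict(2.0):F3}");
        Console.WriteLine($"boosted(2.0) = {boostedPredict(2.0):F3}");
    }
}
```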
Thanks for the explanation on ensembles of ensembles.
I think that should be true for the basic scenarios, like "just fit me a linear regression, what's so hard about it?", but I agree that this argument should not apply to meta-learners, as they are not to be considered basic scenarios.
@TomFinley, should we close this then?
@Zruty0, as it happens I have to fix this issue anyway for completely incidental reasons.
Fixed by #1563:

* Move IModelCombiner out of Core to Ensemble, since it clearly belongs there, not in Core.
* Remove the dependency of Ensemble on FastTree.
* Remove the learners in Ensemble having defaults of FastTree or indeed any learner. (Incidentally: fixes #682.)
* Rename FastTree's Ensemble to TreeEnsemble, so as to avoid namespace/type collisions between that type and the Ensemble namespace.
* Add a dependency from FastTree to the Ensemble project, so that FastTree can implement TreeEnsembleCombiner.
* Resolve the circular dependency FastTree -> Ensemble -> StandardLearners -> Legacy -> FastTree by removing Legacy as a dependency of StandardLearners, since no project we intend to keep should depend on Legacy.
* Move Legacy-specific infrastructure that somehow was in StandardLearners over to Legacy.
* Fix documentation in StandardLearners that was incorrectly referring to the Legacy pipelines and types directly, since in reality they have nothing to do with the types in Legacy.
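A minimal sketch of the inversion those bullets describe, under assumed names and signatures (the namespaces stand in for the real assemblies; this is not the actual ML.NET API): the combiner contract lives in Ensemble, and the tree-specific implementation lives on the FastTree side, so Ensemble never references FastTree.

```csharp
// Hypothetical sketch of the dependency direction after the change; namespaces
// stand in for the real assemblies, and all signatures are illustrative.
using System.Collections.Generic;

namespace Microsoft.ML.Ensemble
{
    // The contract lives here (per the first bullet above), so this assembly
    // never needs to reference FastTree.
    public interface IModelCombiner<TModel>
    {
        TModel CombineModels(IEnumerable<TModel> models);
    }
}

namespace Microsoft.ML.FastTree
{
    using Microsoft.ML.Ensemble;

    public sealed class TreeEnsembleModel { /* tree-specific state elided */ }

    // The tree-specific combiner is implemented on the FastTree side, so the
    // dependency arrow points FastTree -> Ensemble, never the reverse.
    public sealed class TreeEnsembleCombiner : IModelCombiner<TreeEnsembleModel>
    {
        public TreeEnsembleModel CombineModels(IEnumerable<TreeEnsembleModel> models)
        {
            // A real implementation would merge the trees; elided in this sketch.
            return new TreeEnsembleModel();
        }
    }
}
```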
See the conversation here: #681 (comment)
Ensemble Stacking defaults to using FastTree when users don't specify an underlying trainer. This results in a non-ideal dependency from Microsoft.ML.Ensemble to Microsoft.ML.FastTree, and would cause problems if we ever considered separating FastTree into its own NuGet package. We should consider changing the default trainer under Stacking to something in the StandardLearners assembly.
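For context, a minimal sketch of why a default base learner induces the dependency described above. The API shape here is hypothetical (`StackingTrainerWithDefault`, `ITrainer`, and `FastTreeTrainer` are stand-ins, not ML.NET types): if the stacking trainer constructs `new FastTreeTrainer()` as a fallback, the assembly defining it must reference FastTree; making the caller supply the base trainer removes that edge.

```csharp
// Hypothetical sketch: how a default base learner induces an assembly dependency.
using System;

public interface ITrainer { object Train(object data); }

// As if defined in Microsoft.ML.Ensemble. Because the default is constructed
// here, this assembly must reference whatever assembly defines FastTreeTrainer.
public sealed class StackingTrainerWithDefault
{
    private readonly ITrainer _baseTrainer;
    public StackingTrainerWithDefault(ITrainer baseTrainer = null)
        => _baseTrainer = baseTrainer ?? new FastTreeTrainer(); // dependency lives here
}

// The alternative discussed above: no default at all. The caller must say
// which underlying trainer to stack, and Ensemble needs no FastTree reference.
public sealed class StackingTrainer
{
    private readonly ITrainer _baseTrainer;
    public StackingTrainer(ITrainer baseTrainer)
        => _baseTrainer = baseTrainer ?? throw new ArgumentNullException(nameof(baseTrainer));
}

// Stand-in for the real FastTree trainer; illustrative only.
public sealed class FastTreeTrainer : ITrainer
{
    public object Train(object data) => data;
}
```

Either requiring the argument outright (as preferred earlier in the thread) or defaulting to a trainer defined in StandardLearners would keep Ensemble free of a FastTree reference.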
/cc @TomFinley