
Consider defaulting Ensemble Stacking to a trainer in StandardLearners #682


Closed
eerhardt opened this issue Aug 15, 2018 · 5 comments · Fixed by #1563
Labels: need info (This issue needs more info before triage) · question (Further information is requested)

Comments

@eerhardt (Member) commented Aug 15, 2018

See the conversation here: #681 (comment)

Ensemble Stacking defaults to using FastTree when users don't specify an underlying trainer. This results in a non-ideal dependency from Microsoft.ML.Ensemble to Microsoft.ML.FastTree, and would cause problems if we ever considered separating FastTree into its own NuGet package.

We should consider changing the default trainer for Stacking to something in the StandardLearners assembly.

/cc @TomFinley
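
To make the packaging concern concrete, here is a minimal sketch of the problem shape; the names (`ITrainer`, `FastTreeTrainer`, `StackingTrainer`) are illustrative stand-ins, not the actual ML.NET types:

```csharp
// Stand-in for the trainer abstraction; illustrative only.
public interface ITrainer
{
    string Name { get; }
}

// Conceptually lives in Microsoft.ML.FastTree.
public sealed class FastTreeTrainer : ITrainer
{
    public string Name => "FastTree";
}

// Conceptually lives in Microsoft.ML.Ensemble. Because the default base
// learner is constructed here, the Ensemble assembly must reference the
// FastTree assembly -- exactly the dependency this issue objects to.
// Defaulting to a trainer from StandardLearners (or having no default at
// all) would cut that edge from the dependency graph.
public sealed class StackingTrainer
{
    public ITrainer BaseTrainer { get; }

    public StackingTrainer(ITrainer baseTrainer = null)
    {
        BaseTrainer = baseTrainer ?? new FastTreeTrainer();
    }
}
```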

@Zruty0 (Contributor) commented Aug 17, 2018

Another reason to do this is that FastTree is already an ensemble :) I see limited value in an ensemble of FastTrees.

@TomFinley (Contributor) commented

Hi @eerhardt. It is certainly undesirable that this dependency exists. On the other hand, if we wanted to use something in standard learners, that probably means something like a linear learner, and ensemble learning of linear learners is, in practice, a somewhat less helpful concept.

My preference might be to simply not have a default at all: if someone wants to employ one of these meta-trainers, they have to tell us what underlying trainer they want to use. It is a bit strange that this one decision, certainly by far the most consequential decision the user has to make when employing this method, has a default. But I know some people (not me) are allergic to the idea that a trainer would require configuration.
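
A minimal sketch of that no-default shape, reusing the hypothetical stand-in names from the sketch above (again, not the real API): the base trainer becomes a required, validated constructor argument, so the meta-trainer carries no compile-time reference to any particular learner:

```csharp
using System;

public interface ITrainer
{
    string Name { get; }
}

public sealed class StackingTrainer
{
    public ITrainer BaseTrainer { get; }

    // No default: the caller must state the single most consequential
    // decision involved in using this meta-trainer.
    public StackingTrainer(ITrainer baseTrainer)
    {
        BaseTrainer = baseTrainer
            ?? throw new ArgumentNullException(nameof(baseTrainer));
    }
}
```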

Hi @Zruty0, actually ensembles of FastTree models have historically been quite good models. There are two very distinct types of ensembling going on with this configuration: boosting (where the trees are directly dependent on each other), and ensembling based on different samplings of the data (where each learnt model is less directly dependent)... that is, basically the difference between this and this. On the other hand we have the "bagging" functionality in FastTree itself to enable this, though I feel the implementation there has some problems (not least that, due to various restrictions, it doesn't actually do bagging, despite its name).
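
To illustrate the second flavor, here is a small, self-contained bagging (bootstrap aggregation) sketch, independent of FastTree or ML.NET; the delegate-based base trainer and plain averaging are assumptions made for brevity:

```csharp
using System;
using System.Linq;

public static class Bagging
{
    // Train one base model per bootstrap sample (drawn with replacement) and
    // average their predictions. Each model sees a different resampling of
    // the data, so the models are only indirectly related -- unlike boosting,
    // where each new tree depends directly on the trees fit before it.
    public static Func<double[], double> Train(
        (double[] x, double y)[] data,
        Func<(double[] x, double y)[], Func<double[], double>> trainBase,
        int numModels,
        int seed = 42)
    {
        var rng = new Random(seed);
        var models = new Func<double[], double>[numModels];
        for (int m = 0; m < numModels; m++)
        {
            // Bootstrap sample: n draws with replacement from the data.
            var sample = Enumerable.Range(0, data.Length)
                                   .Select(_ => data[rng.Next(data.Length)])
                                   .ToArray();
            models[m] = trainBase(sample);
        }
        // The bagged predictor averages the base models' outputs.
        return x => models.Average(model => model(x));
    }
}
```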

@Zruty0 (Contributor) commented Aug 17, 2018

Thanks for the explanation on ensembles of ensembles.

> But I know some people (not me) are allergic to the idea that a trainer would require configuration.

I think that should be true for the basic scenarios, like "just fit me a linear regression, what's so hard about it?", but I agree that this argument should not apply to meta-learners, as they are not to be considered basic scenarios.

@Ivanidzo4ka added the "question" label Oct 19, 2018
@Zruty0 (Contributor) commented Oct 29, 2018

@TomFinley, should we close this then?

@Zruty0 added the "need info" label Oct 29, 2018
@TomFinley (Contributor) commented

@Zruty0, as it happens I have to fix this issue anyway for completely incidental reasons.

TomFinley added a commit to TomFinley/machinelearning that referenced this issue Nov 7, 2018
* Move IModelCombiner out of Core to Ensemble since it clearly belongs there,
  not in Core.

* Remove dependency of Ensemble on FastTree.

* Remove learners in Ensemble having defaults of FastTree or indeed any
  learner. (Incidentally: fixes dotnet#682.)

* Rename *FastTree* Ensemble to TreeEnsemble, so as to avoid namespace/type
  collisions with that type and Ensemble namespace.

* Add dependency of FastTree to Ensemble project so something there can
  implement TreeEnsembleCombiner.

* Resolve circular dependency of FastTree -> Ensemble -> StandardLearners ->
  Legacy -> FastTree by removing Legacy as dependency of StandardLearners,
  since no project we intend to keep should depend on Legacy.

* Move Legacy specific infrastructure that somehow was in StandardLearners
  over to Legacy.

* Fix documentation in StandardLearners that was incorrectly referring to the
  Legacy pipelines and types directly, since in reality they have nothing to
  do with the types in Legacy.
TomFinley added a commit that referenced this issue Nov 7, 2018 (#1563); its commit message is the same as above.
@ghost locked as resolved and limited conversation to collaborators Mar 29, 2022