
Consider defaulting Ensemble Stacking to a trainer in StandardLearners #682

Closed
@eerhardt

Description

See the conversation here: #681 (comment)

Ensemble Stacking defaults to using FastTree when users don't specify an underlying trainer. This results in a non-ideal dependency from Microsoft.ML.Ensemble to Microsoft.ML.FastTree, and would cause problems if we ever considered separating FastTree into its own NuGet package.

We should consider changing the default trainer under Stacking to something in the StandardLearners assembly.

/cc @TomFinley
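
For context, stacking trains several base learners and then fits a meta-learner on their predictions. The sketch below illustrates that shape conceptually; the `ITrainer`/`IModel` interfaces and `StackingTrainer` type are hypothetical illustrations, not ML.NET's actual API. Note that the underlying trainers are required constructor arguments here, with no implicit FastTree (or other) default:

```csharp
// Conceptual sketch of stacking. ITrainer, IModel and the types below
// are hypothetical illustrations, not ML.NET's actual API.
public interface IModel
{
    float Predict(float[] features);
}

public interface ITrainer
{
    IModel Train(float[][] features, float[] labels);
}

public sealed class StackingTrainer : ITrainer
{
    private readonly ITrainer[] _baseTrainers;
    private readonly ITrainer _metaTrainer;

    // Base trainers and the meta-trainer are required arguments:
    // no default underlying trainer is assumed.
    public StackingTrainer(ITrainer[] baseTrainers, ITrainer metaTrainer)
    {
        _baseTrainers = baseTrainers;
        _metaTrainer = metaTrainer;
    }

    public IModel Train(float[][] features, float[] labels)
    {
        // Train each base learner on the data.
        var baseModels = new IModel[_baseTrainers.Length];
        for (int i = 0; i < _baseTrainers.Length; i++)
            baseModels[i] = _baseTrainers[i].Train(features, labels);

        // Build the meta-level dataset: one column per base model's prediction.
        var metaFeatures = new float[features.Length][];
        for (int row = 0; row < features.Length; row++)
        {
            metaFeatures[row] = new float[baseModels.Length];
            for (int i = 0; i < baseModels.Length; i++)
                metaFeatures[row][i] = baseModels[i].Predict(features[row]);
        }

        // Fit the meta-learner on the base models' predictions.
        var metaModel = _metaTrainer.Train(metaFeatures, labels);
        return new StackedModel(baseModels, metaModel);
    }

    private sealed class StackedModel : IModel
    {
        private readonly IModel[] _baseModels;
        private readonly IModel _metaModel;

        public StackedModel(IModel[] baseModels, IModel metaModel)
        {
            _baseModels = baseModels;
            _metaModel = metaModel;
        }

        public float Predict(float[] features)
        {
            // Score each base model, then let the meta-model combine them.
            var meta = new float[_baseModels.Length];
            for (int i = 0; i < _baseModels.Length; i++)
                meta[i] = _baseModels[i].Predict(features);
            return _metaModel.Predict(meta);
        }
    }
}
```

A production stacking implementation would typically fit the meta-learner on out-of-fold base predictions rather than in-sample ones, to avoid leaking training labels into the meta-level features.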

Activity

Zruty0 (Contributor) commented on Aug 17, 2018

Another reason to do this is that FastTree is already an ensemble :) I see limited value in an ensemble of FastTrees.

TomFinley (Contributor) commented on Aug 17, 2018

Hi @eerhardt. It is certainly undesirable that this dependency exists. On the other hand, if we wanted to use something in standard learners, that probably means something like a linear learner, and ensembling linear learners is in practice a less helpful concept.

My preference might be to simply not have a default at all: if someone wants to employ one of these meta-trainers, they have to tell us what underlying trainer they want to employ. It is a bit strange that this one decision, by far the most consequential decision the user has to make when employing this method, has a default. But I know some people (not me) are allergic to the idea that a trainer would require configuration.

Hi @Zruty0, actually ensembles of FastTree models have historically been quite good models. There are two very distinct types of ensembling going on in this configuration: boosting (where the trees are directly dependent on each other), and ensembling based on different samplings of the data (where each learnt model is less directly dependent)... that is, basically the difference between boosting and bagging. On the other hand, we have the "bagging" functionality in FastTree itself to enable this, though I feel the implementation there has some problems (not least that, due to various restrictions, it doesn't actually do bagging, despite its name).
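
For illustration, here is a minimal sketch of the second, sampling-based style (bagging), reusing the hypothetical `ITrainer`/`IModel` interfaces from the earlier sketch. This is not FastTree's actual bagging implementation, just the classic bootstrap-and-average idea:

```csharp
using System;

// Bagging sketch reusing the hypothetical ITrainer/IModel interfaces above.
public sealed class BaggingTrainer : ITrainer
{
    private readonly ITrainer _inner;   // e.g. a boosted-tree trainer
    private readonly int _numModels;
    private readonly Random _rng = new Random(42);

    public BaggingTrainer(ITrainer inner, int numModels)
    {
        _inner = inner;
        _numModels = numModels;
    }

    public IModel Train(float[][] features, float[] labels)
    {
        var models = new IModel[_numModels];
        for (int m = 0; m < _numModels; m++)
        {
            // Bootstrap sample: draw N rows with replacement, so each
            // member model sees a different resampling of the data.
            int n = features.Length;
            var sampleX = new float[n][];
            var sampleY = new float[n];
            for (int i = 0; i < n; i++)
            {
                int j = _rng.Next(n);
                sampleX[i] = features[j];
                sampleY[i] = labels[j];
            }
            models[m] = _inner.Train(sampleX, sampleY);
        }
        return new BaggedModel(models);
    }

    private sealed class BaggedModel : IModel
    {
        private readonly IModel[] _models;
        public BaggedModel(IModel[] models) => _models = models;

        // Average the member predictions (regression-style aggregation).
        public float Predict(float[] features)
        {
            float sum = 0;
            foreach (var m in _models)
                sum += m.Predict(features);
            return sum / _models.Length;
        }
    }
}
```

The key difference from boosting is that the member models here are trained independently on resampled data and merely averaged, whereas in boosting each tree is fit against the residual errors of the trees before it.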

Zruty0 (Contributor) commented on Aug 17, 2018

Thanks for the explanation on ensembles of ensembles.

> But I know some people (not me) are allergic to the idea that a trainer would require configuration.

I think that should be true for the basic scenarios, like "just fit me a linear regression, what's so hard about it?", but I agree that this argument should not apply to meta-learners, as they are not to be considered basic scenarios.

Zruty0 (Contributor) commented on Oct 29, 2018

@TomFinley , should we close this then?

added the need info label (This issue needs more info before triage) on Oct 29, 2018
TomFinley (Contributor) commented on Nov 7, 2018

@Zruty0 , as it happens I have to fix this issue anyway for completely incidental reasons.

added commits that reference this issue on Nov 7, 2018: 7b2461c, d3b70b5
ghost locked as resolved and limited conversation to collaborators on Mar 29, 2022

Metadata

Labels: need info (This issue needs more info before triage), question (Further information is requested)

Participants: @Ivanidzo4ka, @eerhardt, @TomFinley, @Zruty0