Tree estimators #855

Merged
merged 19 commits into from
Sep 19, 2018

Conversation

sfilipi
Member

@sfilipi sfilipi commented Sep 7, 2018

Ongoing work on converting the trainers to estimators. This PR converts the tree-type predictors.

@sfilipi
Member Author

sfilipi commented Sep 7, 2018

I will add tests next. We don't seem to have many ranking tests enabled :( #Resolved

@sfilipi sfilipi self-assigned this Sep 7, 2018
@sfilipi sfilipi added the API Issues pertaining the friendly API label Sep 7, 2018
@sfilipi sfilipi added this to the 0918 milestone Sep 7, 2018
@sfilipi sfilipi changed the title WIP: Fast tree estimators WIP: Tree estimators Sep 7, 2018
@Zruty0 Zruty0 mentioned this pull request Sep 7, 2018
@@ -405,6 +414,9 @@ protected override string GetTestGraphHeader()
return headerBuilder.ToString();
}

protected override RankingPredictionTransformer<FastTreeRankingPredictor> MakeTransformer(FastTreeRankingPredictor model, ISchema trainSchema)
=> new RankingPredictionTransformer<FastTreeRankingPredictor>(Host, model, trainSchema, FeatureColumn.Name);
Member Author

@sfilipi sfilipi Sep 7, 2018

FeatureColumn.Name); [](start = 96, length = 20)

should add the GroupID to the base constructor #Resolved

Contributor

GroupID? why?


In reply to: 216093781 [](ancestors = 216093781)

Changing the behavior for the creation of the weight column, based on whether it is explicit, or implicit.
@@ -133,6 +136,18 @@ protected virtual Float GetMaxLabel()
return Float.PositiveInfinity;
}

private static SchemaShape.Column MakeWeightColumn(Optional<string> weightColumn)
{
if (weightColumn == null || !weightColumn.IsExplicit)
Member Author

@sfilipi sfilipi Sep 8, 2018

|| !weightColumn.IsExplicit [](start = 37, length = 27)

this is not entirely correct either. It won't create the column when the user doesn't specify the weight column, because it already had the name weight in the data.
we can't peek at the data at this time.

@[email protected] @Zruty0 can we move from the Optional to just string for the weight, name, group ID and enforce the user typing in the names? is there another way around it, now that we need to know the information before seeing the data? #Resolved

Contributor

we cannot do this really, can we?


In reply to: 216135820 [](ancestors = 216135820)

/// (e.g., the prediction does not happen over a file as it did during training).
/// </summary>
[Fact]
public void New_SimpleTrainAndPredictWithFT()
Contributor

@Zruty0 Zruty0 Sep 13, 2018

New_SimpleTrainAndPredictWithFT [](start = 20, length = 31)

move this test somewhere else #Resolved

@@ -15,6 +15,7 @@
using Microsoft.ML.Runtime.Internal.Utilities;
using Microsoft.ML.Runtime.Model;
using Microsoft.ML.Runtime.Internal.Internallearn;
using Microsoft.ML.Core.Data;
Contributor

@Zruty0 Zruty0 Sep 13, 2018

using [](start = 0, length = 5)

sort #Resolved

{
new SchemaShape.Column(DefaultColumnNames.Score, SchemaShape.Column.VectorKind.Scalar, NumberType.R4, false),
new SchemaShape.Column(DefaultColumnNames.Probability, SchemaShape.Column.VectorKind.Scalar, NumberType.R4, false),
new SchemaShape.Column(DefaultColumnNames.PredictedLabel, SchemaShape.Column.VectorKind.Scalar, BoolType.Instance, false)
Member Author

double-check this is correct

Making use of dataset definitions
adding Iris.data and the adult.tiny files to TestDatasets
adding regression and ranking tests
/// FastTreeBinaryClassification TrainerEstimator test
/// </summary>
[Fact]
public void FastTreeRankerEstimator()
Member Author

@sfilipi sfilipi Sep 14, 2018

public void FastTreeRankerEstimator() [](start = 7, length = 38)

this is currently failing. #Resolved

@@ -301,6 +301,52 @@ private static VersionInfo GetVersionInfo()
}
}

public sealed class RankingPredictionTransformer<TModel> : PredictionTransformerBase<TModel>
Contributor

RankingPredictionTransformer [](start = 24, length = 28)

Is the reason why we have two types that are identical in practically everything but name, so we can identify ranking estimators vs. regression estimators in a statically typed way?

Contributor

@Zruty0 Zruty0 Sep 17, 2018

I think this transformer should also expose the group ID column name, at least that would be my belief


In reply to: 218214277 [](ancestors = 218214277)

Contributor

@TomFinley TomFinley Sep 17, 2018

Actually, I thought about this: like labels, group IDs are only needed for training, right? So for prediction I don't think they should be exposed.


In reply to: 218216192 [](ancestors = 218216192,218214277)

Member Author

So keep it, or make the Regression one Generic and use it for both?


In reply to: 218216839 [](ancestors = 218216839,218216192,218214277)

@@ -57,7 +57,7 @@ public Arguments()
env => new Ova(env, new Ova.Arguments()
{
PredictorType = ComponentFactoryUtils.CreateFromFunction(
e => new AveragedPerceptronTrainer(e, new AveragedPerceptronTrainer.Arguments()))
e => new FastTreeBinaryClassificationTrainer(e, DefaultColumnNames.Label, DefaultColumnNames.Features))
Contributor

@TomFinley TomFinley Sep 17, 2018

FastTreeBinaryClassificationTrainer [](start = 37, length = 35)

I'd really rather we didn't. This seems to fit into the same bucket as the discussion on #682. That ensembling should have a dependency on FastTree merely because we have a default does not make sense to me. If someone wants to use stacking, that's great, but they need to specify the learners. #Pending

Contributor

But maybe we can hold off for right now.


In reply to: 218215145 [](ancestors = 218215145)

Member Author

@sfilipi sfilipi Sep 17, 2018

Yes, let's do that separately, when we shape the ensembles to take in the arguments in the constructor.


In reply to: 218215323 [](ancestors = 218215323,218215145)

@@ -25,6 +25,8 @@
using Microsoft.ML.Runtime.Training;
using Microsoft.ML.Runtime.TreePredictor;
using Newtonsoft.Json.Linq;
using Microsoft.ML.Core.Data;
using Microsoft.ML.Runtime.EntryPoints;
Contributor

@TomFinley TomFinley Sep 17, 2018

I'm probably just missing something obvious, but why does this now depend on entry-points namespace?

Also sorting. #Resolved

Member Author

Thank you! Oversight


In reply to: 218216150 [](ancestors = 218216150)

Contributor

@TomFinley TomFinley left a comment

:shipit:

@TomFinley
Contributor

Is omission of Pigsty extensions deliberate?

@sfilipi
Member Author

sfilipi commented Sep 17, 2018

Did I misunderstand that for trainers we should hold off on doing the Pigsty extensions until we get the ML task, so we could extend on that rather than the label? @[email protected] @Zruty0, let me know if I should actually work on them in the same PR.


In reply to: 422161730 [](ancestors = 422161730)

Contributor

@Zruty0 Zruty0 left a comment

:shipit:

@Zruty0
Contributor

Zruty0 commented Sep 17, 2018

I have the same (mis)understanding. In any case, let's do it after this one.


In reply to: 422174998 [](ancestors = 422174998,422161730)

@sfilipi sfilipi changed the title WIP: Tree estimators Tree estimators Sep 18, 2018
@sfilipi sfilipi merged commit d13b415 into dotnet:master Sep 19, 2018
@sfilipi sfilipi mentioned this pull request Sep 21, 2018
@sfilipi sfilipi deleted the fastTreeEstimators branch October 22, 2018 16:57
@ghost ghost locked as resolved and limited conversation to collaborators Mar 29, 2022
Labels
API Issues pertaining the friendly API

3 participants