-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Tree estimators #855
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tree estimators #855
Conversation
I will add tests next. We don't seem to have many ranking tests enabled :( #Resolved |
@@ -405,6 +414,9 @@ protected override string GetTestGraphHeader() | |||
return headerBuilder.ToString(); | |||
} | |||
|
|||
protected override RankingPredictionTransformer<FastTreeRankingPredictor> MakeTransformer(FastTreeRankingPredictor model, ISchema trainSchema) | |||
=> new RankingPredictionTransformer<FastTreeRankingPredictor>(Host, model, trainSchema, FeatureColumn.Name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FeatureColumn.Name); [](start = 96, length = 20)
should add the GroupID to the base constructor #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing the behavior for the creation of the weight column, based on whether it is explicit, or implicit.
@@ -133,6 +136,18 @@ protected virtual Float GetMaxLabel() | |||
return Float.PositiveInfinity; | |||
} | |||
|
|||
private static SchemaShape.Column MakeWeightColumn(Optional<string> weightColumn) | |||
{ | |||
if (weightColumn == null || !weightColumn.IsExplicit) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|| !weightColumn.IsExplicit [](start = 37, length = 27)
this is not entirely correct either. It won't create the column when the user doesn't specify the weight colum, because it already had the name weight in the data.
we can't peak at the data at this time.
@[email protected] @Zruty0 can we move from the Optional to just string for the weight, name, group ID and enforce the user typing in the names? is there another way around it, now that we need to know the information before seeing the data? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// (e.g., the prediction does not happen over a file as it did during training). | ||
/// </summary> | ||
[Fact] | ||
public void New_SimpleTrainAndPredictWithFT() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New_SimpleTrainAndPredictWithFT [](start = 20, length = 31)
move this test somewhere else #Resolved
@@ -15,6 +15,7 @@ | |||
using Microsoft.ML.Runtime.Internal.Utilities; | |||
using Microsoft.ML.Runtime.Model; | |||
using Microsoft.ML.Runtime.Internal.Internallearn; | |||
using Microsoft.ML.Core.Data; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using [](start = 0, length = 5)
sort #Resolved
{ | ||
new SchemaShape.Column(DefaultColumnNames.Score, SchemaShape.Column.VectorKind.Scalar, NumberType.R4, false), | ||
new SchemaShape.Column(DefaultColumnNames.Probability, SchemaShape.Column.VectorKind.Scalar, NumberType.R4, false), | ||
new SchemaShape.Column(DefaultColumnNames.PredictedLabel, SchemaShape.Column.VectorKind.Scalar, BoolType.Instance, false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
double-check this is correct
adding test
… on the MakeGroupId
Making use of dataset definitions adding Iris.data and the adult.tiny files to TestDatasets adding regression and ranking tests
/// FastTreeBinaryClassification TrainerEstimator test | ||
/// </summary> | ||
[Fact] | ||
public void FastTreeRankerEstimator() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public void FastTreeRankerEstimator() [](start = 7, length = 38)
this is currently failing. #Resolved
@@ -301,6 +301,52 @@ private static VersionInfo GetVersionInfo() | |||
} | |||
} | |||
|
|||
public sealed class RankingPredictionTransformer<TModel> : PredictionTransformerBase<TModel> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RankingPredictionTransformer [](start = 24, length = 28)
Is the reason why we have two types that are identical in practically everything but name, so we can identify ranking estimators vs. regression estimators in a statically typed way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this transformer should also expose the group ID column name, at least that would be my belief
In reply to: 218214277 [](ancestors = 218214277)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually thought about this, like labels group ids are only needed for training, right? So for prediction I don't think they should be.
In reply to: 218216192 [](ancestors = 218216192,218214277)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So keep it, or make the Regression one Generic and use it for both?
In reply to: 218216839 [](ancestors = 218216839,218216192,218214277)
@@ -57,7 +57,7 @@ public Arguments() | |||
env => new Ova(env, new Ova.Arguments() | |||
{ | |||
PredictorType = ComponentFactoryUtils.CreateFromFunction( | |||
e => new AveragedPerceptronTrainer(e, new AveragedPerceptronTrainer.Arguments())) | |||
e => new FastTreeBinaryClassificationTrainer(e, DefaultColumnNames.Label, DefaultColumnNames.Features)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FastTreeBinaryClassificationTrainer [](start = 37, length = 35)
I'd really rather we didn't. This seems to fit into the same bucket as the discussion on #682. That ensembling should have a dependency on FastTree merely because we have a default does not make sense to me. If someone wants to use stacking, that's great, but they need to specify the learners. #Pending
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, let's do that separately, when we shape the ensembles to take in the arguments in the constructor.
In reply to: 218215323 [](ancestors = 218215323,218215145)
@@ -25,6 +25,8 @@ | |||
using Microsoft.ML.Runtime.Training; | |||
using Microsoft.ML.Runtime.TreePredictor; | |||
using Newtonsoft.Json.Linq; | |||
using Microsoft.ML.Core.Data; | |||
using Microsoft.ML.Runtime.EntryPoints; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm probably just missing something obvious, but why does this now depend on entry-points namespace?
Also sorting. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is omission of Pigsty extensions deliberate? |
Did i misunderstand that for trainers we should hold on to doing the Pigsty extensions until we get the ml task, so we could extend on that, rather than the label? @[email protected] @Zruty0, let me know if i should actually work on them in the same PR. In reply to: 422161730 [](ancestors = 422161730) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have the same (mis)understanding. In any case, let's do it after this oner In reply to: 422174998 [](ancestors = 422174998,422161730) |
Fixing the other two tests
Ongoing work on converting the trainers to estimators. This PR converts the Tree -type Predictors.