Converting KMeans++trainer to estimator. #979
@@ -16,5 +17,34 @@ public partial class TrainerEstimators : TestDataPipeBase
public TrainerEstimators(ITestOutputHelper helper) : base(helper)
{
}

/// <summary>
/// FastTreeBinaryClassification TrainerEstimator test
FastTreeBinaryClassification
fix comment #Resolved
@@ -203,6 +233,18 @@ private static int ComputeNumThreads(IHost host, int? argNumThreads)
return Math.Max(1, maxThreads);
}

private static SchemaShape.Column MakeWeightColumn(string weightColumn)
MakeWeightColumn
re-use from TrainerUtils #Resolved
/// Base class for the <see cref="ISingleFeaturePredictionTransformer{TModel}"/> working on clustering tasks.
/// </summary>
/// <typeparam name="TModel">An implementation of the <see cref="IPredictorProducing{TResult}"/></typeparam>
public sealed class ClusteringPredictionTransformer<TModel> : SingleFeaturePredictionTransformerBase<TModel>
ClusteringPredictionTransformer
I still don't like not making those generic on the scorer as well... #Resolved
@@ -17,6 +17,7 @@
using Microsoft.ML.Runtime.Training;
using Microsoft.ML.Runtime.Internal.Internallearn;
using Microsoft.ML.Runtime.EntryPoints;
using Microsoft.ML.Core.Data;
sort order #Resolved

// *** Binary format ***
// <base info>
// id of string: scorer threshold column
// id of string: scorer threshold column
doesn't look right #Resolved
…inelearning-1 into KmeansPCAEstimators
@@ -52,9 +52,9 @@ public ParameterMixingCalibratedPredictor TrainKMeansAndLR()
trans = TrainAndScoreTransform.Create(env, new TrainAndScoreTransform.Arguments
{
Trainer = ComponentFactoryUtils.CreateFromFunction(host =>
new KMeansPlusPlusTrainer(host, new KMeansPlusPlusTrainer.Arguments()
new KMeansPlusPlusTrainer(host, "Features", advancedSettings: s=>
advancedSettings
I think K needs to be elevated out of 'advanced' #Closed
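For illustration, a minimal sketch of what elevating K out of the advanced settings could look like. All names here (KMeansArgumentsSketch, KMeansTrainerSketch, clustersCount, the default values) are hypothetical stand-ins, not the actual ML.NET types or signatures; the only point is that K becomes an explicit constructor parameter while rarely tuned options stay behind the delegate.

```csharp
using System;

// Hypothetical stand-in for the trainer's Arguments class (an assumption, not the ML.NET type).
public sealed class KMeansArgumentsSketch
{
    public string FeatureColumn = "Features";
    public int K = 5;                 // number of clusters (default chosen for the sketch)
    public int MaxIterations = 1000;  // example of a rarely tuned, "advanced" option
}

public sealed class KMeansTrainerSketch
{
    public KMeansArgumentsSketch Args { get; }

    // K is a first-class parameter; only rarely tuned options go through advancedSettings.
    public KMeansTrainerSketch(string featureColumn, int clustersCount = 5,
        Action<KMeansArgumentsSketch> advancedSettings = null)
    {
        if (string.IsNullOrEmpty(featureColumn))
            throw new ArgumentException("Must be non-empty", nameof(featureColumn));
        if (clustersCount <= 0)
            throw new ArgumentOutOfRangeException(nameof(clustersCount), "Must be positive");

        var args = new KMeansArgumentsSketch();
        advancedSettings?.Invoke(args);

        // Explicit parameters take precedence over anything the delegate set.
        args.FeatureColumn = featureColumn;
        args.K = clustersCount;
        Args = args;
    }
}
```

In this sketch a call such as new KMeansTrainerSketch("Features", 3, s => s.MaxIterations = 50) sets K directly, and the delegate is only needed for the less common options. Letting the explicit parameters override the delegate is a design choice made for the sketch, not something the PR states.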
};

public static TestDataset mnistTiny28 = new TestDataset()
{
name = "mnistTiny28",
trainFilename = @"..\MNIST\Train-Tiny-28x28.txt",
testFilename = @"..\MNIST\Test-Tiny-28x28.txt"
trainFilename = @"Train-Tiny-28x28.txt",
Train
intentional? #Closed
Yes. The 'old' path is from the TLC solution structure.
In reply to: 219997779
advancedSettings: s => { s.InitAlgorithm = KMeansPlusPlusTrainer.InitAlgorithm.KMeansParallel; });

TestEstimatorCore(pipeline, data);
}
}
Call Done() at the end #Closed
/// </summary>
/// <param name="env">The private instance of <see cref="IHostEnvironment"/>.</param>
/// <param name="featureColumn">The name of the feature column.</param>
/// <param name="weightColumn">The name for the column containing the initial weight.</param>
initial
example weights, not initial weight #Closed
Host.CheckUserArg(args.K > 0, nameof(args.K), "Must be positive");

// is this even necessary, if there is only one column, for example
is this even necessary, if there is only one column, for example
It checks for non-emptiness of string #Closed
This calls for a |
🕐
{
Host.CheckValue(args, nameof(args));
if (args == null)
args = new Arguments();
The net effect of this change is that the internal constructor is now tolerant of null arguments being passed in. Even though the constructor is internal, I'd prefer to keep it clean and maintain the invariants we have everywhere else. Note the Host.CheckValue(args, nameof(args)); check that was deleted.
If you still had the check, and just passed in new Arguments in the above constructor, that would have had no more lines of code. #Resolved
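A minimal sketch of the shape being suggested, assuming hypothetical stand-in types (ArgumentsSketch, NullCheckedTrainerSketch) rather than the real trainer and its Host.CheckValue helper: the convenience constructor supplies the defaults itself, so the internal constructor can keep rejecting null.

```csharp
using System;

// Hypothetical stand-in for the trainer's Arguments class.
public sealed class ArgumentsSketch
{
    public int K = 5;
}

public sealed class NullCheckedTrainerSketch
{
    public ArgumentsSketch Args { get; }

    // Convenience constructor: creates the default arguments itself,
    // so it never needs to pass null down.
    public NullCheckedTrainerSketch()
        : this(new ArgumentsSketch())
    {
    }

    // Internal constructor: keeps the non-null invariant
    // (stands in for Host.CheckValue(args, nameof(args))).
    internal NullCheckedTrainerSketch(ArgumentsSketch args)
    {
        if (args == null)
            throw new ArgumentNullException(nameof(args));
        if (args.K <= 0)
            throw new ArgumentOutOfRangeException(nameof(args), "K must be positive");
        Args = args;
    }
}
```

With this layout a null passed to the internal constructor still fails fast, and the line count is about the same as the null-tolerant version.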
elevating K to constructor param
…d, the KMeans extension on the clustering context and a test for it.
/// <param name="predictedLabel">The name of the predicted label column in <paramref name="data"/>.</param>
/// <param name="features">The name of the feature column in <paramref name="data"/>.</param>
/// <returns>The evaluation results.</returns>
public Result Evaluate(IDataView data, string label, string score, string predictedLabel, string features = null)
label
what is 'label' and why is it required? Or 'predictedLabel' for that matter? #Resolved
OK, I double-checked. label is required for NMI; if it's not provided, NMI is not calculated. That makes sense. Let's make it optional.
predictedLabel is never required, as I suspected. Let's remove it.
In reply to: 220990067
Turning the label column into an optional one.
Ongoing work to address #754