Skip to content

Converting KMeans++trainer to estimator. #979

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 18 commits into from
Sep 28, 2018

Conversation

sfilipi
Copy link
Member

@sfilipi sfilipi commented Sep 21, 2018

Ongoing work to address #754

@sfilipi sfilipi added the API Issues pertaining the friendly API label Sep 21, 2018
@sfilipi sfilipi self-assigned this Sep 21, 2018
@Zruty0 Zruty0 mentioned this pull request Sep 21, 2018
@@ -16,5 +17,34 @@ public partial class TrainerEstimators : TestDataPipeBase
public TrainerEstimators(ITestOutputHelper helper) : base(helper)
{
}

/// <summary>
/// FastTreeBinaryClassification TrainerEstimator test
Copy link
Member Author

@sfilipi sfilipi Sep 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FastTreeBinaryClassification [](start = 12, length = 28)

fix comment #Resolved

@@ -203,6 +233,18 @@ private static int ComputeNumThreads(IHost host, int? argNumThreads)
return Math.Max(1, maxThreads);
}

private static SchemaShape.Column MakeWeightColumn(string weightColumn)
Copy link
Member Author

@sfilipi sfilipi Sep 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MakeWeightColumn [](start = 42, length = 16)

re-use from TrainerUtils #Resolved

/// Base class for the <see cref="ISingleFeaturePredictionTransformer{TModel}"/> working on clustering tasks.
/// </summary>
/// <typeparam name="TModel">An implementation of the <see cref="IPredictorProducing{TResult}"/></typeparam>
public sealed class ClusteringPredictionTransformer<TModel> : SingleFeaturePredictionTransformerBase<TModel>
Copy link
Member Author

@sfilipi sfilipi Sep 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ClusteringPredictionTransformer [](start = 24, length = 31)

I still don't like not making those generic on the scorer as well... #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed on #996


In reply to: 219580106 [](ancestors = 219580106)

@@ -17,6 +17,7 @@
using Microsoft.ML.Runtime.Training;
using Microsoft.ML.Runtime.Internal.Internallearn;
using Microsoft.ML.Runtime.EntryPoints;
using Microsoft.ML.Core.Data;
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Sep 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sort order #Resolved

Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:


// *** Binary format ***
// <base info>
// id of string: scorer threshold column
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Sep 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// id of string: scorer threshold column [](start = 11, length = 41)

doesn't look right #Resolved

@@ -52,9 +52,9 @@ public ParameterMixingCalibratedPredictor TrainKMeansAndLR()
trans = TrainAndScoreTransform.Create(env, new TrainAndScoreTransform.Arguments
{
Trainer = ComponentFactoryUtils.CreateFromFunction(host =>
new KMeansPlusPlusTrainer(host, new KMeansPlusPlusTrainer.Arguments()
new KMeansPlusPlusTrainer(host, "Features", advancedSettings: s=>
Copy link
Contributor

@Zruty0 Zruty0 Sep 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

advancedSettings [](start = 68, length = 16)

I think K needs to be elevated out of 'advanced #Closed

};

public static TestDataset mnistTiny28 = new TestDataset()
{
name = "mnistTiny28",
trainFilename = @"..\MNIST\Train-Tiny-28x28.txt",
testFilename = @"..\MNIST\Test-Tiny-28x28.txt"
trainFilename = @"Train-Tiny-28x28.txt",
Copy link
Contributor

@Zruty0 Zruty0 Sep 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Train [](start = 30, length = 5)

intentional? #Closed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. The 'old' path is from the TLC solution structure.


In reply to: 219997779 [](ancestors = 219997779)

advancedSettings: s => { s.InitAlgorithm = KMeansPlusPlusTrainer.InitAlgorithm.KMeansParallel; });

TestEstimatorCore(pipeline, data);
}
Copy link
Contributor

@Zruty0 Zruty0 Sep 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

} [](start = 8, length = 1)

Call Done() at the end #Closed

/// </summary>
/// <param name="env">The private instance of <see cref="IHostEnvironment"/>.</param>
/// <param name="featureColumn">The name of the feature column.</param>
/// <param name="weightColumn">The name for the column containing the initial weight.</param>
Copy link
Contributor

@Zruty0 Zruty0 Sep 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

initial [](start = 78, length = 7)

example weights, not initial weight #Closed

Host.CheckUserArg(args.K > 0, nameof(args.K), "Must be positive");

// is this even necessary, if there is only one column, for example
Copy link
Contributor

@Zruty0 Zruty0 Sep 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this even necessary, if there is only one column, for example [](start = 15, length = 64)

It checks for non-emptiness of string #Closed

@Zruty0
Copy link
Contributor

Zruty0 commented Sep 24, 2018

This calls for a ClusteringContext with a pigstension for KMeans #Closed

Copy link
Contributor

@Zruty0 Zruty0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🕐

{
Host.CheckValue(args, nameof(args));
if (args == null)
args = new Arguments();
Copy link
Contributor

@TomFinley TomFinley Sep 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The net effect of this change is that the internal constructor is now tolerant to passing in null arguments.. Even though the constructor is internal I'd prefer to keep it clean, and maintain the invariants we have everywhere else. Note the Host.CheckValue(args, nameof(args)); check that was deleted.

If you still had the check, and just passed in new Arguments in the above constructor, that would have had no more lines of code. #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

appreciate pointing this out.


In reply to: 220271405 [](ancestors = 220271405)

@TomFinley
Copy link
Contributor

TomFinley commented Sep 25, 2018

using Microsoft.ML.Runtime;

I like how when you changed from Float to float you made those two separate commits. That makes things easier to check out. #Resolved


Refers to: src/Microsoft.ML.KMeansClustering/KMeansPlusPlusTrainer.cs:10 in bd436bb. [](commit_id = bd436bb, deletion_comment = True)

@sfilipi
Copy link
Member Author

sfilipi commented Sep 25, 2018

using Microsoft.ML.Runtime;

thanks for pointing out. Will keep that in mind.


In reply to: 424419588 [](ancestors = 424419588)


Refers to: src/Microsoft.ML.KMeansClustering/KMeansPlusPlusTrainer.cs:10 in bd436bb. [](commit_id = bd436bb, deletion_comment = True)

/// <param name="predictedLabel">The name of the predicted label column in <paramref name="data"/>.</param>
/// <param name="features">The name of the feature column in <paramref name="data"/>.</param>
/// <returns>The evaluation results.</returns>
public Result Evaluate(IDataView data, string label, string score, string predictedLabel, string features = null)
Copy link
Contributor

@Zruty0 Zruty0 Sep 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

label [](start = 54, length = 5)

what is 'label' and why is it required? Or 'predictedLabel' for that matter? #Resolved

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I double-checked. label is required for NMI, if it's not provided, NMI is not calculated. That makes sense. Let's make it optional.
predictedLabel is never required, as I suspected. Let's remove it.


In reply to: 220990067 [](ancestors = 220990067)

Copy link
Contributor

@Zruty0 Zruty0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@sfilipi sfilipi merged commit 8aa4f1f into dotnet:master Sep 28, 2018
@sfilipi sfilipi deleted the KmeansPCAEstimators branch September 28, 2018 16:29
@ghost ghost locked as resolved and limited conversation to collaborators Mar 28, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
API Issues pertaining the friendly API
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants