Skip to content

Get rid of value tuples in TrainTest and CrossValidation #2507

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Feb 12, 2019

Conversation

Ivanidzo4ka
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka commented Feb 11, 2019

Fixes #2501

}
}

public class CrossValidationResult<T> where T : class
Copy link
Contributor Author

@Ivanidzo4ka Ivanidzo4ka Feb 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CrossValidationResult [](start = 21, length = 21)

Summary #Closed

/// <summary>
/// Metrics for cross validation fold.
/// </summary>
public readonly T Metrics;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

T [](start = 28, length = 1)

Part of me want to create interface and mark with it all Metric classes. Another part of me remembers what we try get rid of all empty interfaces.
Maybe empty base class?
Would like to hear your comments as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for some way of making this not generic.


In reply to: 255714622 [](ancestors = 255714622)

/// </summary>
public readonly ITransformer Model;
/// <summary>
/// <see cref="IDataView"/> for scored fold.
Copy link
Contributor Author

@Ivanidzo4ka Ivanidzo4ka Feb 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for scored fold [](start = 40, length = 15)

Scored test fold? Any one has good ideas? #Resolved

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"The scored hold-out set for this fold"


In reply to: 255714995 [](ancestors = 255714995)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ScoredHoldOutSet is literal, but long.


In reply to: 255726140 [](ancestors = 255726140,255714995)

/// </summary>
public readonly int Fold;

public CrossValidationResult(ITransformer model, T metrics, IDataView scores, int fold)
Copy link
Contributor Author

@Ivanidzo4ka Ivanidzo4ka Feb 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CrossValidationResult [](start = 19, length = 21)

summary or internal, probably internal. #Closed

@Ivanidzo4ka
Copy link
Contributor Author

Ivanidzo4ka commented Feb 11, 2019

    public (BinaryClassificationMetrics metrics, ITransformer model, IDataView scoredTestData)[] CrossValidateNonCalibrated(

miss this one. #Closed


Refers to: src/Microsoft.ML.Data/TrainCatalog.cs:344 in 3d9b174. [](commit_id = 3d9b174, deletion_comment = False)

/// A pair of datasets, for the train and test set.
/// </summary>
public struct TrainTestData
{
Copy link
Contributor

@rogancarr rogancarr Feb 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why struct? #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why class?
it's a small collection of immutable objects.


In reply to: 255723990 [](ancestors = 255723990)

}
}
/// <summary>
/// Results of running crossvalidation.
Copy link
Contributor Author

@Ivanidzo4ka Ivanidzo4ka Feb 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

crossvalidation [](start = 31, length = 15)

two words #Resolved

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hyphenated


In reply to: 255724522 [](ancestors = 255724522)

/// A pair of datasets, for the train and test set.
/// </summary>
public struct TrainTestData
{
Copy link
Contributor

@rogancarr rogancarr Feb 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could see us wanting to expand this in the future to also include a validation set, so it could be prudent to keep the name a bit vague. PartitionedData? #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would be summary for PartitionedData?
This one used as output for TrainTestSplit function, if we ever introduce TrainTestValidationSplit I would prefer to create another object for that.


In reply to: 255725077 [](ancestors = 255725077)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point!


In reply to: 255725959 [](ancestors = 255725959,255725077)

}

/// <summary>
/// Results for specific cross validation fold.
Copy link
Contributor

@rogancarr rogancarr Feb 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cross validation [](start = 33, length = 16)

Hyphenated #Resolved

/// Results of running crossvalidation.
/// </summary>
/// <typeparam name="T">Type of metric class</typeparam>
public class CrossValidationResult<T> where T : class
Copy link
Contributor Author

@Ivanidzo4ka Ivanidzo4ka Feb 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CrossValidationResult [](start = 21, length = 21)

sealed. #Resolved

/// <summary>
/// Fold number.
/// </summary>
public readonly int Fold;
Copy link
Contributor

@rogancarr rogancarr Feb 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

public readonly int Fold [](start = 12, length = 24)

Is this necessary, since they will be returned in an (ordered) array? And it'll be confusing if the order of the array doesn't match the fold number. #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it necessary ?- no, is it nice to have? - I think so. As soon as you tear them away from array, and let's say you want to report specific folds and do some stuff on them, you need to pass around fold anyway.

it'll be confusing if the order of the array doesn't match the fold number.
Not the case right now, but I would prefer to know real fold than index in array.


In reply to: 255727531 [](ancestors = 255727531)

@rogancarr
Copy link
Contributor

    public static (DataView<T> trainSet, DataView<T> testSet) TrainTestSplit<T>(this TrainCatalogBase catalog,

This should return the TrainTestSplit object too.


Refers to: src/Microsoft.ML.StaticPipe/TrainingStaticExtensions.cs:34 in 0790005. [](commit_id = 0790005, deletion_comment = False)

@rogancarr
Copy link
Contributor

    /// <typeparam name="T">The tuple describing the data schema.</typeparam>

Another Tuple we will want to get rid of.


Refers to: src/Microsoft.ML.StaticPipe/TrainingStaticExtensions.cs:23 in 0790005. [](commit_id = 0790005, deletion_comment = False)

@rogancarr
Copy link
Contributor

    public static (RegressionMetrics metrics, Transformer<TInShape, TOutShape, TTransformer> model, DataView<TOutShape> scoredTestData)[] CrossValidate<TInShape, TOutShape, TTransformer>(

Tuple to Object


Refers to: src/Microsoft.ML.StaticPipe/TrainingStaticExtensions.cs:76 in 0790005. [](commit_id = 0790005, deletion_comment = False)

@rogancarr
Copy link
Contributor

    public static (MultiClassClassifierMetrics metrics, Transformer<TInShape, TOutShape, TTransformer> model, DataView<TOutShape> scoredTestData)[] CrossValidate<TInShape, TOutShape, TTransformer>(

Tuple => Object.


Refers to: src/Microsoft.ML.StaticPipe/TrainingStaticExtensions.cs:134 in 0790005. [](commit_id = 0790005, deletion_comment = False)

@rogancarr
Copy link
Contributor

    public static (BinaryClassificationMetrics metrics, Transformer<TInShape, TOutShape, TTransformer> model, DataView<TOutShape> scoredTestData)[] CrossValidateNonCalibrated<TInShape, TOutShape, TTransformer>(

Tuple to Object


Refers to: src/Microsoft.ML.StaticPipe/TrainingStaticExtensions.cs:192 in 0790005. [](commit_id = 0790005, deletion_comment = False)

@Ivanidzo4ka
Copy link
Contributor Author

    /// <typeparam name="T">The tuple describing the data schema.</typeparam>

If I correctly understand @TomFinley, he is fine with tuples in StaticPipe project. Can be wrong. Would prefer to hear words of wisdom from him.


In reply to: 462528351 [](ancestors = 462528351)


Refers to: src/Microsoft.ML.StaticPipe/TrainingStaticExtensions.cs:23 in 0790005. [](commit_id = 0790005, deletion_comment = False)

@rogancarr
Copy link
Contributor

    public static (CalibratedBinaryClassificationMetrics metrics, Transformer<TInShape, TOutShape, TTransformer> model, DataView<TOutShape> scoredTestData)[] CrossValidate<TInShape, TOutShape, TTransformer>(

Tuple to Object


Refers to: src/Microsoft.ML.StaticPipe/TrainingStaticExtensions.cs:250 in 0790005. [](commit_id = 0790005, deletion_comment = False)

Copy link
Contributor

@rogancarr rogancarr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, looks good! Just a few comments.

Question: Does this compile? There are quite a few samples that use this API that you haven't bumped yet.

@codecov
Copy link

codecov bot commented Feb 12, 2019

Codecov Report

Merging #2507 into master will increase coverage by <.01%.
The diff coverage is 76.81%.

@@            Coverage Diff             @@
##           master    #2507      +/-   ##
==========================================
+ Coverage   71.24%   71.24%   +<.01%     
==========================================
  Files         798      798              
  Lines      141231   141252      +21     
  Branches    16112    16112              
==========================================
+ Hits       100623   100641      +18     
- Misses      36142    36145       +3     
  Partials     4466     4466
Flag Coverage Δ
#Debug 71.24% <76.81%> (ø) ⬆️
#production 67.57% <66.66%> (ø) ⬆️
#test 85.35% <100%> (ø) ⬆️

@rogancarr
Copy link
Contributor

    /// <typeparam name="T">The tuple describing the data schema.</typeparam>

Got it. No worries, if that's the case.


In reply to: 462529091 [](ancestors = 462529091,462528351)


Refers to: src/Microsoft.ML.StaticPipe/TrainingStaticExtensions.cs:23 in 0790005. [](commit_id = 0790005, deletion_comment = False)

Copy link
Contributor

@rogancarr rogancarr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@@ -263,13 +344,14 @@ internal BinaryClassificationTrainers(BinaryClassificationCatalog catalog)
/// If the <paramref name="stratificationColumn"/> is not provided, the random numbers generated to create it, will use this seed as value.
/// And if it is not provided, the default value will be used.</param>
/// <returns>Per-fold results: metrics, models, scored datasets.</returns>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per-fold results [](start = 21, length = 16)

maybe<see cref="CrossValidationResult"

Copy link
Member

@sfilipi sfilipi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@Ivanidzo4ka Ivanidzo4ka merged commit 56607ba into dotnet:master Feb 12, 2019
@ghost ghost locked as resolved and limited conversation to collaborators Mar 24, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants