Get rid of value tuples in TrainTest and CrossValidation #2507
}
}

public class CrossValidationResult<T> where T : class
CrossValidationResult [](start = 21, length = 21)
Needs a summary. #Closed
/// <summary>
/// Metrics for cross validation fold.
/// </summary>
public readonly T Metrics;
T [](start = 28, length = 1)
Part of me wants to create an interface and mark all Metric classes with it. Another part of me remembers that we are trying to get rid of all empty interfaces.
Maybe an empty base class?
Would like to hear your comments as well.
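To make the options in this comment concrete, here is a minimal sketch of the three constraints under discussion. Every type name below (`IMetrics`, `MetricsBase`, `DemoBinaryMetrics`, `ResultAnyClass`, `ResultWithMarker`) is a hypothetical stand-in invented for illustration; none of them comes from the PR.

```csharp
using System;

// Hypothetical stand-ins for illustration only; not types from the PR.
public interface IMetrics { }            // option 1: empty marker interface
public abstract class MetricsBase { }    // option 2: empty base class

public sealed class DemoBinaryMetrics : MetricsBase, IMetrics
{
    public double Accuracy { get; }
    public DemoBinaryMetrics(double accuracy) => Accuracy = accuracy;
}

// Constraint as written in the diff: any reference type is accepted.
public sealed class ResultAnyClass<T> where T : class
{
    public readonly T Metrics;
    public ResultAnyClass(T metrics) => Metrics = metrics;
}

// Stricter alternative: only types that carry the marker are accepted.
public sealed class ResultWithMarker<T> where T : class, IMetrics
{
    public readonly T Metrics;
    public ResultWithMarker(T metrics) => Metrics = metrics;
}

public static class Program
{
    public static void Main()
    {
        // Compiles, although a string is clearly not a metrics object.
        var loose = new ResultAnyClass<string>("not a metric");

        // Compiles because DemoBinaryMetrics implements IMetrics.
        var strict = new ResultWithMarker<DemoBinaryMetrics>(new DemoBinaryMetrics(0.9));

        // new ResultWithMarker<string>("x") would be rejected at compile time.
        Console.WriteLine(loose.Metrics);
        Console.WriteLine(strict.Metrics.Accuracy);
    }
}
```

The marker or base class buys a compile-time check at the cost of one more empty type in the public surface, which is the trade-off the comment is weighing.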
/// </summary>
public readonly ITransformer Model;
/// <summary>
/// <see cref="IDataView"/> for scored fold.
for scored fold [](start = 40, length = 15)
Scored test fold? Does anyone have a good idea? #Resolved
/// </summary>
public readonly int Fold;

public CrossValidationResult(ITransformer model, T metrics, IDataView scores, int fold)
CrossValidationResult [](start = 19, length = 21)
Needs a summary, or should be internal; probably internal. #Closed
/// A pair of datasets, for the train and test set.
/// </summary>
public struct TrainTestData
{
Why struct? #Resolved
Why a class?
It's a small collection of immutable objects.
In reply to: 255723990 [](ancestors = 255723990)
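As a rough illustration of the "small collection of immutable objects" argument, the sketch below models the pair as a `readonly struct` holding two references. The names `DemoTrainTestData`, `IDemoDataView`, `DemoDataView`, `TrainSet`, and `TestSet` are assumptions made for this example; the diff shown here only reveals `public struct TrainTestData`.

```csharp
using System;

// Hypothetical stand-in for IDataView; illustration only.
public interface IDemoDataView { long RowCount { get; } }

public sealed class DemoDataView : IDemoDataView
{
    public long RowCount { get; }
    public DemoDataView(long rowCount) => RowCount = rowCount;
}

// A small immutable value type that carries two references.
// Field names are assumed, not taken from the PR.
public readonly struct DemoTrainTestData
{
    public readonly IDemoDataView TrainSet;
    public readonly IDemoDataView TestSet;

    public DemoTrainTestData(IDemoDataView trainSet, IDemoDataView testSet)
    {
        TrainSet = trainSet;
        TestSet = testSet;
    }
}

public static class Program
{
    public static void Main()
    {
        var split = new DemoTrainTestData(new DemoDataView(80), new DemoDataView(20));

        // Copying the struct copies two references, not the underlying data.
        var copy = split;
        Console.WriteLine(ReferenceEquals(copy.TrainSet, split.TrainSet));

        // One caveat of the struct choice: default(...) is always constructible
        // and leaves both fields null.
        Console.WriteLine(default(DemoTrainTestData).TrainSet == null);
    }
}
```

A class would rule out the `default` case but add a heap allocation per split; for a two-field holder either choice is defensible.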
}
}
/// <summary>
/// Results of running crossvalidation.
crossvalidation [](start = 31, length = 15)
two words #Resolved
/// A pair of datasets, for the train and test set.
/// </summary>
public struct TrainTestData
{
I could see us wanting to expand this in the future to also include a validation set, so it could be prudent to keep the name a bit vague. PartitionedData? #Resolved
What would the summary for PartitionedData be?
This one is used as the output of the TrainTestSplit function. If we ever introduce TrainTestValidationSplit, I would prefer to create another object for that.
In reply to: 255725077 [](ancestors = 255725077)
}

/// <summary>
/// Results for specific cross validation fold.
cross validation [](start = 33, length = 16)
Hyphenated #Resolved
/// Results of running crossvalidation.
/// </summary>
/// <typeparam name="T">Type of metric class</typeparam>
public class CrossValidationResult<T> where T : class
CrossValidationResult [](start = 21, length = 21)
sealed. #Resolved
/// <summary>
/// Fold number.
/// </summary>
public readonly int Fold;
public readonly int Fold [](start = 12, length = 24)
Is this necessary, since they will be returned in an (ordered) array? And it'll be confusing if the order of the array doesn't match the fold number. #Resolved
Is it necessary? No. Is it nice to have? I think so. As soon as you pull results out of the array, say to report on specific folds and do further work with them, you need to pass the fold number around anyway.
> it'll be confusing if the order of the array doesn't match the fold number.
That is not the case right now, but I would prefer to know the real fold number rather than the index in the array.
In reply to: 255727531 [](ancestors = 255727531)
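The argument for keeping `Fold` can be sketched as follows: once results are sorted or filtered, the array index no longer identifies the fold, but the stored number still does. `DemoFoldResult` and its `Accuracy` field are hypothetical stand-ins for the per-fold result type and its metrics, and the accuracy values are made up for the example.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for a per-fold result; illustration only.
public sealed class DemoFoldResult
{
    public readonly double Accuracy;
    public readonly int Fold;

    public DemoFoldResult(double accuracy, int fold)
    {
        Accuracy = accuracy;
        Fold = fold;
    }
}

public static class Program
{
    public static void Main()
    {
        // Returned in fold order, so index == Fold at this point.
        var results = new[]
        {
            new DemoFoldResult(0.81, 0),
            new DemoFoldResult(0.92, 1),
            new DemoFoldResult(0.88, 2),
        };

        // After sorting, position 0 holds the best fold, not fold 0.
        var ranked = results.OrderByDescending(r => r.Accuracy).ToArray();

        // The stored fold number survives the reordering.
        Console.WriteLine($"Best fold: {ranked[0].Fold}"); // prints "Best fold: 1"
    }
}
```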
Tuple to Object. Refers to: src/Microsoft.ML.StaticPipe/TrainingStaticExtensions.cs:76 in 0790005.
Tuple to Object. Refers to: src/Microsoft.ML.StaticPipe/TrainingStaticExtensions.cs:134 in 0790005.
Tuple to Object. Refers to: src/Microsoft.ML.StaticPipe/TrainingStaticExtensions.cs:192 in 0790005.
If I understand @TomFinley correctly, he is fine with tuples in the StaticPipe project. I could be wrong, and would prefer to hear words of wisdom from him. In reply to: 462528351. Refers to: src/Microsoft.ML.StaticPipe/TrainingStaticExtensions.cs:23 in 0790005.
Tuple to Object. Refers to: src/Microsoft.ML.StaticPipe/TrainingStaticExtensions.cs:250 in 0790005.
Overall, looks good! Just a few comments.
Question: Does this compile? There are quite a few samples that use this API that you haven't bumped yet.
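For context on why existing samples would need updating, here is a hedged sketch of how a call site changes when a tuple-returning method is switched to return a result object. `DemoApi`, `CrossValidateTuples`, `CrossValidate`, and `DemoResult` are invented for this illustration and do not reproduce the actual signatures touched by the PR.

```csharp
using System;

// Hypothetical result object; illustration only.
public sealed class DemoResult
{
    public readonly string Model;
    public readonly double Metrics;
    public readonly int Fold;

    public DemoResult(string model, double metrics, int fold)
    {
        Model = model;
        Metrics = metrics;
        Fold = fold;
    }
}

public static class DemoApi
{
    // Old shape: an array of value tuples.
    public static (double metrics, string model)[] CrossValidateTuples(int numFolds)
    {
        var results = new (double metrics, string model)[numFolds];
        for (int i = 0; i < numFolds; i++)
            results[i] = (0.5 + 0.1 * i, "model-" + i);
        return results;
    }

    // New shape: an array of named result objects.
    public static DemoResult[] CrossValidate(int numFolds)
    {
        var results = new DemoResult[numFolds];
        for (int i = 0; i < numFolds; i++)
            results[i] = new DemoResult("model-" + i, 0.5 + 0.1 * i, i);
        return results;
    }
}

public static class Program
{
    public static void Main()
    {
        // Old call site: tuple deconstruction in the loop header.
        foreach (var (metrics, model) in DemoApi.CrossValidateTuples(3))
            Console.WriteLine(model + ": " + metrics);

        // New call site: member access on the result object.
        // The deconstructing loop above would stop compiling against this return type.
        foreach (var r in DemoApi.CrossValidate(3))
            Console.WriteLine(r.Model + " (fold " + r.Fold + "): " + r.Metrics);
    }
}
```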
Codecov Report
@@            Coverage Diff            @@
##           master    #2507    +/-   ##
========================================
+ Coverage   71.24%   71.24%   +<.01%
========================================
  Files         798      798
  Lines      141231   141252      +21
  Branches    16112    16112
========================================
+ Hits       100623   100641      +18
- Misses      36142    36145       +3
  Partials     4466     4466
Got it. No worries, if that's the case. In reply to: 462529091. Refers to: src/Microsoft.ML.StaticPipe/TrainingStaticExtensions.cs:23 in 0790005.
@@ -263,13 +344,14 @@ internal BinaryClassificationTrainers(BinaryClassificationCatalog catalog)
/// If the <paramref name="stratificationColumn"/> is not provided, the random numbers generated to create it, will use this seed as value.
/// And if it is not provided, the default value will be used.</param>
/// <returns>Per-fold results: metrics, models, scored datasets.</returns>
Per-fold results [](start = 21, length = 16)
Maybe <see cref="CrossValidationResult"/>?
Fixes #2501