-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Add Functional Tests for Data I/O #2518
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Functional Tests for Data I/O #2518
Conversation
@@ -20,5 +20,34 @@ public static void CheckMetrics(RegressionMetrics metrics) | |||
Assert.True(metrics.L2 >= 0); | |||
Assert.True(metrics.RSquared <= 1); | |||
} | |||
|
|||
public static void AssertEqual(float[] array1, float[] array2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AssertEqual [](start = 27, length = 11)
Perhaps Enumerable.SequenceEqual(target1, target2); instead? from Linq
https://stackoverflow.com/questions/3232744/easiest-way-to-compare-arrays-in-c-sharp #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to use the assert logic here so that we can capture equality to some level of precision. Default equality will fail after serialization due to floating point drift sometimes.
In reply to: 256208383 [](ancestors = 256208383)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Plus I want to Assert and not check for equality.
In reply to: 256209571 [](ancestors = 256209571,256208383)
Assert.Equal(array1[i], array2[i]); | ||
} | ||
|
||
public static void AssertEqual(Schema schema1, Schema schema2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Schema [](start = 39, length = 6)
Schould Schema implement IEquitable ?
https://stackoverflow.com/questions/8400028/comparing-two-instances-of-a-class #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be awesome, but probably not before 1.0.
In reply to: 256209052 [](ancestors = 256209052)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on implementing IEquitable. Should have an issue about it.
In reply to: 256209052 [](ancestors = 256209052)
public static void AssertEqual(Schema schema1, Schema schema2) | ||
{ | ||
Assert.NotNull(schema1); | ||
Assert.NotNull(schema2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
technically if they are both null they are equal... #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically yes. But internally in our library we consider nulls to not equal each other. :shruggy-emoticon:
In reply to: 256209640 [](ancestors = 256209640)
namespace Microsoft.ML.Functional.Tests | ||
{ | ||
/// <summary> | ||
/// Test data input and output formats |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
formats [](start = 35, length = 7)
but what about the dots? #Closed
} | ||
#endregion | ||
|
||
#region ToyDataset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#region ToyDataset [](start = 7, length = 19)
not a big fan of regions here... #Closed
|
||
private IEnumerable<ToyDataset> GenerateToyDataset(int numExamples = 5, int seed = 1) | ||
{ | ||
var rng = new Random(seed); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ng = new Random(seed); [](start = 17, length = 22)
we can't get random from context? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only from IHostEnvironment
, I think. That isn't accessible outside the library.
In reply to: 256211132 [](ancestors = 256211132)
} | ||
} | ||
|
||
private ReadOnlyMemory<char> GetRandomRomChar(Random rng, int length = 10) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GetRandomRomChar [](start = 37, length = 16)
func name unclr - expand ROM #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Codecov Report
@@ Coverage Diff @@
## master #2518 +/- ##
=========================================
+ Coverage 71.25% 71.35% +0.1%
=========================================
Files 798 798
Lines 141252 141452 +200
Branches 16112 16125 +13
=========================================
+ Hits 100643 100935 +292
+ Misses 36145 36045 -100
- Partials 4464 4472 +8
|
Codecov Report
@@ Coverage Diff @@
## master #2518 +/- ##
==========================================
+ Coverage 71.35% 71.45% +0.09%
==========================================
Files 799 800 +1
Lines 141496 141700 +204
Branches 16120 16135 +15
==========================================
+ Hits 100971 101247 +276
+ Misses 36062 35986 -76
- Partials 4463 4467 +4
|
return new ReadOnlyMemory<char>(chars); | ||
} | ||
|
||
private sealed class ToyDataset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might prove useful.Can we factor toy dataset out into separate file for other tests to share? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No less than 7 merges from master into a feature branch. 😄 |
I'm still not sure how that happened. I branched off my master and there they were! |
/// Write to and read from a delimited file: Schematized data of any DataKind can be read from a delimited file. | ||
/// </summary> | ||
/// <remarks> | ||
/// Tests the roundtrip hrough a file using schema inference. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hrough [](start = 32, length = 6)
typo #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
namespace Microsoft.ML.Functional.Tests.Datasets | ||
{ | ||
internal sealed class AllTypes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AllTypes [](start = 26, length = 8)
This name is overly broad. Perhaps a TypeTestData? #Closed
|
||
namespace Microsoft.ML.Functional.Tests.Datasets | ||
{ | ||
internal sealed class AllTypes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AllTypes [](start = 26, length = 8)
At least one xml summary comment on what it is and what it is for. #Closed
Dt = DateTime.FromOADate(rng.Next(657435, 2958465)), | ||
Dz = DateTimeOffset.FromUnixTimeSeconds((long)(rng.NextDouble() * (1 + rng.Next()))), | ||
Ug = new RowId((ulong)rng.Next(), (ulong)rng.Next()) | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Factor-out into a method - GetRandomInstance(Random rnd, int featuresLength) #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Factored out, made public with XML tags. Cannot add a featuresLength at this point (see above comment).
In reply to: 256542015 [](ancestors = 256542015)
return mlContext.Data.CreateTextLoader( | ||
new[] { | ||
new TextLoader.Column("Label", DataKind.Bool, 0), | ||
new TextLoader.Column("Features", DataKind.R4, 1, 5), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
5 [](start = 74, length = 1)
factor out as param. Do same with the GetRandomINstance() #ByDesign
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't add this as a parameter unless there is a way to alter the class attributes dynamically. For our tests, we need LoadColumn attributes to point to the corresponding serialization column.
In reply to: 256542590 [](ancestors = 256542590)
/// <returns></returns> | ||
public static TypeTestData GetRandomInstance(Random rng) | ||
{ | ||
if (rng == null) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (rng == null) [](start = 12, length = 16)
We usually use Check()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR adds functional tests for Data I/O
Fixes #2508