Skip to content

Add Functional Tests for Data I/O #2518

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Feb 14, 2019

Conversation

rogancarr
Copy link
Contributor

This PR adds functional tests for Data I/O

  • Reading from IEnumerable
  • Writing to IEnumerable
  • Writing to and reading from delimited tex files
    • With inferred schema
    • With explicit schema
  • Writing to and reading from Binary files

Fixes #2508

@@ -20,5 +20,34 @@ public static void CheckMetrics(RegressionMetrics metrics)
Assert.True(metrics.L2 >= 0);
Assert.True(metrics.RSquared <= 1);
}

public static void AssertEqual(float[] array1, float[] array2)
Copy link
Contributor

@glebuk glebuk Feb 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AssertEqual [](start = 27, length = 11)

Perhaps Enumerable.SequenceEqual(target1, target2); instead? from Linq
https://stackoverflow.com/questions/3232744/easiest-way-to-compare-arrays-in-c-sharp #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to use the assert logic here so that we can capture equality to some level of precision. Default equality will fail after serialization due to floating point drift sometimes.


In reply to: 256208383 [](ancestors = 256208383)

Copy link
Contributor Author

@rogancarr rogancarr Feb 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Plus I want to Assert and not check for equality.


In reply to: 256209571 [](ancestors = 256209571,256208383)

Assert.Equal(array1[i], array2[i]);
}

public static void AssertEqual(Schema schema1, Schema schema2)
Copy link
Contributor

@glebuk glebuk Feb 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Schema [](start = 39, length = 6)

Schould Schema implement IEquitable ?
https://stackoverflow.com/questions/8400028/comparing-two-instances-of-a-class #Closed

Copy link
Contributor Author

@rogancarr rogancarr Feb 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be awesome, but probably not before 1.0.


In reply to: 256209052 [](ancestors = 256209052)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on implementing IEquitable. Should have an issue about it.


In reply to: 256209052 [](ancestors = 256209052)

public static void AssertEqual(Schema schema1, Schema schema2)
{
Assert.NotNull(schema1);
Assert.NotNull(schema2);
Copy link
Contributor

@glebuk glebuk Feb 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

technically if they are both null they are equal... #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically yes. But internally in our library we consider nulls to not equal each other. :shruggy-emoticon:


In reply to: 256209640 [](ancestors = 256209640)

namespace Microsoft.ML.Functional.Tests
{
/// <summary>
/// Test data input and output formats
Copy link
Contributor

@glebuk glebuk Feb 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

formats [](start = 35, length = 7)

but what about the dots? #Closed

}
#endregion

#region ToyDataset
Copy link
Contributor

@glebuk glebuk Feb 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#region ToyDataset [](start = 7, length = 19)

not a big fan of regions here... #Closed


private IEnumerable<ToyDataset> GenerateToyDataset(int numExamples = 5, int seed = 1)
{
var rng = new Random(seed);
Copy link
Contributor

@glebuk glebuk Feb 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ng = new Random(seed); [](start = 17, length = 22)

we can't get random from context? #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only from IHostEnvironment, I think. That isn't accessible outside the library.


In reply to: 256211132 [](ancestors = 256211132)

}
}

private ReadOnlyMemory<char> GetRandomRomChar(Random rng, int length = 10)
Copy link
Contributor

@glebuk glebuk Feb 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetRandomRomChar [](start = 37, length = 16)

func name unclr - expand ROM #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Used the Span naming convention from .NET.


In reply to: 256211211 [](ancestors = 256211211)

@codecov
Copy link

codecov bot commented Feb 13, 2019

Codecov Report

Merging #2518 into master will increase coverage by 0.1%.
The diff coverage is 99.54%.

@@            Coverage Diff            @@
##           master    #2518     +/-   ##
=========================================
+ Coverage   71.25%   71.35%   +0.1%     
=========================================
  Files         798      798             
  Lines      141252   141452    +200     
  Branches    16112    16125     +13     
=========================================
+ Hits       100643   100935    +292     
+ Misses      36145    36045    -100     
- Partials     4464     4472      +8
Flag Coverage Δ
#Debug 71.35% <99.54%> (+0.1%) ⬆️
#production 67.66% <ø> (+0.08%) ⬆️
#test 85.45% <99.54%> (+0.09%) ⬆️

@codecov
Copy link

codecov bot commented Feb 13, 2019

Codecov Report

Merging #2518 into master will increase coverage by 0.09%.
The diff coverage is 98.64%.

@@            Coverage Diff             @@
##           master    #2518      +/-   ##
==========================================
+ Coverage   71.35%   71.45%   +0.09%     
==========================================
  Files         799      800       +1     
  Lines      141496   141700     +204     
  Branches    16120    16135      +15     
==========================================
+ Hits       100971   101247     +276     
+ Misses      36062    35986      -76     
- Partials     4463     4467       +4
Flag Coverage Δ
#Debug 71.45% <98.64%> (+0.09%) ⬆️
#production 67.75% <ø> (+0.06%) ⬆️
#test 85.49% <98.64%> (+0.09%) ⬆️

return new ReadOnlyMemory<char>(chars);
}

private sealed class ToyDataset
Copy link
Contributor

@glebuk glebuk Feb 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might prove useful.Can we factor toy dataset out into separate file for other tests to share? #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored and called AllTypes.


In reply to: 256211603 [](ancestors = 256211603)

@TomFinley
Copy link
Contributor

No less than 7 merges from master into a feature branch. 😄

@rogancarr
Copy link
Contributor Author

No less than 7 merges from master into a feature branch. 😄

I'm still not sure how that happened. I branched off my master and there they were!

/// Write to and read from a delimited file: Schematized data of any DataKind can be read from a delimited file.
/// </summary>
/// <remarks>
/// Tests the roundtrip hrough a file using schema inference.
Copy link
Member

@sfilipi sfilipi Feb 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hrough [](start = 32, length = 6)

typo #Resolved

Copy link
Member

@sfilipi sfilipi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:


namespace Microsoft.ML.Functional.Tests.Datasets
{
internal sealed class AllTypes
Copy link
Contributor

@glebuk glebuk Feb 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AllTypes [](start = 26, length = 8)

This name is overly broad. Perhaps a TypeTestData? #Closed


namespace Microsoft.ML.Functional.Tests.Datasets
{
internal sealed class AllTypes
Copy link
Contributor

@glebuk glebuk Feb 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AllTypes [](start = 26, length = 8)

At least one xml summary comment on what it is and what it is for. #Closed

Dt = DateTime.FromOADate(rng.Next(657435, 2958465)),
Dz = DateTimeOffset.FromUnixTimeSeconds((long)(rng.NextDouble() * (1 + rng.Next()))),
Ug = new RowId((ulong)rng.Next(), (ulong)rng.Next())
};
Copy link
Contributor

@glebuk glebuk Feb 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Factor-out into a method - GetRandomInstance(Random rnd, int featuresLength) #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Factored out, made public with XML tags. Cannot add a featuresLength at this point (see above comment).


In reply to: 256542015 [](ancestors = 256542015)

return mlContext.Data.CreateTextLoader(
new[] {
new TextLoader.Column("Label", DataKind.Bool, 0),
new TextLoader.Column("Features", DataKind.R4, 1, 5),
Copy link
Contributor

@glebuk glebuk Feb 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5 [](start = 74, length = 1)

factor out as param. Do same with the GetRandomINstance() #ByDesign

Copy link
Contributor Author

@rogancarr rogancarr Feb 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't add this as a parameter unless there is a way to alter the class attributes dynamically. For our tests, we need LoadColumn attributes to point to the corresponding serialization column.


In reply to: 256542590 [](ancestors = 256542590)

/// <returns></returns>
public static TypeTestData GetRandomInstance(Random rng)
{
if (rng == null)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (rng == null) [](start = 12, length = 16)

We usually use Check()?

Copy link
Contributor

@glebuk glebuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@rogancarr rogancarr merged commit 82fd7ea into dotnet:master Feb 14, 2019
@rogancarr rogancarr deleted the 2499_datasources_scenario_tests branch February 14, 2019 16:49
@ghost ghost locked as resolved and limited conversation to collaborators Mar 24, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants