-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Update tests (using Iris dataset) to use new API #2008
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
abgoswam
commented
Jan 3, 2019
- Update a couple of tests (using Iris dataset) to use the new API.
public partial class ScenariosTests | ||
{ | ||
[Fact] | ||
public void TrainAndPredictIrisModelTest() | ||
public void New_TrainAndPredictIrisModelTest() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New_ [](start = 20, length = 4)
I believe "New" prefix was used because we had two APIs but now the old API is gone and there is just one API, do you feel we still need to prefix test with "New"? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added it during development when I had both tests in this file. Forgot yo drop the 'New_' prefix after deleting the old test. Will fix.
In reply to: 245111276 [](ancestors = 245111276)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
var pipe = mlContext.Transforms.Concatenate("Features", "SepalLength", "SepalWidth", "PetalLength", "PetalWidth") | ||
.Append(mlContext.Transforms.Normalize("Features")) | ||
.AppendCacheCheckpoint(mlContext) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we adding normalizer transform? just curious. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in the new API the paradigm is to be explicit about whether are using Normalizers in the pipeline .
(In the old Legacy API normalizers used to be auto-inserted based on the algorithm)
Hence decided to be explicit about normalizers. Please see #1604 which talks about removing 'smarts' (i.e. no auto-normalization, no auto-caching and no auto-calibration) in the new API
In reply to: 245111952 [](ancestors = 245111952)
Assert.Equal(1, matrix[2, 1]); | ||
Assert.Equal(1, matrix["2", "1"]); | ||
Assert.Equal(49, matrix[2, 2]); | ||
Assert.Equal(49, matrix["2", "2"]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we not have confusion matrix in metrics? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not see it inside metrics
. Who would be the best person to confirm whether we support ConfusionMatrix
in the new API ?
If we do, is there an example I can follow ? I do not see it in the CookBook either
In reply to: 245114114 [](ancestors = 245114114)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -22,7 +22,6 @@ | |||
<ProjectReference Include="..\..\src\Microsoft.ML.StaticPipe\Microsoft.ML.StaticPipe.csproj" /> | |||
<ProjectReference Include="..\..\src\Microsoft.ML.TensorFlow.StaticPipe\Microsoft.ML.TensorFlow.StaticPipe.csproj" /> | |||
<ProjectReference Include="..\..\src\Microsoft.ML.TensorFlow\Microsoft.ML.TensorFlow.csproj" /> | |||
<ProjectReference Include="..\..\src\Microsoft.ML.Legacy\Microsoft.ML.Legacy.csproj" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[](start = 4, length = 87)
Nice!! #Resolved
@abgoswam Can we just delete this files instead of modifying them? We have test which do evaluation, we have tests which do predictions (with new API), what's the point of having this ones? #Resolved |
@Ivanidzo4ka We did that and it results in code coverage drop. Check out my PR that removes these files. It’s possible removing some tests might be ok but which ones will require tests by tests code coverage analysis. It might just be easier to convert all the test to new API. Even though these tests are doing training, evaluation, etc but they may be using different transforms, trainers, settings, etc. #Resolved |
|
||
Assert.Equal(.98, metrics.AccuracyMacro); | ||
Assert.Equal(.98, metrics.AccuracyMicro, 2); | ||
Assert.Equal(.06, metrics.LogLoss, 2); | ||
Assert.InRange(metrics.LogLoss, .05, .06); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we use Assert.Equal(…, …, 2)? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will also prefer Equal #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
Assert.Equal(.98, metrics.AccuracyMacro); | ||
Assert.Equal(.98, metrics.AccuracyMicro, 2); | ||
Assert.Equal(.06, metrics.LogLoss, 2); | ||
Assert.InRange(metrics.LogLoss, .05, .06); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assert.Equal(metrics.Logloss, 0.055, 3) maybe? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public string IrisPlantType; | ||
} | ||
|
||
public class IrisPredictionWithStringLabel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are those data structures used outside this file? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made it private. To make the compiler happy I am using properties instead of fields
In reply to: 245133573 [](ancestors = 245133573)
I've run it with tests and without tests, and only dll for which code coverage went down is Legacy, which is expected. Everything else either state same, or went up by 0.01% :) In reply to: 451270185 [](ancestors = 451270185) |
I have explained to you in person the issue with running code coverage per test in the context of parallelizing test conversion effort. Say you have two tests, each assigned to two devs, these two tests are exercising same code paths, if both devs in parallel run code coverage for the test assigned to them then both will end up removing both tests because of this race condition and then we lose code coverage. I do not see this as a safe and quick way to prevent regression in code coverage due to the removal Legacy API binary. The fastest and safest way is to convert all the test to the new API. Correct me if I'm wrong. Also if you are comparing against master (with test) you will need to remove legacy dll from the code coverage analysis so we don't see code coverage going up just because we removed a test! In reply to: 451278660 [](ancestors = 451278660,451270185) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.