Skip to content

Add an example of random PCA using in-memory data structure #2780

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Feb 28, 2019

Conversation

wschin
Copy link
Member

@wschin wschin commented Feb 27, 2019

As title. It also shows some benefits of using in-memory data described in #2726.

@wschin wschin added the documentation Related to documentation of ML.NET label Feb 27, 2019
@wschin wschin self-assigned this Feb 27, 2019
@wschin wschin requested review from abgoswam and rogancarr February 27, 2019 23:22
@wschin wschin changed the title Add an example of random PCA Add an example of random PCA using in-memory data structure Feb 27, 2019
@wschin wschin force-pushed the rpca-example branch 3 times, most recently from 8409d09 to 48c01d3 Compare February 28, 2019 00:49
@codecov
Copy link

codecov bot commented Feb 28, 2019

Codecov Report

Merging #2780 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #2780      +/-   ##
==========================================
+ Coverage   71.66%   71.67%   +<.01%     
==========================================
  Files         809      809              
  Lines      142378   142416      +38     
  Branches    16119    16120       +1     
==========================================
+ Hits       102031   102072      +41     
+ Misses      35915    35912       -3     
  Partials     4432     4432
Flag Coverage Δ
#Debug 71.67% <100%> (ø) ⬆️
#production 67.91% <ø> (ø) ⬆️
#test 85.86% <100%> (+0.01%) ⬆️
Impacted Files Coverage Δ
src/Microsoft.ML.PCA/PCACatalog.cs 92.3% <ø> (+38.46%) ⬆️
test/Microsoft.ML.Tests/AnomalyDetectionTests.cs 100% <100%> (ø) ⬆️
src/Microsoft.ML.Maml/MAML.cs 24.75% <0%> (-1.46%) ⬇️
...StandardLearners/Standard/LinearModelParameters.cs 60.63% <0%> (-0.27%) ⬇️
src/Microsoft.ML.PCA/PcaTrainer.cs 79.94% <0%> (+0.29%) ⬆️
src/Microsoft.ML.CpuMath/EigenUtils.cs 87.69% <0%> (+0.39%) ⬆️

{
/// <summary>
/// Example with 3 feature values.
/// </summary>
Copy link
Member

@sfilipi sfilipi Feb 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please don;t use XML style comments in samples, as it becomes harder to read than simple comments. #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad. Thanks.


In reply to: 261047221 [](ancestors = 261047221)

/// Class used to capture prediction of <see cref="DataPoint"/> in <see cref="Example"/>.
/// </summary>
// We disable this warning because complier doesn't realize those fields below are assigned somewhere.
#pragma warning disable 649
Copy link
Member

@sfilipi sfilipi Feb 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we disable this somewhere else? csproj etc? #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we can do

   <PropertyGroup>
     <TargetFramework>netcoreapp2.1</TargetFramework>
     <OutputType>Exe</OutputType>
+    <NoWarn>1701;1702;0649</NoWarn>
   </PropertyGroup>

In reply to: 261047307 [](ancestors = 261047307)

new DataPoint(){ Features= new float[3] {-100, 50, -100} }
};

// Convert native C# class to IDataView, a consumble format to ML.NET functions.
Copy link
Member

@sfilipi sfilipi Feb 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

native C# class [](start = 23, length = 15)

the List #Closed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Supper good to make it strongly-typed. No ambiguity anymore.


In reply to: 261047430 [](ancestors = 261047430)

// Apply the trained model on the training data.
var transformed = model.Transform(data);

// Read ML.NET predictions into C# class.
Copy link
Member

@sfilipi sfilipi Feb 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

C# class [](start = 44, length = 8)

an IEnumerable #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!


In reply to: 261047514 [](ancestors = 261047514)

// The i-th sample is predicted as an outlier.
Console.WriteLine("The {0}-th example with features [{1}] is an outlier with a score of being inlier {2}",
i, featuresInText, result.Score);
}
Copy link
Member

@sfilipi sfilipi Feb 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please put in comment below what would the output of this look. (The results of those WriteLine) #Closed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Please see iteration 3.


In reply to: 261047636 [](ancestors = 261047636)

/// <example>
/// <format type="text/markdown">
/// <![CDATA[
/// [!code-csharp[RPCA](~/../docs/samples/docs/samples/Microsoft.ML.Samples/Dynamic/Trainers/AnomalyDetection/RandomizedPcaSampleWithOptions.cs)]
Copy link
Member

@sfilipi sfilipi Feb 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RandomizedPcaSampleWithOptions [](start = 119, length = 30)

i don't think this exists #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added. Thanks!


In reply to: 261047854 [](ancestors = 261047854)

@@ -48,6 +50,98 @@ public void NoAnomalyTest()
Assert.Throws<ArgumentOutOfRangeException>(() => ML.AnomalyDetection.Evaluate(transformedData));
}

[Fact]
public static void RandomizedPcaInMemory()
Copy link
Member

@sfilipi sfilipi Feb 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RandomizedPcaInMemory [](start = 27, length = 21)

love it :) #WontFix

@wschin wschin requested a review from sfilipi February 28, 2019 16:56

namespace Microsoft.ML.Samples.Dynamic.Trainers.AnomalyDetection
{
class RandomizedPcaSampleWithOptions
Copy link
Member Author

@wschin wschin Feb 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

class [](start = 4, length = 5)

public static #Closed

/// <summary>
/// Class used to capture prediction of <see cref="DataPoint"/> in <see cref="ExecutePipelineWithGivenRandomizedPcaTrainer"/>.
/// </summary>
#pragma warning disable 649
Copy link
Member

@abgoswam abgoswam Feb 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

am curious ..whats this warning ? #Resolved

Copy link
Member Author

@wschin wschin Feb 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fields assigned in runtime are considered never assigned, which is wrong. #Resolved

new DataPoint(){ Features= new float[3] {-100, 50, -100} }
};

// Convert native C# class to IDataView, a consumble format to ML.NET functions.
Copy link
Member

@abgoswam abgoswam Feb 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

native C# class [](start = 23, length = 15)

nit : maybe just saying List would suffice ? #Resolved

Copy link
Member

@abgoswam abgoswam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@@ -3,6 +3,7 @@
<PropertyGroup>
<TargetFramework>netcoreapp2.1</TargetFramework>
<OutputType>Exe</OutputType>
<NoWarn>1701;1702;0649</NoWarn>
Copy link
Member

@abgoswam abgoswam Feb 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1701;1702;0649 [](start = 4, length = 31)

one concern i have is that users might not be aware of this..so they will get thrown off when they see the warning about "uninitialized field" ...

an alternative is to make them properties in which case we do not need this NoWarn, and also we do not see the "uninitialized field" warning when we use properties.

@sfilipi what do u think ?

Copy link
Member Author

@wschin wschin Feb 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just switched to properties. No need to disable warning anymore. Your suggestion reduces the redundant information user need to know!


In reply to: 261304076 [](ancestors = 261304076)

// The 2 - th example with features[1, 2, 3] is an inlier with a score of being inlier 0.8450122
// The 3 - th example with features[0, 1, 0] is an inlier with a score of being inlier 0.9428905
// The 4 - th example with features[0, 2, 1] is an inlier with a score of being inlier 0.9999999
// The 5 - th example with features[-100, 50, -100] is an outlier with a score of being inlier 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: typically this goes below the code that generates it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

Copy link
Member

@sfilipi sfilipi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

// The 2 - th example with features[1, 2, 3] is an inlier with a score of being inlier 0.8450122
// The 3 - th example with features[0, 1, 0] is an inlier with a score of being inlier 0.9428905
// The 4 - th example with features[0, 2, 1] is an inlier with a score of being inlier 0.9999999
// The 5 - th example with features[-100, 50, -100] is an outlier with a score of being inlier 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same nit, put it below the code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Thanks.

@wschin wschin merged commit 6e9023f into dotnet:master Feb 28, 2019
@wschin wschin deleted the rpca-example branch February 28, 2019 23:36
@ghost ghost locked as resolved and limited conversation to collaborators Mar 24, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
documentation Related to documentation of ML.NET
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants