Scrub projection transforms #2865
Codecov Report
@@            Coverage Diff             @@
##           master    #2865      +/-   ##
==========================================
- Coverage   71.79%   71.79%   -0.01%
==========================================
  Files         812      812
  Lines      142680   142673       -7
  Branches    16124    16124
==========================================
- Hits       102436   102430       -6
  Misses      35831    35831
+ Partials     4413     4412       -1
string features,
string weights = null,
string featureColumnName,
string exampleWeightColumnName = null,
exampleWeightColumnName
Does 'example' need to be in the name? It's just too long. #WontFix
/// <param name="newDim">Expected size of new vector.</param>
/// <param name="useSin">Create two features for every random Fourier frequency? (one for cos and one for sin) </param>
/// <param name="dimension">The number of random Fourier features to create.</param>
/// <param name="useCosAndSinBases">If <see langword="true"/>, use both of cos and sin basis functions to create two features for every random Fourier frequency. /// Otherwise, only cos bases would be used.</param>
/// Otherwise, only cos bases would be used.
This should start on a new line. #Resolved
@@ -18,18 +18,18 @@ public static class PcaCatalog
/// <param name="exampleWeightColumnName">The name of the example weight column (optional).</param>
/// <param name="rank">The number of principal components.</param>
It would be nice to have consistent parameter names among transforms that reduce the feature vector dimension.
Right now we have dimension, rank, and (in whitening) pcaNum.
#Resolved
I know @TomFinley has some opinions about how MLContext should be organized.
Since this introduces quite a few sections, I think @TomFinley should take a look before we check it in. #Resolved
/// <param name="outputColumnName">Name of the column resulting from the transformation of <paramref name="inputColumnName"/>.</param>
/// <param name="inputColumnName">Name of column to transform. If set to <see langword="null"/>, the value of the <paramref name="outputColumnName"/> will be used as source.</param>
/// <param name="dimension">The number of random Fourier features to create.</param>
/// <param name="useCosAndSinBases">If <see langword="true"/>, use both of cos and sin basis functions to create two features for every random Fourier frequency. /// Otherwise, only cos bases would be used.</param>
///
This should start on a new line. #Resolved
/// ]]>
/// </format>
/// </example>
public static RandomFourierExpansionEstimator RandomFourierExpand(this TransformsCatalog.KernelExpansionTransforms catalog,
RandomFourierExpansionEstimator
Why Random? I like the Expansion part, but why Random? I don't think the process is that random, right? #Resolved
Formal name: randomized Fourier kernel approximation.
In reply to: 263199601
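For context on the "Random" in the name, here is a sketch of the standard random Fourier features construction. It is not taken from this PR: the symbols $k$, $p$, $\omega_i$, $b_i$, $D$ and the scaling constants are assumptions for illustration, and the ML.NET implementation may differ in detail. For a shift-invariant kernel $k$ whose Fourier transform is a probability density $p$, Bochner's theorem gives

$$
k(x - y) = \mathbb{E}_{\omega \sim p}\left[\cos\left(\omega^\top (x - y)\right)\right].
$$

Drawing $D$ frequencies $\omega_1, \dots, \omega_D$ at random from $p$ turns the expectation into a Monte Carlo average. Using both bases (which is what `useCosAndSinBases` set to true appears to describe) yields two features per frequency:

$$
z(x) = \sqrt{\tfrac{1}{D}}\,\bigl[\cos(\omega_1^\top x),\ \sin(\omega_1^\top x),\ \dots,\ \cos(\omega_D^\top x),\ \sin(\omega_D^\top x)\bigr],
\qquad
z(x)^\top z(y) = \frac{1}{D}\sum_{i=1}^{D} \cos\left(\omega_i^\top (x - y)\right) \approx k(x - y).
$$

Using only cos bases yields one feature per frequency, with an extra random phase $b_i \sim \mathrm{Uniform}[0, 2\pi]$:

$$
z(x) = \sqrt{\tfrac{2}{D}}\,\bigl[\cos(\omega_1^\top x + b_1),\ \dots,\ \cos(\omega_D^\top x + b_D)\bigr],
\qquad
\mathbb{E}\left[z(x)^\top z(y)\right] = k(x - y).
$$

In both variants the randomness lies in sampling the frequencies (and phases); once they are drawn, the feature map is deterministic.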
Well, I feel like this is introducing too much categorization, @wschin and @artidoro. If the categories start to have only one or two items, then I wonder what the point of categorization is. Why not just stop categorizing if we get to that point? Does anything really bad happen if transforms aren't endlessly categorized? This is starting to resemble a Linnaean taxonomy. I see a few end states with categorization. In this PR, at the time I am writing, we have descriptive categories, but they are too fine-grained. Previously we had categories that, while accurate descriptions, were utterly useless in terms of communicating anything to users (projection). The alternative is that we don't categorize these at all, and I wonder if that's the most desirable state if we can't come up with good descriptive categories that don't suffer from having only one or two items in them. So how bad is it if we just put these things in transforms? #Resolved
Not bad! Let's do this.
In reply to: 470339262
/// </example>
public static NormalizingEstimator Normalize(this TransformsCatalog catalog,
string outputColumnName, string inputColumnName = null,
NormalizingEstimator.NormalizerMode mode = NormalizingEstimator.NormalizerMode.MinMax)
NormalizerMode
How about NormalizationMode? #Resolved
/// </format>
/// </example>
public static NormalizingEstimator Normalize(this TransformsCatalog catalog,
NormalizingEstimator.NormalizerMode mode,
NormalizerMode
NormalizationMode? #Resolved
docs/samples/Microsoft.ML.Samples/Dynamic/ProjectionTransforms.cs (two review threads, both outdated and resolved)
can we move away from PCA to
Refers to: src/Microsoft.ML.PCA/PcaTrainer.cs:42 in d4edc48.
Will do AnalyzeRandomizedPrincipleComponents
In reply to: 470688500
Refers to: src/Microsoft.ML.PCA/PCACatalog.cs:56 in d4edc48.
Sure. I will replace
In reply to: 470690330
Refers to: src/Microsoft.ML.PCA/PcaTrainer.cs:42 in d4edc48.
Fix #2831.
Transforms touched:
RFF, LpNorm, GcNorm, PCA, Whiten, Normalize.