-
Notifications
You must be signed in to change notification settings - Fork 1.9k
XML documentation for Normalizer #3432
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## master #3432 +/- ##
=========================================
Coverage ? 68.27%
=========================================
Files ? 638
Lines ? 113948
Branches ? 14256
=========================================
Hits ? 77796
Misses ? 32048
Partials ? 4104
|
/// ### Estimator Characteristics | ||
/// | | | | ||
/// | -- | -- | | ||
/// | Does this estimator need to look at the data to train its parameters? | No | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this correct? Doesn't it have to look at the data to find the mean, variance, min, max etc? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be yes, thanks for mentioning. I will correct this.
In reply to: 276894215 [](ancestors = 276894215)
/// * Min Max - Linear rescale such that minimum and maximum values are mapped between -1 and 1. | ||
/// * Mean Variance - Rescale to unit variance and, optionally, zero mean. | ||
/// * Log Mean Variance - Rescale to unit variance on the log scale. | ||
/// * Binning - Bucketize and then rescale to between -1 and 1. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this correct? Isn't it between 0 and 1? #ByDesign
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it looks like -1 and 1. See the sample:
machinelearning/docs/samples/Microsoft.ML.Samples/Dynamic/Transforms/NormalizeBinning.cs
Line 58 in ea5c095
// 0.0000, -0.3333, 0.0000, 1.0000 |
In reply to: 276894276 [](ancestors = 276894276)
/// | Output column data type | The same as the data type in the input column | | ||
/// | ||
/// The resulting NormalizingEstimator will normalize the data in one of the following ways based upon how it was created: | ||
/// * Min Max - Linear rescale such that minimum and maximum values are mapped between -1 and 1. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
between -1 and 1 [](start = 83, length = 16)
suggestion: let's use intervals: [-1,1] or even better with latex:
i think it's more clear that way #Resolved
/// | | | | ||
/// | -- | -- | | ||
/// | Does this estimator need to look at the data to train its parameters? | No | | ||
/// | Input column data type | <xref:System.Single> or <xref:System.Double> | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[](start = 79, length = 1)
I think numeric vectors are also valid, aren't they? If so please change here and also the params. you can write it like this:
xref:System.Single, xref:System.Double, or a known-sized vector of those types.
#Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its only supports single or double:
if (!col.ItemType.Equals(NumberDataViewType.Single) && !col.ItemType.Equals(NumberDataViewType.Double)) |
In reply to: 276994571 [](ancestors = 276994571)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think known-sized vectors of numerics are supported by normalizer. that line you're referencing is also true if the vector contains only numeric values. could you please confirm with Ivan?
I'm certain that I've normalized the feature vector right before trainer
In reply to: 277045796 [](ancestors = 277045796,276994571)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Look on line 298 in this file.
It's either scalar or vector of known-size of float or double
In reply to: 277069180 [](ancestors = 277069180,277045796,276994571)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry I read too quickly through Shahab's original comment and thought he was asking if we supported integers. Yes we do support vectors, sorry for the confusion. I am updating.
In reply to: 277070189 [](ancestors = 277070189,277069180,277045796,276994571)
@@ -31,11 +31,12 @@ public static class NormalizationCatalog | |||
} | |||
|
|||
/// <summary> | |||
/// It normalizes the data based on the observed minimum and maximum values of the data. | |||
/// Create a <see cref="NormalizingEstimator"/>, which normalizes based on the observed minimum and maximum values of the data. | |||
/// </summary> | |||
/// <param name="catalog">The transform catalog</param> | |||
/// <param name="outputColumnName">Name of the column resulting from the transformation of <paramref name="inputColumnName"/>.</param> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
. [](start = 133, length = 1)
Add: Output data type will be the same as input data type. #Resolved
/// | Output column data type | The same data type as the input column | | ||
/// | ||
/// The resulting NormalizingEstimator will normalize the data in one of the following ways based upon how it was created: | ||
/// * Min Max - Linear rescale such that the values are mapped to the $[-1,1]$ with the minimum and maximum values being mapped to -1 and 1 respectively. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-1,1] [](start = 76, length = 5)
If we not talking about fixZero option all normalizers try to map to [0,1] range.
FixZero is special beast....
Not sure it's worth to specify range...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Ivan - so if its 0,1 unless fixZero is set, why not then just document the normal case?
In reply to: 277070876 [](ancestors = 277070876)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added further description to address this.
In reply to: 277084973 [](ancestors = 277084973,277070876)
/// * [NormalizeBinning](xref:Microsoft.ML.NormalizationCatalog.NormalizeBinning(Microsoft.ML.TransformsCatalog,System.String,System.String,System.Int64,System.Boolean,System.Int32)) | ||
/// * [NormalizeSupervisedBinning](xref:Microsoft.ML.NormalizationCatalog.NormalizeSupervisedBinning(Microsoft.ML.TransformsCatalog,System.String,System.String,System.String,System.Int64,System.Boolean,System.Int32,System.Int32)) | ||
/// ]]> | ||
/// </format> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you move it outside of remarks section and just use seealso?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately not as this hits assembly issues. I would have to move code which was deemed lower priority. So for now I did this to at least have some kind of reference.
/// | | | | ||
/// | -- | -- | | ||
/// | Does this estimator need to look at the data to train its parameters? | Yes | | ||
/// | Input column data type | <xref:System.Single> or<xref:System.Double> or a known-sized vector of those types. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// | Input column data type | <xref:System.Single> or<xref:System.Double> or a known-sized vector of those types. | | |
/// | Input column data type | <xref:System.Single> or <xref:System.Double> or a known-sized vector of those types. | | |
``` #Resolved |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// * Min Max - A linear rescale that is based upon the minimum and maximum values for each row. | ||
/// * Mean Variance - Rescale each row to unit variance and, optionally, zero mean. | ||
/// * Log Mean Variance - Rescale each row to unit variance based on a log scale. | ||
/// * Binning - Bucketizes the data in each row and performs a linear rescale based on the calculated bins. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The explanation of binning doesn't carry much information. Also, we need equations for all of them. #WontFix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Talked offline and agreed to address later. Having equations for all of our transformer/trainers is a much larger issue to address.
In reply to: 277110413 [](ancestors = 277110413)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tracked in #3204