Skip to content

XML documentation for Normalizer #3432

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Apr 20, 2019
Merged

Conversation

singlis
Copy link
Member

@singlis singlis commented Apr 19, 2019

Tracked in #3204

@singlis singlis added the documentation Related to documentation of ML.NET label Apr 19, 2019
@singlis singlis self-assigned this Apr 19, 2019
@codecov
Copy link

codecov bot commented Apr 19, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@082ab77). Click here to learn what that means.
The diff coverage is 83.33%.

@@            Coverage Diff            @@
##             master    #3432   +/-   ##
=========================================
  Coverage          ?   68.27%           
=========================================
  Files             ?      638           
  Lines             ?   113948           
  Branches          ?    14256           
=========================================
  Hits              ?    77796           
  Misses            ?    32048           
  Partials          ?     4104
Flag Coverage Δ
#Debug 68.27% <83.33%> (?)
#production 68.27% <83.33%> (?)
Impacted Files Coverage Δ
src/Microsoft.ML.Data/Transforms/Normalizer.cs 86.03% <ø> (ø)
src/Microsoft.ML.Transforms/NormalizerCatalog.cs 84.78% <83.33%> (ø)

/// ### Estimator Characteristics
/// | | |
/// | -- | -- |
/// | Does this estimator need to look at the data to train its parameters? | No |
Copy link
Contributor

@natke natke Apr 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this correct? Doesn't it have to look at the data to find the mean, variance, min, max etc? #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be yes, thanks for mentioning. I will correct this.


In reply to: 276894215 [](ancestors = 276894215)

/// * Min Max - Linear rescale such that minimum and maximum values are mapped between -1 and 1.
/// * Mean Variance - Rescale to unit variance and, optionally, zero mean.
/// * Log Mean Variance - Rescale to unit variance on the log scale.
/// * Binning - Bucketize and then rescale to between -1 and 1.
Copy link
Contributor

@natke natke Apr 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this correct? Isn't it between 0 and 1? #ByDesign

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks like -1 and 1. See the sample:


In reply to: 276894276 [](ancestors = 276894276)

/// | Output column data type | The same as the data type in the input column |
///
/// The resulting NormalizingEstimator will normalize the data in one of the following ways based upon how it was created:
/// * Min Max - Linear rescale such that minimum and maximum values are mapped between -1 and 1.
Copy link

@shmoradims shmoradims Apr 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

between -1 and 1 [](start = 83, length = 16)

suggestion: let's use intervals: [-1,1] or even better with latex: $[-1,1]$

i think it's more clear that way #Resolved

/// | | |
/// | -- | -- |
/// | Does this estimator need to look at the data to train its parameters? | No |
/// | Input column data type | <xref:System.Single> or <xref:System.Double> |
Copy link

@shmoradims shmoradims Apr 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[](start = 79, length = 1)

I think numeric vectors are also valid, aren't they? If so please change here and also the params. you can write it like this:

xref:System.Single, xref:System.Double, or a known-sized vector of those types.

#Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its only supports single or double:

if (!col.ItemType.Equals(NumberDataViewType.Single) && !col.ItemType.Equals(NumberDataViewType.Double))


In reply to: 276994571 [](ancestors = 276994571)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think known-sized vectors of numerics are supported by normalizer. that line you're referencing is also true if the vector contains only numeric values. could you please confirm with Ivan?

I'm certain that I've normalized the feature vector right before trainer


In reply to: 277045796 [](ancestors = 277045796,276994571)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look on line 298 in this file.
It's either scalar or vector of known-size of float or double


In reply to: 277069180 [](ancestors = 277069180,277045796,276994571)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry I read too quickly through Shahab's original comment and thought he was asking if we supported integers. Yes we do support vectors, sorry for the confusion. I am updating.


In reply to: 277070189 [](ancestors = 277070189,277069180,277045796,276994571)

@@ -31,11 +31,12 @@ public static class NormalizationCatalog
}

/// <summary>
/// It normalizes the data based on the observed minimum and maximum values of the data.
/// Create a <see cref="NormalizingEstimator"/>, which normalizes based on the observed minimum and maximum values of the data.
/// </summary>
/// <param name="catalog">The transform catalog</param>
/// <param name="outputColumnName">Name of the column resulting from the transformation of <paramref name="inputColumnName"/>.</param>
Copy link

@shmoradims shmoradims Apr 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

. [](start = 133, length = 1)

Add: Output data type will be the same as input data type. #Resolved

/// | Output column data type | The same data type as the input column |
///
/// The resulting NormalizingEstimator will normalize the data in one of the following ways based upon how it was created:
/// * Min Max - Linear rescale such that the values are mapped to the $[-1,1]$ with the minimum and maximum values being mapped to -1 and 1 respectively.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-1,1] [](start = 76, length = 5)

If we not talking about fixZero option all normalizers try to map to [0,1] range.
FixZero is special beast....
Not sure it's worth to specify range...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Ivan - so if its 0,1 unless fixZero is set, why not then just document the normal case?


In reply to: 277070876 [](ancestors = 277070876)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added further description to address this.


In reply to: 277084973 [](ancestors = 277084973,277070876)

/// * [NormalizeBinning](xref:Microsoft.ML.NormalizationCatalog.NormalizeBinning(Microsoft.ML.TransformsCatalog,System.String,System.String,System.Int64,System.Boolean,System.Int32))
/// * [NormalizeSupervisedBinning](xref:Microsoft.ML.NormalizationCatalog.NormalizeSupervisedBinning(Microsoft.ML.TransformsCatalog,System.String,System.String,System.String,System.Int64,System.Boolean,System.Int32,System.Int32))
/// ]]>
/// </format>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you move it outside of remarks section and just use seealso?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately not as this hits assembly issues. I would have to move code which was deemed lower priority. So for now I did this to at least have some kind of reference.

/// | | |
/// | -- | -- |
/// | Does this estimator need to look at the data to train its parameters? | Yes |
/// | Input column data type | <xref:System.Single> or<xref:System.Double> or a known-sized vector of those types. |
Copy link
Member

@wschin wschin Apr 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// | Input column data type | <xref:System.Single> or<xref:System.Double> or a known-sized vector of those types. |
/// | Input column data type | <xref:System.Single> or <xref:System.Double> or a known-sized vector of those types. |
``` #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you for catching the space.


In reply to: 277110149 [](ancestors = 277110149)

/// * Min Max - A linear rescale that is based upon the minimum and maximum values for each row.
/// * Mean Variance - Rescale each row to unit variance and, optionally, zero mean.
/// * Log Mean Variance - Rescale each row to unit variance based on a log scale.
/// * Binning - Bucketizes the data in each row and performs a linear rescale based on the calculated bins.
Copy link
Member

@wschin wschin Apr 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The explanation of binning doesn't carry much information. Also, we need equations for all of them. #WontFix

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Talked offline and agreed to address later. Having equations for all of our transformer/trainers is a much larger issue to address.


In reply to: 277110413 [](ancestors = 277110413)

Copy link

@shmoradims shmoradims left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@singlis singlis merged commit 270df4f into dotnet:master Apr 20, 2019
@ghost ghost locked as resolved and limited conversation to collaborators Mar 22, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
documentation Related to documentation of ML.NET
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants