XML documentation for Normalizer #3432

Merged
merged 6 commits into from
Apr 20, 2019
40 changes: 40 additions & 0 deletions src/Microsoft.ML.Data/Transforms/Normalizer.cs
@@ -26,6 +26,43 @@

namespace Microsoft.ML.Transforms
{
/// <summary>
/// <see cref="IEstimator{TTransformer}"/> for the <see cref="NormalizingTransformer"/>.
/// </summary>
/// <remarks>
/// <format type="text/markdown"><![CDATA[
///
/// ### Estimator Characteristics
/// | | |
/// | -- | -- |
/// | Does this estimator need to look at the data to train its parameters? | Yes |
/// | Input column data type | <xref:System.Single> or <xref:System.Double> or a known-sized vector of those types. |
/// | Output column data type | The same data type as the input column |
///
/// The resulting NormalizingEstimator will normalize the data in one of the following ways based upon how it was created:
/// * Min Max - A linear rescale that is based upon the minimum and maximum values for each row.
/// * Mean Variance - Rescale each row to unit variance and, optionally, zero mean.
/// * Log Mean Variance - Rescale each row to unit variance based on a log scale.
/// * Binning - Bucketizes the data in each row and performs a linear rescale based on the calculated bins.
Member

@wschin wschin Apr 19, 2019

The explanation of binning doesn't carry much information. Also, we need equations for all of them. #WontFix

Member Author

Talked offline and agreed to address later. Having equations for all of our transformer/trainers is a much larger issue to address.



/// * Supervised Binning - Bucketizes the data in each row and performs a linear rescale based on the calculated bins. The bin calculation is based on correlation of the Label column.
///
/// ### Estimator Details
/// The interval of the normalized data depends on whether fixZero is specified or not. fixZero defaults to true.
/// When fixZero is false, the normalized interval is $[0,1]$ and the distribution of the normalized values depends on the normalization mode. For example, with Min Max, the minimum
/// and maximum values are mapped to 0 and 1 respectively and remaining values fall in between.
/// When fixZero is set, the normalized interval is $[-1,1]$. The distribution of the normalized values still depends on the normalization mode, but the behavior is different.
/// With Min Max, the result depends on how far the number is from 0: the number with the largest distance from 0 is mapped to 1 if it is positive
/// or -1 if it is negative. Numbers that are close to 0 relative to that largest distance are normalized towards 0.
///
/// To create this estimator use one of the following:
/// * [NormalizeMinMax](xref:Microsoft.ML.NormalizationCatalog.NormalizeMinMax(Microsoft.ML.TransformsCatalog,System.String,System.String,System.Int64,System.Boolean))
/// * [NormalizeMeanVariance](xref:Microsoft.ML.NormalizationCatalog.NormalizeMeanVariance(Microsoft.ML.TransformsCatalog,System.String,System.String,System.Int64,System.Boolean,System.Boolean))
/// * [NormalizeLogMeanVariance](xref:Microsoft.ML.NormalizationCatalog.NormalizeLogMeanVariance(Microsoft.ML.TransformsCatalog,System.String,System.String,System.Int64,System.Boolean))
/// * [NormalizeBinning](xref:Microsoft.ML.NormalizationCatalog.NormalizeBinning(Microsoft.ML.TransformsCatalog,System.String,System.String,System.Int64,System.Boolean,System.Int32))
/// * [NormalizeSupervisedBinning](xref:Microsoft.ML.NormalizationCatalog.NormalizeSupervisedBinning(Microsoft.ML.TransformsCatalog,System.String,System.String,System.String,System.Int64,System.Boolean,System.Int32,System.Int32))
/// ]]>
/// </format>
Contributor

can you move it outside of remarks section and just use seealso?

Member Author

Unfortunately not as this hits assembly issues. I would have to move code which was deemed lower priority. So for now I did this to at least have some kind of reference.

/// </remarks>
public sealed class NormalizingEstimator : IEstimator<NormalizingTransformer>
{
[BestFriend]
@@ -284,6 +321,9 @@ public SchemaShape GetOutputSchema(SchemaShape inputSchema)
}
}

/// <summary>
/// <see cref="ITransformer"/> resulting from fitting a <see cref="NormalizingEstimator"/>.
/// </summary>
public sealed partial class NormalizingTransformer : OneToOneTransformerBase
{
internal const string LoaderSignature = "Normalizer";
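
For reference, the Min Max behavior described under "Estimator Details" in the remarks can be written out as equations. This is only a sketch derived from the prose in the remarks, not from the implementation. The symbols $x_{\min}$ and $x_{\max}$ (the smallest and largest values observed during fitting) are introduced here for illustration, and degenerate cases such as $x_{\max} = x_{\min}$ are not covered.

$$
x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \in [0, 1] \qquad \text{(fixZero is false)}
$$

$$
x' = \frac{x}{\max(|x_{\min}|, |x_{\max}|)} \in [-1, 1] \qquad \text{(fixZero is true)}
$$

Under these assumptions, the first form maps the minimum to 0 and the maximum to 1, while the second keeps 0 at 0 and maps the value farthest from 0 to 1 or -1 depending on its sign. For the values -2, 0 and 4, the first form would give 0, 1/3 and 1, and the second would give -0.5, 0 and 1.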