-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Normalize documentation #3244
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Normalize documentation #3244
Conversation
Codecov Report
@@ Coverage Diff @@
## master #3244 +/- ##
==========================================
- Coverage 72.65% 72.64% -0.01%
==========================================
Files 807 807
Lines 145191 145191
Branches 16223 16223
==========================================
- Hits 105485 105480 -5
- Misses 35290 35293 +3
- Partials 4416 4418 +2
|
foreach (var row in columnFixZero) | ||
Console.WriteLine(string.Join(", ", row.Select(x => x.ToString("f4")))); | ||
// 1.0000, 0.0000, 1.0000, 0.0000 | ||
// 0.5000, 0.5000, 0.0000, 0.5000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: could you add (here and in other files)
// Expected output: #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect expected values are incorrect: input: 2, 2, 2, 0 minmaxnormalized returns 1,1,1,0 or am I missing something?
var transformParams = (normalizeTransform.GetNormalizerModelParameters(0) as Microsoft.ML.Transforms.NormalizingTransformer.BinNormalizerModelParameters<ImmutableArray<float>>); | ||
Console.WriteLine($"Values for slot 0 would be transfromed by applying y = (Index(x) / {transformParams.Density[0]}) - {(transformParams.Offset.Length == 0 ? 0 : transformParams.Offset[0])}"); | ||
Console.WriteLine("Where Index(x) is index of bin to which x belongs"); | ||
Console.WriteLine($"Bins upper borders are: {string.Join(" ", transformParams.UpperBounds[0])}"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
borders [](start = 43, length = 7)
Maybe would suggest to use "bounds" instead of "borders". #Resolved
// 0.0000, -0.5000, 0.0000, 1.0000 | ||
// 0.0000, -0.5000, 0.0000, 1.0000 | ||
|
||
// Let's get transformation parameters. Since we work with only one column we need to pass 0 as parameter for function. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for function. [](start = 118, length = 13)
.... as parameter to GetNormalizerModelParameters. #Resolved
// 0.0000, -0.5000, 0.0000, 1.0000 | ||
|
||
// Let's get transformation parameters. Since we work with only one column we need to pass 0 as parameter for function. | ||
// If case if we have multiple column transformations we need to pass index of InputOutputColumnPair. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If case [](start = 15, length = 8)
Remove "If case", ere and in other files. #Resolved
// If case if we have multiple column transformations we need to pass index of InputOutputColumnPair. | ||
var transformParams = (normalizeTransform.GetNormalizerModelParameters(0) as Microsoft.ML.Transforms.NormalizingTransformer.AffineNormalizerModelParameters<ImmutableArray<float>>); | ||
Console.WriteLine($"Values for slot 1 would be transfromed by applying y= (x - ({(transformParams.Offset.Length == 0 ? 0 : transformParams.Offset[1])})) * {transformParams.Scale[1]}"); | ||
// Values for slot 1 would be transfromed by applying y= (x - (-1)) * 0.3333333 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
y= [](start = 66, length = 3)
could you add a space after "y"? #Resolved
var normalize = mlContext.Transforms.NormalizeMeanVariance("Features", useCdf: true); | ||
|
||
// NormalizeMeanVariance normalizes the data based on the computed mean and variance of the data. | ||
var normalizeNoCdf = mlContext.Transforms.NormalizeMeanVariance("Features", useCdf: false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would not repeat what NormalizeMeanVariance
does.
|
||
// NormalizeBinning normalizes the data by constructing equidensity bins and produce output based on | ||
// to which bin original value belong but make sure zero values would remain zero after normalization. | ||
// Helps preserve sparsity. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I would not repeat this explanation. Here and also in other samples.
Console.WriteLine("Where Index(x) is index of bin to which x belongs"); | ||
Console.WriteLine($"Bins upper borders are: {string.Join(" ", fixZeroParams.UpperBounds[1])}"); | ||
// Values for slot 1 would be transfromed by applying y = (Index(x) / 2) - 0.5 | ||
// Where Index(x) is index of bin to which x belongs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where Index(x) is index of bin to which x belongs [](start = 15, length = 49)
// Where Index(x) is the index of the bin to which x belongs" #Resolved
var normalize = mlContext.Transforms.NormalizeBinning("Features", maximumBinCount: 4, fixZero: false); | ||
|
||
// NormalizeBinning normalizes the data by constructing equidensity bins and produce output based on | ||
// to which bin original value belong but make sure zero values would remain zero after normalization. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
duplicate comment...see line 26-27 #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
// Let's get transformation parameters. Since we work with only one column we need to pass 0 as parameter for GetNormalizerModelParameters. | ||
// If we have multiple column transformations we need to pass index of InputOutputColumnPair. | ||
var transformParams = (normalizeTransform.GetNormalizerModelParameters(0) as Microsoft.ML.Transforms.NormalizingTransformer.BinNormalizerModelParameters<ImmutableArray<float>>); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Microsoft.ML.Transforms [](start = 89, length = 23)
Can we make it a using statement? This line is getting very long. #Resolved
// Let's get transformation parameters. Since we work with only one column we need to pass 0 as parameter for GetNormalizerModelParameters. | ||
// If we have multiple column transformations we need to pass index of InputOutputColumnPair. | ||
var transformParams = normalizeTransform.GetNormalizerModelParameters(0) as CdfNormalizerModelParameters<ImmutableArray<float>>; | ||
Console.WriteLine($"Values for slot 1 would be transfromed by applying y = 0.5* (1 + ERF((Math.Log(x)- {transformParams.Mean[1]}) / ({transformParams.StandardDeviation[1]} * sqrt(2)))" ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
slot 1 [](start = 43, length = 6)
What are slot 0
and slot 1
? In this particular sample this one and below are both slot 1
? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Let's get transformation parameters. Since we work with only one column we need to pass 0 as parameter for GetNormalizerModelParameters. | ||
// If we have multiple column transformations we need to pass index of InputOutputColumnPair. | ||
var transformParams = normalizeTransform.GetNormalizerModelParameters(0) as AffineNormalizerModelParameters<ImmutableArray<float>>; | ||
Console.WriteLine($"Values for slot 1 would be transfromed by applying y = (x - ({(transformParams.Offset.Length == 0 ? 0 : transformParams.Offset[1])})) * {transformParams.Scale[1]}"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
slot 1 [](start = 43, length = 6)
see my comments in other sample. #Resolved
|
||
namespace Samples.Dynamic | ||
{ | ||
public class NormalizeBinning |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NormalizeBinning [](start = 17, length = 16)
one line comment about what this does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// NormalizeBinning normalizes the data by constructing equidensity bins and produce output based on | ||
// to which bin original value belong but make sure zero values would remain zero after normalization. | ||
// Helps preserve sparsity. | ||
var normalizeFixZero = mlContext.Transforms.NormalizeBinning("Features", maximumBinCount: 4, fixZero: true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixZero: true [](start = 105, length = 13)
love the variation on the parameter! #WontFix
// Expected output: | ||
// 1.0000, 0.6667, 1.0000, 0.0000 | ||
// 0.6667, 1.0000, 0.6667, 0.0000 | ||
// 0.3333, 0.3333, 0.3333, 0.0000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
0.3333 [](start = 24, length = 6)
I'd emphasize this value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// 0.3333, 0.0000, 0.3333, 0.0000 | ||
// 0.0000, -0.3333, 0.0000, 1.0000 | ||
|
||
// Let's get transformation parameters. Since we work with only one column we need to pass 0 as parameter for GetNormalizerModelParameters. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to pass 0, the index of this column in the dataview, as parameter for..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// 0.0000, -0.3333, 0.0000, 1.0000 | ||
|
||
// Let's get transformation parameters. Since we work with only one column we need to pass 0 as parameter for GetNormalizerModelParameters. | ||
// If we have multiple column transformations we need to pass index of InputOutputColumnPair. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
multiple column transformations [](start = 26, length = 31)
multiple columns #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
var mlContext = new MLContext(); | ||
var samples = new List<DataPoint>() | ||
{ | ||
new DataPoint(){ Features = new float[4] { 1, 1, 3, 0} }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe no need to have two columns with the same exact values.
/// [!code-csharp[NormalizeMinMax](~/../docs/samples/docs/samples/Microsoft.ML.Samples/Dynamic/Transforms/NormalizeMinMax.cs)] | ||
/// ]]> | ||
/// </format> | ||
/// </example> | ||
public static NormalizingEstimator NormalizeMinMax(this TransformsCatalog catalog, InputOutputColumnPair[] columns, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NormalizeMinMax [](start = 43, length = 15)
you're not invoking this API. I would not put the example here. #Resolved
@@ -137,7 +165,7 @@ public static class NormalizationCatalog | |||
/// <example> | |||
/// <format type="text/markdown"> | |||
/// <] | |||
/// [!code-csharp[NormalizeLogMeanVariance](~/../docs/samples/docs/samples/Microsoft.ML.Samples/Dynamic/Transforms/NormalizeLogMeanVariance.cs)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[!code-csharp[NormalizeLogMeanVariance] [](start = 12, length = 39)
you're not invoking this API. I would not put the example here. #Resolved
@@ -175,6 +210,13 @@ public static class NormalizationCatalog | |||
/// <param name="maximumExampleCount">Maximum number of examples used to train the normalizer.</param> | |||
/// <param name="fixZero">Whether to map zero to zero, preserving sparsity.</param> | |||
/// <param name="maximumBinCount">Maximum number of bins (power of 2 recommended).</param> | |||
/// <example> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you're not invoking this API. I would not put the example here. #Resolved
/// [!code-csharp[NormalizeBinning](~/../docs/samples/docs/samples/Microsoft.ML.Samples/Dynamic/Transforms/NormalizeSupervisedBinning.cs)] | ||
/// ]]> | ||
/// </format> | ||
/// </example> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[](start = 21, length = 1)
you're not invoking this API. I would not put the example here. #Resolved
using System.Linq; | ||
using Microsoft.ML; | ||
using Microsoft.ML.Data; | ||
using static Microsoft.ML.Transforms.NormalizingTransformer; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need this? #ByDesign
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, otherwise I need to write Microsoft.ML.Transforms.NormalizingTransformer.BinNormalizerModelParameters which takes too much space
In reply to: 274057369 [](ancestors = 274057369)
// If we have multiple column transformations we need to pass index of InputOutputColumnPair. | ||
var transformParams = normalizeTransform.GetNormalizerModelParameters(0) as BinNormalizerModelParameters<ImmutableArray<float>>; | ||
Console.WriteLine($"Values for slot 0 would be transfromed by applying y = (Index(x) / {transformParams.Density[0]}) - {(transformParams.Offset.Length == 0 ? 0 : transformParams.Offset[0])}"); | ||
Console.WriteLine("Where Index(x) is the index of the bin to which x belongs"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please break super-long lines into two lines #Resolved
using System.Linq; | ||
using Microsoft.ML; | ||
using Microsoft.ML.Data; | ||
using static Microsoft.ML.Transforms.NormalizingTransformer; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sing static Microsoft.ML.Transforms.NormalizingTransformer; [](start = 1, length = 59)
ditto #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
var normalize = mlContext.Transforms.NormalizeBinning("Features", maximumBinCount: 4, fixZero: false); | ||
|
||
// NormalizeBinning normalizes the data by constructing equidensity bins and produce output based on | ||
// to which bin original value belong but make sure zero values would remain zero after normalization. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please move the details to similar to your other PR
towards #1209