-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Lockdown HAL Project #2497
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Lockdown HAL Project #2497
Conversation
/// <param name="weights">The weights column.</param> | ||
/// <param name="labelColumn">The name of label column.</param> | ||
/// <param name="featureColumn">The name of feature column.</param> | ||
/// <param name="weightColumn">The name of weight column.</param> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
weight [](start = 51, length = 6)
optional! #Closed
/// <example> | ||
/// <format type="text/markdown"> | ||
/// <] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ScoreTensorFlowModel [](start = 26, length = 20)
oops #Closed
Codecov Report
@@ Coverage Diff @@
## master #2497 +/- ##
=========================================
Coverage ? 71.45%
=========================================
Files ? 801
Lines ? 141840
Branches ? 16139
=========================================
Hits ? 101352
Misses ? 36019
Partials ? 4469
|
|
||
// Leave out 10% of data for testing | ||
var (trainData, testData) = mlContext.BinaryClassification.TrainTestSplit(data, testFraction: 0.1); | ||
// Get DefaultBinaryPipine for adult dataset and append SymSGD to it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DefaultBinaryPipine [](start = 19, length = 19)
typo. #Closed
return reader.Read(dataFile); | ||
} | ||
|
||
public static IEstimator<ITransformer> DefaultBinaryPipeline(MLContext mlContext) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DefaultBinaryPipeline [](start = 47, length = 21)
DefaultBinaryPipelineForAdultDataSet???
Its confusing, unless I see this method definition here. The usage seems like we are creating binary pipeline in general. #Closed
"occupation", "relationship", "ethnicity", "native-country", "age", "education-num", | ||
"capital-gain", "capital-loss", "hours-per-week")) | ||
// Min-max normalized all the features | ||
.Append(mlContext.Transforms.Normalize("Features")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(mlContext.Transforms.Normalize("Features") [](start = 22, length = 43)
There is no need to normalize OneHotEncoded variables as they already range between 0-1. Here we should only normalize numeric features. #Closed
|
||
// Leave out 10% of data for testing | ||
var (trainData, testData) = mlContext.BinaryClassification.TrainTestSplit(data, testFraction: 0.1); | ||
// Get DefaultBinaryPipine for adult dataset and append SymSGD to it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DefaultBinaryPipine [](start = 19, length = 19)
typo. #Closed
public static void Example() | ||
{ | ||
// Downloading a regression dataset from github.com/dotnet/machinelearning | ||
// this will create a housing.txt file in the filsystem this code will run |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this code will run [](start = 68, length = 18)
This sentence does not convey anything? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel we should have some sort of template kind of thing for all the code and comments because most of the code and comments are shared across many different samples. What do you say? maybe tt template?
cc: @sfilipi, @rogancarr, @shmoradims
In reply to: 255788213 [](ancestors = 255788213)
{ | ||
public sealed class SymbolicStochasticGradientDescent | ||
{ | ||
/// <summary> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// <su [](start = 8, length = 7)
no need for xml style comments #Resolved
// In this examples we will use the adult income dataset. The goal is to predict | ||
// if a person's income is above $50K or not, based on different pieces of information about that person. | ||
// For more details about this dataset, please see https://archive.ics.uci.edu/ml/datasets/adult | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'd move this comment above the method. #Resolved
namespace Microsoft.ML.Samples.Dynamic | ||
{ | ||
public sealed class SymbolicStochasticGradientDescent | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @shmoradims, @jwood803 so you don't work on the same. #Closed
|
||
printHelper(nameof(SamplesUtils.DatasetUtils.SampleVectorOfNumbersData.Features), whitening); | ||
|
||
// Features column obtained post-transformation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
column [](start = 24, length = 6)
i think we should come up with a convention to print vector columns differently from columns of the dataset.
I'll take a look at what .Net does.
cc @shmoradims, @rogancarr
maybe put in [ ]
//[ -0.394 -0.318 -0.243 -0.168 0.209 0.358 0.433 0.589 0.873 2.047 ]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// <summary> | ||
/// This example require installation of addition nuget package <a href="https://www.nuget.org/packages/Microsoft.ML.HalLearners/">Microsoft.ML.HalLearners</a> | ||
/// </summary> | ||
public static void Example() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Xml #Resolved
public sealed class SymbolicStochasticGradientDescent | ||
{ | ||
/// <summary> | ||
/// This example require installation of addition nuget package <a href="https://www.nuget.org/packages/Microsoft.ML.HalLearners/">Microsoft.ML.HalLearners</a> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you replace "requires" instead of "require"? #Resolved
I see this applies to quite a few instances in this class, and probably also in many others. In reply to: 463122337 [](ancestors = 463122337) Refers to: src/Microsoft.ML.HalLearners/SymSgdClassificationTrainer.cs:52 in 1527369. [](commit_id = 1527369, deletion_comment = False) |
Think about this nullable values as something which can be tuned by algorithm. If it's specified we will use it, and if we not, we will run code to pick best possible value. In reply to: 463122866 [](ancestors = 463122866,463122337) Refers to: src/Microsoft.ML.HalLearners/SymSgdClassificationTrainer.cs:52 in 1527369. [](commit_id = 1527369, deletion_comment = False) |
/// <summary> | ||
/// This example require installation of addition nuget package <a href="https://www.nuget.org/packages/Microsoft.ML.HalLearners/">Microsoft.ML.HalLearners</a> | ||
/// </summary> | ||
public static void Example() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you remove the xml? #Resolved
/// <summary> | ||
/// This example require installation of addition nuget package <a href="https://www.nuget.org/packages/Microsoft.ML.HalLearners/">Microsoft.ML.HalLearners</a> | ||
/// </summary> | ||
public static void Example() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here too, could you remove the xml? #Resolved
printHelper(nameof(SamplesUtils.DatasetUtils.SampleVectorOfNumbersData.Features), whitening); | ||
|
||
// Features column obtained post-transformation. | ||
// -0.979 0.867 1.449 1.236 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
0 [](start = 23, length = 1)
I think this is a vector column, if so, could you use senja's suggestion of printing the vector inside brackets: [....]
/// <param name="labelColumn">The labelColumn column.</param> | ||
/// <param name="featureColumn">The features column.</param> | ||
/// <param name="weights">The weights column.</param> | ||
/// <param name="labelColumn">The name of label column.</param> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
label [](start = 50, length = 5)
Could you add 'the' after of in these xml comments?
I think it sounds better like this: "The name of the label column.", and I think that's how we have done it in other xml doc. #Resolved
@@ -60,9 +74,18 @@ public static class HalLearnersCatalog | |||
/// <param name="catalog">The <see cref="BinaryClassificationCatalog"/>.</param> | |||
/// <param name="labelColumn">The labelColumn column.</param> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
labelColumn [](start = 42, length = 11)
Could you make the change you did above here too? Specifying that this is the name of the label column, instead of just saying label column.
#Resolved
@@ -87,9 +89,6 @@ internal OlsLinearRegressionTrainer(IHostEnvironment env, Options options) | |||
protected override RegressionPredictionTransformer<OlsLinearRegressionModelParameters> MakeTransformer(OlsLinearRegressionModelParameters model, DataViewSchema trainSchema) | |||
=> new RegressionPredictionTransformer<OlsLinearRegressionModelParameters>(Host, model, trainSchema, FeatureColumn.Name); | |||
|
|||
public RegressionPredictionTransformer<OlsLinearRegressionModelParameters> Train(IDataView trainData, IPredictor initialPredictor = null) | |||
=> TrainTransformer(trainData, initPredictor: initialPredictor); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we removing this? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
score = float.MaxValue; | ||
editor.Values[i] = score; | ||
} | ||
weights = editor.Commit(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that this is lost in the base method? Do we need it? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Setting the seed to a fixed number in this examples to make outputs deterministic. | ||
var mlContext = new MLContext(seed: 0); | ||
|
||
// Download and featurize the dataset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Download and featurize the dataset [](start = 15, length = 34)
I saw @rogancarr comment regarding ending every comment with period. #Closed
@@ -0,0 +1,39 @@ | |||
namespace Microsoft.ML.Samples.Dynamic | |||
{ | |||
public sealed class SymbolicStochasticGradientDescent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SymbolicStochasticGradientDescent [](start = 24, length = 33)
We also need to decide the fate of the classes in sample project. I see some of the classes are static some are sealed.
cc: @rogancarr, @sfilipi, @shmoradims #Resolved
/// Continue to train <paramref name="initialPredictor"/> on <paramref name="trainData"/> and produce calibrated binary linear model. | ||
/// </summary> | ||
/// <param name="trainData">The training data set.</param> | ||
/// <param name="initialPredictor">Accepts <see cref="LinearBinaryModelParameters"/> to continue training of this model</param> | ||
public BinaryPredictionTransformer<TPredictor> Train(IDataView trainData, TPredictor initialPredictor = null) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IDataView [](start = 61, length = 9)
You might need to merge with master. I should have added a doc for this. #Resolved
// Create a new ML context, for ML.NET operations. It can be used for exception tracking and logging, | ||
// as well as the source of randomness. | ||
var mlContext = new MLContext(seed: 3); | ||
// Creating a data reader, based on the format of the data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
newline please. #Closed
public sealed class SymbolicStochasticGradientDescent | ||
{ | ||
|
||
// This example requires installation of addition nuget package <a href="https://www.nuget.org/packages/Microsoft.ML.HalLearners/">Microsoft.ML.HalLearners</a> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addition [](start = 49, length = 8)
addition -> additional
correct it everywhere. #Closed
Console.WriteLine($"{colName} column obtained post-transformation."); | ||
foreach (var row in column) | ||
Console.WriteLine($"{string.Join(" ", row.DenseValues().Select(x => x.ToString("f3")))} "); | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can it go to SamplesUtils? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{ | ||
public static class SymbolicStochasticGradientDescent | ||
{ | ||
// This example requires installation of additional nuget package <a href="https://www.nuget.org/packages/Microsoft.ML.HalLearners/">Microsoft.ML.HalLearners</a> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[](start = 165, length = 4)
period here as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall it LGTM. Thanks for doing it.
Fixes ##2267