Skip to content

Lockdown HAL Project #2497

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Feb 15, 2019
Merged

Conversation

Ivanidzo4ka
Copy link
Contributor

Fixes ##2267

/// <param name="weights">The weights column.</param>
/// <param name="labelColumn">The name of label column.</param>
/// <param name="featureColumn">The name of feature column.</param>
/// <param name="weightColumn">The name of weight column.</param>
Copy link
Contributor Author

@Ivanidzo4ka Ivanidzo4ka Feb 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

weight [](start = 51, length = 6)

optional! #Closed

/// <example>
/// <format type="text/markdown">
/// <![CDATA[
/// [!code-csharp[ScoreTensorFlowModel](~/../docs/samples/docs/samples/Microsoft.ML.Samples/Dynamic/Trainers/Regression/OrdinaryLeastSquaresWithOptions.cs)]
Copy link
Contributor Author

@Ivanidzo4ka Ivanidzo4ka Feb 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ScoreTensorFlowModel [](start = 26, length = 20)

oops #Closed

@codecov
Copy link

codecov bot commented Feb 12, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@b2587fa). Click here to learn what that means.
The diff coverage is 86.95%.

@@            Coverage Diff            @@
##             master    #2497   +/-   ##
=========================================
  Coverage          ?   71.45%           
=========================================
  Files             ?      801           
  Lines             ?   141840           
  Branches          ?    16139           
=========================================
  Hits              ?   101352           
  Misses            ?    36019           
  Partials          ?     4469
Flag Coverage Δ
#Debug 71.45% <86.95%> (?)
#production 67.74% <81.25%> (?)
#test 85.52% <100%> (?)


// Leave out 10% of data for testing
var (trainData, testData) = mlContext.BinaryClassification.TrainTestSplit(data, testFraction: 0.1);
// Get DefaultBinaryPipine for adult dataset and append SymSGD to it.
Copy link
Contributor

@zeahmed zeahmed Feb 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DefaultBinaryPipine [](start = 19, length = 19)

typo. #Closed

return reader.Read(dataFile);
}

public static IEstimator<ITransformer> DefaultBinaryPipeline(MLContext mlContext)
Copy link
Contributor

@zeahmed zeahmed Feb 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DefaultBinaryPipeline [](start = 47, length = 21)

DefaultBinaryPipelineForAdultDataSet???

Its confusing, unless I see this method definition here. The usage seems like we are creating binary pipeline in general. #Closed

"occupation", "relationship", "ethnicity", "native-country", "age", "education-num",
"capital-gain", "capital-loss", "hours-per-week"))
// Min-max normalized all the features
.Append(mlContext.Transforms.Normalize("Features"));
Copy link
Contributor

@zeahmed zeahmed Feb 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(mlContext.Transforms.Normalize("Features") [](start = 22, length = 43)

There is no need to normalize OneHotEncoded variables as they already range between 0-1. Here we should only normalize numeric features. #Closed


// Leave out 10% of data for testing
var (trainData, testData) = mlContext.BinaryClassification.TrainTestSplit(data, testFraction: 0.1);
// Get DefaultBinaryPipine for adult dataset and append SymSGD to it.
Copy link
Contributor

@zeahmed zeahmed Feb 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DefaultBinaryPipine [](start = 19, length = 19)

typo. #Closed

public static void Example()
{
// Downloading a regression dataset from github.com/dotnet/machinelearning
// this will create a housing.txt file in the filsystem this code will run
Copy link
Contributor

@zeahmed zeahmed Feb 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this code will run [](start = 68, length = 18)

This sentence does not convey anything? #Closed

Copy link
Contributor

@zeahmed zeahmed Feb 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel we should have some sort of template kind of thing for all the code and comments because most of the code and comments are shared across many different samples. What do you say? maybe tt template?
cc: @sfilipi, @rogancarr, @shmoradims


In reply to: 255788213 [](ancestors = 255788213)

{
public sealed class SymbolicStochasticGradientDescent
{
/// <summary>
Copy link
Member

@sfilipi sfilipi Feb 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/// <su [](start = 8, length = 7)

no need for xml style comments #Resolved

// In this examples we will use the adult income dataset. The goal is to predict
// if a person's income is above $50K or not, based on different pieces of information about that person.
// For more details about this dataset, please see https://archive.ics.uci.edu/ml/datasets/adult

Copy link
Member

@sfilipi sfilipi Feb 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'd move this comment above the method. #Resolved

namespace Microsoft.ML.Samples.Dynamic
{
public sealed class SymbolicStochasticGradientDescent
{
Copy link
Member

@sfilipi sfilipi Feb 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @shmoradims, @jwood803 so you don't work on the same. #Closed


printHelper(nameof(SamplesUtils.DatasetUtils.SampleVectorOfNumbersData.Features), whitening);

// Features column obtained post-transformation.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

column [](start = 24, length = 6)

i think we should come up with a convention to print vector columns differently from columns of the dataset.
I'll take a look at what .Net does.
cc @shmoradims, @rogancarr

maybe put in [ ]

//[ -0.394 -0.318 -0.243 -0.168 0.209 0.358 0.433 0.589 0.873 2.047 ]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the suggestion of adding [...]


In reply to: 255982945 [](ancestors = 255982945)

/// <summary>
/// This example require installation of addition nuget package <a href="https://www.nuget.org/packages/Microsoft.ML.HalLearners/">Microsoft.ML.HalLearners</a>
/// </summary>
public static void Example()
Copy link
Contributor

@artidoro artidoro Feb 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Xml #Resolved

public sealed class SymbolicStochasticGradientDescent
{
/// <summary>
/// This example require installation of addition nuget package <a href="https://www.nuget.org/packages/Microsoft.ML.HalLearners/">Microsoft.ML.HalLearners</a>
Copy link
Contributor

@artidoro artidoro Feb 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you replace "requires" instead of "require"? #Resolved

@artidoro
Copy link
Contributor

artidoro commented Feb 13, 2019

Should we make this a normal integer, with default value? Otherwise could you explain what the default is in the xml comment? #Resolved


Refers to: src/Microsoft.ML.HalLearners/SymSgdClassificationTrainer.cs:52 in 1527369. [](commit_id = 1527369, deletion_comment = False)

@artidoro
Copy link
Contributor

I see this applies to quite a few instances in this class, and probably also in many others.
Would you suggest to file this as an issue and work on it in a separate PR?


In reply to: 463122337 [](ancestors = 463122337)


Refers to: src/Microsoft.ML.HalLearners/SymSgdClassificationTrainer.cs:52 in 1527369. [](commit_id = 1527369, deletion_comment = False)

@Ivanidzo4ka
Copy link
Contributor Author

Think about this nullable values as something which can be tuned by algorithm. If it's specified we will use it, and if we not, we will run code to pick best possible value.


In reply to: 463122866 [](ancestors = 463122866,463122337)


Refers to: src/Microsoft.ML.HalLearners/SymSgdClassificationTrainer.cs:52 in 1527369. [](commit_id = 1527369, deletion_comment = False)

/// <summary>
/// This example require installation of addition nuget package <a href="https://www.nuget.org/packages/Microsoft.ML.HalLearners/">Microsoft.ML.HalLearners</a>
/// </summary>
public static void Example()
Copy link
Contributor

@artidoro artidoro Feb 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you remove the xml? #Resolved

/// <summary>
/// This example require installation of addition nuget package <a href="https://www.nuget.org/packages/Microsoft.ML.HalLearners/">Microsoft.ML.HalLearners</a>
/// </summary>
public static void Example()
Copy link
Contributor

@artidoro artidoro Feb 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here too, could you remove the xml? #Resolved

printHelper(nameof(SamplesUtils.DatasetUtils.SampleVectorOfNumbersData.Features), whitening);

// Features column obtained post-transformation.
// -0.979 0.867 1.449 1.236
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 [](start = 23, length = 1)

I think this is a vector column, if so, could you use senja's suggestion of printing the vector inside brackets: [....]

/// <param name="labelColumn">The labelColumn column.</param>
/// <param name="featureColumn">The features column.</param>
/// <param name="weights">The weights column.</param>
/// <param name="labelColumn">The name of label column.</param>
Copy link
Contributor

@artidoro artidoro Feb 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

label [](start = 50, length = 5)

Could you add 'the' after of in these xml comments?
I think it sounds better like this: "The name of the label column.", and I think that's how we have done it in other xml doc. #Resolved

@@ -60,9 +74,18 @@ public static class HalLearnersCatalog
/// <param name="catalog">The <see cref="BinaryClassificationCatalog"/>.</param>
/// <param name="labelColumn">The labelColumn column.</param>
Copy link
Contributor

@artidoro artidoro Feb 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

labelColumn [](start = 42, length = 11)

Could you make the change you did above here too? Specifying that this is the name of the label column, instead of just saying label column.
#Resolved

@@ -87,9 +89,6 @@ internal OlsLinearRegressionTrainer(IHostEnvironment env, Options options)
protected override RegressionPredictionTransformer<OlsLinearRegressionModelParameters> MakeTransformer(OlsLinearRegressionModelParameters model, DataViewSchema trainSchema)
=> new RegressionPredictionTransformer<OlsLinearRegressionModelParameters>(Host, model, trainSchema, FeatureColumn.Name);

public RegressionPredictionTransformer<OlsLinearRegressionModelParameters> Train(IDataView trainData, IPredictor initialPredictor = null)
=> TrainTransformer(trainData, initPredictor: initialPredictor);
Copy link
Contributor

@artidoro artidoro Feb 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we removing this? #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it's not used at all.


In reply to: 257348853 [](ancestors = 257348853)

score = float.MaxValue;
editor.Values[i] = score;
}
weights = editor.Commit();
Copy link
Contributor

@artidoro artidoro Feb 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that this is lost in the base method? Do we need it? #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #1850


In reply to: 257350745 [](ancestors = 257350745)

// Setting the seed to a fixed number in this examples to make outputs deterministic.
var mlContext = new MLContext(seed: 0);

// Download and featurize the dataset
Copy link
Contributor

@zeahmed zeahmed Feb 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Download and featurize the dataset [](start = 15, length = 34)

I saw @rogancarr comment regarding ending every comment with period. #Closed

@@ -0,0 +1,39 @@
namespace Microsoft.ML.Samples.Dynamic
{
public sealed class SymbolicStochasticGradientDescent
Copy link
Contributor

@zeahmed zeahmed Feb 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SymbolicStochasticGradientDescent [](start = 24, length = 33)

We also need to decide the fate of the classes in sample project. I see some of the classes are static some are sealed.
cc: @rogancarr, @sfilipi, @shmoradims #Resolved

/// Continue to train <paramref name="initialPredictor"/> on <paramref name="trainData"/> and produce calibrated binary linear model.
/// </summary>
/// <param name="trainData">The training data set.</param>
/// <param name="initialPredictor">Accepts <see cref="LinearBinaryModelParameters"/> to continue training of this model</param>
public BinaryPredictionTransformer<TPredictor> Train(IDataView trainData, TPredictor initialPredictor = null)
Copy link
Contributor

@artidoro artidoro Feb 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IDataView [](start = 61, length = 9)

You might need to merge with master. I should have added a doc for this. #Resolved

// Create a new ML context, for ML.NET operations. It can be used for exception tracking and logging,
// as well as the source of randomness.
var mlContext = new MLContext(seed: 3);
// Creating a data reader, based on the format of the data
Copy link
Contributor

@zeahmed zeahmed Feb 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

newline please. #Closed

public sealed class SymbolicStochasticGradientDescent
{

// This example requires installation of addition nuget package <a href="https://www.nuget.org/packages/Microsoft.ML.HalLearners/">Microsoft.ML.HalLearners</a>
Copy link
Contributor

@zeahmed zeahmed Feb 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addition [](start = 49, length = 8)

addition -> additional

correct it everywhere. #Closed

Console.WriteLine($"{colName} column obtained post-transformation.");
foreach (var row in column)
Console.WriteLine($"{string.Join(" ", row.DenseValues().Select(x => x.ToString("f3")))} ");
};
Copy link
Contributor

@zeahmed zeahmed Feb 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can it go to SamplesUtils? #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eventually. Not in this PR


In reply to: 257356248 [](ancestors = 257356248)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok.


In reply to: 257356644 [](ancestors = 257356644,257356248)

Copy link
Contributor

@artidoro artidoro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

{
public static class SymbolicStochasticGradientDescent
{
// This example requires installation of additional nuget package <a href="https://www.nuget.org/packages/Microsoft.ML.HalLearners/">Microsoft.ML.HalLearners</a>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[](start = 165, length = 4)

period here as well.

Copy link
Contributor

@zeahmed zeahmed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall it LGTM. Thanks for doing it.

@Ivanidzo4ka Ivanidzo4ka merged commit 78b161b into dotnet:master Feb 15, 2019
@ghost ghost locked as resolved and limited conversation to collaborators Mar 24, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants