Skip to content

Conversion catalog samples #3167

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Apr 4, 2019
Merged

Conversation

sfilipi
Copy link
Member

@sfilipi sfilipi commented Apr 2, 2019

Towards #1209
Adding and adjusting samples for the Conversions catalog.

@sfilipi sfilipi requested review from singlis and zeahmed April 2, 2019 06:26
@sfilipi sfilipi self-assigned this Apr 2, 2019
@sfilipi sfilipi added the documentation Related to documentation of ML.NET label Apr 2, 2019
@codecov
Copy link

codecov bot commented Apr 2, 2019

Codecov Report

Merging #3167 into master will decrease coverage by <.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #3167      +/-   ##
==========================================
- Coverage   72.54%   72.53%   -0.01%     
==========================================
  Files         807      807              
  Lines      144774   144774              
  Branches    16208    16208              
==========================================
- Hits       105022   105011      -11     
- Misses      35338    35348      +10     
- Partials     4414     4415       +1
Flag Coverage Δ
#Debug 72.53% <ø> (-0.01%) ⬇️
#production 68.12% <ø> (-0.01%) ⬇️
#test 88.82% <ø> (-0.01%) ⬇️
Impacted Files Coverage Δ
...ML.Data/Transforms/ConversionsExtensionsCatalog.cs 44.87% <ø> (ø) ⬆️
...c/Microsoft.ML.FastTree/Utils/ThreadTaskManager.cs 79.48% <0%> (-20.52%) ⬇️
...soft.ML.TestFramework/DataPipe/TestDataPipeBase.cs 73.7% <0%> (-0.34%) ⬇️


public uint Label { get; set; }
}

Copy link
Member

@wschin wschin Apr 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 empty lines? #Resolved

public static class ConvertTypeMultiColumn
{
// The initial data type
private class InputData
Copy link
Member

@wschin wschin Apr 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other places we have C# structures defined after the actual example function. #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i like that! focus should be on the example.


In reply to: 271401627 [](ancestors = 271401627)

{
public float Converted1 { get; set; }
public float Converted2 { get; set; }

Copy link
Member

@wschin wschin Apr 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to remove this empty line? #Resolved

new InputOutputColumnPair("Converted3", "Feature3"),
new InputOutputColumnPair("Converted4", "Feature4"),

}, DataKind.Single);
Copy link
Member

@wschin wschin Apr 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{ is not aligned with }. #Resolved

new InputOutputColumnPair("Converted2", "Feature2"),
new InputOutputColumnPair("Converted3", "Feature3"),
new InputOutputColumnPair("Converted4", "Feature4"),

Copy link
Member

@wschin wschin Apr 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove this empty line. #Resolved


}, DataKind.Single);

// Let's train our pipeline, and then apply it to the same data.
Copy link
Member

@wschin wschin Apr 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add more detailed description about what this pipeline is doing. For example, input column A will be converted to output column B. #Pending

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not adding too many details, because it is hard to maintain as stale names in the comments don't get flagged by anything.


In reply to: 271402918 [](ancestors = 271402918)

{
public static class MapValue
{
class DataPoint
Copy link
Member

@wschin wschin Apr 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
class DataPoint
private class DataPoint
``` #Resolved

public int Score { get; set; }
}

class TransformedData : DataPoint
Copy link
Member

@wschin wschin Apr 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
class TransformedData : DataPoint
private class TransformedDataPoint : DataPoint
``` #Resolved



/// This example demonstrates the use of the ValueMappingEstimator by mapping strings to other string values, or floats to strings.
/// This is useful to map types to a grouping.
Copy link
Member

@wschin wschin Apr 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does This is useful to map types to a grouping. mean? #Resolved

new DataPoint() { Timeframe = "12-25yrs" , Score = 3 },
new DataPoint() { Timeframe = "0-5yrs" , Score = 4 },
new DataPoint() { Timeframe = "12-25yrs" , Score = 5 },
new DataPoint() { Timeframe = "25+yrs" , Score = 5 },
Copy link
Member

@wschin wschin Apr 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
new DataPoint() { Timeframe = "25+yrs" , Score = 5 },
new DataPoint() { Timeframe = "25+yrs" , Score = 5 },
``` #Resolved


/// This example demonstrates the use of the ValueMappingEstimator by mapping strings to other string values, or floats to strings.
/// This is useful to map types to a grouping.
/// It is possible to have multiple values map to the same category.
Copy link
Member

@wschin wschin Apr 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sentence is not very clear to me. #Resolved

namespace Microsoft.ML.Samples.Dynamic
{
// This example illustrates how to convert multiple columns of different types to one type, in this case System.Single.
// This is often a useful data transformation before concatenting the features together and passing them to a particular estimator.
Copy link
Contributor

@zeahmed zeahmed Apr 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

concatenting [](start = 57, length = 12)

typo. #Closed


public static void Example()
{
var mlContext = new MLContext(seed: 1);
Copy link
Contributor

@zeahmed zeahmed Apr 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

var mlContext = new MLContext(seed: 1); [](start = 12, length = 39)

I see missing comments overall in this sample which we usually have in other samples e.g. comments above the MLContext etc. #Closed

// TransformedData obtained post-transformation.
//
// Timeframe TimeframeCategory Label Score ScoreCategory
// 0 - 4yrs Short 1 1 Low
Copy link
Contributor

@zeahmed zeahmed Apr 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 - 4 [](start = 15, length = 5)

There should not be spaces between hyphen here. #Closed


// Create the lookup map data IEnumerable.
var lookupData = new[] {
new LookupMap { Value = 3.14f, Category = "Low" },
Copy link
Contributor

@zeahmed zeahmed Apr 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Value = 3.14f, Category = "Low" [](start = 32, length = 31)

is it intentional that order of Value and Category differ from the rest? #Closed

// Convert to IDataView
var lookupIdvMap = mlContext.Data.LoadFromEnumerable(lookupData);

// Constructs the ValueMappingEstimator making the ML.net pipeline
Copy link
Contributor

@zeahmed zeahmed Apr 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ML.net [](start = 63, length = 6)

I think we advertise it as ML.NET not ML.net, right? #Closed

foreach (var featureRow in features)
Console.WriteLine($"{featureRow.Price}\t\t{featureRow.PriceCategory}");

// TransformedData obtained post-transformation.
Copy link
Contributor

@zeahmed zeahmed Apr 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// TransformedData obtained post-transformation. [](start = 12, length = 48)

Everywhere else this is "// Expected output:" #Closed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine too, it matches the name of the type.


In reply to: 271420199 [](ancestors = 271420199)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok.


In reply to: 271466196 [](ancestors = 271466196,271420199)

}

// Timeframe Feature
// 0 - 4yrs 0, 5, 300
Copy link
Contributor

@zeahmed zeahmed Apr 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 - 4yrs [](start = 15, length = 8)

No spaces between hyphen here. #Closed


class TransformedData : DataPoint
{
public int[] Feature { get; set; }
Copy link
Contributor

@zeahmed zeahmed Apr 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feature [](start = 25, length = 7)

Should be 'Features" as you mentioned this as Features Column in the comments below, right? #Closed

@@ -100,7 +106,7 @@ internal static TypeConvertingEstimator ConvertType(this TransformsCatalog.Conve
/// <example>
/// <format type="text/markdown">
/// <![CDATA[
/// [!code-csharp[KeyToValueMappingEstimator](~/../docs/samples/docs/samples/Microsoft.ML.Samples/Dynamic/ValueMappingStringToKeyType.cs)]
/// [!code-csharp[KeyToValueMappingEstimator](~/../docs/samples/docs/samples/Microsoft.ML.Samples/Dynamic/Transforms/Conversion/ValueMappingStringToKeyType.cs)]
Copy link
Contributor

@zeahmed zeahmed Apr 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

KeyToValueMappingEstimator [](start = 27, length = 26)

It should be the name of the extension method e.g. MapKeyToValue. Can you please also check this inconsistency in the rest of the file? #Closed

}

// The resulting data type after the transformation
private sealed class TransformedData : InputData
Copy link

@shmoradims shmoradims Apr 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sealed [](start = 16, length = 6)

we don't need to worry about sealed. it might give the wrong impression that the data classes should be sealed.

I added this note to the checklist #Resolved

public string Feature2;
public DateTime Feature3;
public double Feature4;
}
Copy link

@shmoradims shmoradims Apr 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please move the data classes below example as discussed in the group (i also added this note to the checklist) #Resolved

new DataPoint() { Price = 1.19f },
new DataPoint() { Price = 2.17f },
new DataPoint() { Price = 33.784f },

Copy link

@shmoradims shmoradims Apr 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra line #Resolved

public string Category { get; set; }

}

Copy link

@shmoradims shmoradims Apr 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra line #Resolved


namespace Microsoft.ML.Samples.Dynamic
{
public static class MapValueIdvLookup
Copy link

@shmoradims shmoradims Apr 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MapValueIdvLookup [](start = 24, length = 17)

Let's change the filename to say Idv instead of IDV #Resolved

// 0 23 6.368921E+17 0.1206
// 1 8904 6.368924E+17 8.09

}
Copy link

@shmoradims shmoradims Apr 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i see a lot of extra empty lines. please have a second pass and remove them. #Resolved

@sfilipi sfilipi force-pushed the conversionSamples branch from 599a29a to edbb1f0 Compare April 2, 2019 20:29
Copy link
Contributor

@zeahmed zeahmed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@@ -51,5 +39,13 @@ public static void Example()
// A: False Aconv:0
// A: False Aconv:0
}
private class InputData

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pr [](start = 7, length = 3)

need extra line here

{
public bool Survived;
}
private sealed class TransformedData : InputData

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pr [](start = 8, length = 2)

ditto

Copy link

@shmoradims shmoradims left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@sfilipi sfilipi merged commit e285889 into dotnet:master Apr 4, 2019
@sfilipi sfilipi deleted the conversionSamples branch April 4, 2019 16:49
sfilipi added a commit to sfilipi/machinelearning-1 that referenced this pull request Apr 9, 2019
* adding a sample for convert MultiColumns. Moving files around.

* Adjust the samples about ValueMapping

* Addressing PR comments
@ghost ghost locked as resolved and limited conversation to collaborators Mar 23, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
documentation Related to documentation of ML.NET
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants