-
Notifications
You must be signed in to change notification settings - Fork 1.9k
ValueMappingEstimator example #2222
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This provides an example that demonstrates different ways to use the ValueMappingEstimator. This is part of the original change to add the ValueMappingEstimator to the code base and references dotnet#754.
/// <item>12+yrs -> Cat3</item> | ||
/// </list> | ||
/// </summary> | ||
public static void StringToStringMappingExample(MLContext ml, IDataView trainData) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might invert these samples so that each sample is a runnable unit. Have the data loader be the helper function, rather than the example be the helper function. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved these to be a runnable unit. The challenge is the associated internal classes that we use to load the data back in. So here is what I have done: I put a single example in ValueMapping.cs -- this is what the documentation will refer to. Then I added a second file called ValueMappingExtra that contains the remaining functions. Each function is now standalone that you can copy and paste, but you will need to also copy the associated class.
In reply to: 250459000 [](ancestors = 250459000)
"12+yrs" | ||
}; | ||
|
||
var educationValues = new List<string>() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Explain what these are, just as you explained what the keys were. #Resolved
"Cat3" | ||
}; | ||
|
||
var pipeline = new ValueMappingEstimator<string, string>(ml, educationKeys, educationValues, ("Education", "EducationCategory")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Explain what this does. #Resolved
"Cat2" | ||
}; | ||
|
||
var pipeline = new ValueMappingEstimator<float, string>(ml, inducedKeys, inducedValues, ("Induced", "InducedCategory")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Explain what this does. #Resolved
new int[] { 42, 32 } | ||
}; | ||
|
||
var pipeline = new ValueMappingEstimator<string, int>(ml, educationKeys, educationValues, ("Education", "EducationCategory")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Explain what this does. #Resolved
/// <item>12+yrs -> 42, 32</item> | ||
/// </list> | ||
/// </summary> | ||
public static void StringToArrayMappingExample(MLContext ml, IDataView trainData) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might make sense to put each example in its own file so it's easier to copy and paste. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
var transformedData = pipeline.Fit(trainData).Transform(trainData); | ||
|
||
// Getting the data of the newly created column as an IEnumerable of SampleInfertDataWithFeatures. | ||
var featuresColumn = ml.CreateEnumerable<SampleInfertDataWithFeatures>(transformedData, reuseRowObject: false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
featuresColumn [](start = 16, length = 14)
I find this name a bit confusing (since the data contains multiple columns). Perhaps featureRows? then it would be "foreach (featureRow in featureRows)". #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
var pipeline = new ValueMappingEstimator<string, string>(ml, educationKeys, educationValues, ("Education", "EducationCategory")); | ||
|
||
// The transformed data. | ||
var transformedData = pipeline.Fit(trainData).Transform(trainData); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
var [](start = 12, length = 3)
Write the type explicitly here to make it clearer.
Same comment for line 93. #Resolved
/// level of education as keys to a respective string label which is the value. | ||
/// The mapping looks like the following: | ||
/// <list> | ||
/// <item>0-5yrs -> Cat1</item> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
item [](start = 17, length = 4)
will need to add a tag after item, otherwise they don't render. so:
0-5yrs -> Cat1
#Resolved
- Added further comments based upon feedback
@sfilipi @yaeldekel @rogancarr Thank you for the feedback, I have posted an update, please take a look when you have a chance. #Resolved |
Codecov Report
@@ Coverage Diff @@
## master #2222 +/- ##
==========================================
+ Coverage 71.15% 71.24% +0.08%
==========================================
Files 779 783 +4
Lines 140311 140732 +421
Branches 16047 16086 +39
==========================================
+ Hits 99838 100259 +421
+ Misses 36021 36019 -2
- Partials 4452 4454 +2
|
/// level of education as keys to a respective string label which is the value. | ||
/// The mapping looks like the following: | ||
/// <list> | ||
/// <item>0-5yrs -> Cat1</item> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
0-5yrs -> Cat1 [](start = 22, length = 14)
For XML documentation, those need to be wrapped in a tag.
It doesn't matter much in your case, because this will display inside a code block. #Resolved
/// <item>0-5yrs -> Cat1</item> | ||
/// <item>6-11yrs -> Cat2</item> | ||
/// <item>12+yrs -> Cat3</item> | ||
/// </list> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should typically go in a section.
you can use simple, non-xml comments here. #Resolved
var ml = new MLContext(); | ||
|
||
// Get a small dataset as an IEnumerable. | ||
IEnumerable<SamplesUtils.DatasetUtils.SampleInfertData> data = SamplesUtils.DatasetUtils.GetInfertData(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IEnumerable<SamplesUtils.DatasetUtils.SampleInfertData> [](start = 12, length = 55)
var #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so the feedback was to not use var in this case because its more explicit about what we are doing. #Resolved
|
||
namespace Microsoft.ML.Samples.Dynamic | ||
{ | ||
public class ValueMappingExample |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ValueMappingExample [](start = 17, length = 19)
you need to reference your samples from the extensions, otherwise they won't show up anywhere.
For how to link it, see: https://github.com/dotnet/machinelearning/blob/master/src/Microsoft.ML.StandardLearners/StandardLearnersCatalog.cs#L115
The XML of the extensions should have this on it:
///
///
///
///
#Resolved
}; | ||
|
||
// Constructs the ValueMappingEstimator making the ML.net pipeline | ||
var pipeline = new ValueMappingEstimator<string, string>(ml, educationKeys, educationValues, ("EducationCategory", "Education")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new ValueMappingEstimator [](start = 27, length = 25)
i think we want to create those through the mlContext. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes totally agree, I have updated these to now be called from the mlContext. However I was not able to do that for the KeyType example due to the extension methods not taking in the boolean for treatValuesAsKeyTypes. I have created an issue to track this #2346
In reply to: 252779161 [](ancestors = 252779161)
// 35.0 1.0 6-11yrs 1.0 3.0 32.0 5.0 ... | ||
|
||
// Creating a list of keys based on the Education values from the dataset | ||
// These lists are created by hand for the demonstration, but the ValueMappingEstimator does take an IEnumerable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the users might wonder why we are doing this, although it is somewhat explained.
Maybe try adding: if the list of keys and the list of values is known, they can be passed to the API. #Resolved
|
||
// Features column obtained post-transformation. | ||
// | ||
// Age Education EducationLabel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Age Education EducationLabel [](start = 14, length = 32)
I'd expect a column with name "EducationCategory" #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @singlis!
Big comments:
- It's not clear from these samples why anybody would actually use these transforms in practice. Can you please give examples and explanations of why these would be used?
- Can you add links to these samples from the catalog entries? That will ready them for v1.
- Have you seen the
KeyToValue_Term.cs
Dynamic Sample? This seems obsolete with your examples, so maybe remove it.
{ | ||
// Create a new ML context, for ML.NET operations. It can be used for exception tracking and logging, | ||
// as well as the source of randomness. | ||
var ml = new MLContext(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mlContext
#Resolved
public string EducationCategory = default; | ||
} | ||
|
||
///<summary> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sample is missing the "So what?" aspect. Why would I use this transform, and where is it helpful?
Can you give an example of a problem you can't solve without using this? #Resolved
// 42 0-5yrs Cat1 | ||
// 39 12+yrs Cat3 | ||
// 34 0-5yrs Cat1 | ||
// 35 6-11yrs Cat2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related to my comment above, what is the use of changing it from a string to a different string? Can you show what this can then be used for? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a remarks section and added additional verbage to each example.
In reply to: 252778545 [](ancestors = 252778545)
|
||
namespace Microsoft.ML.Samples.Dynamic | ||
{ | ||
public class ValueMappingFloatToStringExample |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as above — what problem does this solve? #Resolved
|
||
namespace Microsoft.ML.Samples.Dynamic | ||
{ | ||
public class ValueMappingStringToArrayExample |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as above — what problem does this solve? #Resolved
{ | ||
// Create a new ML context, for ML.NET operations. It can be used for exception tracking and logging, | ||
// as well as the source of randomness. | ||
var ml = new MLContext(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mlContext #Resolved
}; | ||
|
||
// Constructs the ValueMappingEstimator making the ML.net pipeline | ||
var pipeline = new ValueMappingEstimator<string, int>(ml, educationKeys, educationValues, ("EducationCategory", "Education")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you call this from mlContext
? #Resolved
// The KeyToValueMappingEstimator is added to provide a reverse lookup of the KeyType, converting the KeyType value back | ||
// to the original value. | ||
var pipeline = new ValueMappingEstimator<string, string>(ml, educationKeys, educationValues, true, ("EducationKeyType", "Education")) | ||
.Append(new KeyToValueMappingEstimator(ml, ("EducationCategory", "EducationKeyType"))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use mlContext
. #Resolved
}; | ||
|
||
// Constructs the ValueMappingEstimator making the ML.net pipeline | ||
var pipeline = new ValueMappingEstimator<float, string>(ml, inducedKeys, inducedValues, ("InducedCategory", "Induced")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you call this from mlContext
? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be updated -- there are two cases that I can not call from mlContext. I have created issues for that.
In reply to: 252782089 [](ancestors = 252782089)
}; | ||
|
||
// Constructs the ValueMappingEstimator making the ML.net pipeline | ||
var pipeline = new ValueMappingEstimator<string, string>(ml, educationKeys, educationValues, ("EducationCategory", "Education")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you call this from mlContext
? #Resolved
}; | ||
|
||
// Constructs the ValueMappingEstimator making the ML.net pipeline | ||
var pipeline = new ValueMappingEstimator<string, int>(ml, educationKeys, educationValues, ("EducationCategory", "Education")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
r<string, int> [](start = 51, length = 14)
can we merge this sample with the other one? maybe have a pipeline with the two estimators?
can we merge all this samples into one, with a pipeline with more than one estimator? #ByDesign
- Added remarks section to ValueMapping - Other updates based upon feedback
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved with comments.
IDataView trainData = mlContext.Data.ReadFromEnumerable(data); | ||
|
||
// If the list of keys and values are known, they can be passed to the API. The ValueMappingEstimator can also get the mapping through an IDataView | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Empty line #Resolved
} | ||
|
||
/// This example demonstrates the use of the ValueMappingEstimator by mapping string-to-array values which allows for mapping string data | ||
/// to numeric arrays that can then be used as a feature set for a trainer. In this example, we are mapping the education data to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a feature set for a trainer
Does this work? The vectors are of different sizes. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No probably not to pass into a trainer. I went ahead and made them the same size.
In reply to: 252881242 [](ancestors = 252881242)
/// instead of the actual value provides a unique integer representation of the value. When the treatValueAsKeyTypes is true, | ||
/// the ValueMappingEstimator will generate a KeyType for each unique value. | ||
/// | ||
/// In this example, the education data is mapped to a grouping of 'Undergraudate' and 'Postgraduate'. Because KeyTypes are used, the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Undergraduate #Resolved
/// The ValueMappingEstimator is a 1-1 mapping from a key to value. This particular class load the mappings from an <see cref="IDataView"/>. | ||
/// This gives user the flexibility to load the mapping from file instead of using IEnumerable in <see cref="ValueMappingEstimator{TKey, TValue}"/> | ||
/// </summary> | ||
/// <include file='doc.xml' path='doc/members/member[@name="ValueMappingEstimator"]/*' /> | ||
public class ValueMappingEstimator : TrivialEstimator<ValueMappingTransformer> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ValueMappingEstimator [](start = 17, length = 21)
i'd put it in the transformer too. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put the comments for the estimator on the transformer? Or create a new entry for the transformer?
In reply to: 252889880 [](ancestors = 252889880)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
specific value. The ValueMappingEstimator supports keys and values of different <see cref="System.Type"/> to support different data types. | ||
Examples for using a ValueMappingEstimator are: | ||
<list> | ||
<li> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
li [](start = 11, length = 2)
item #Resolved
} | ||
|
||
/// This example demonstrates the use of the ValueMappingEstimator by mapping string-to-string values. This is useful | ||
/// to map strings to a grouping. In this example, the Education data maps to the groups Undergraduate and Postgraduate: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
grouping [](start = 32, length = 8)
shall we use the term 'key' #ByDesign
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are you wanting it to say "map keys to a grouping"?
In reply to: 252890528 [](ancestors = 252890528)
// Generate the ValueMappingEstimator that will output KeyTypes even though our values are strings. | ||
// The KeyToValueMappingEstimator is added to provide a reverse lookup of the KeyType, converting the KeyType value back | ||
// to the original value. | ||
var pipeline = new ValueMappingEstimator<string, string>(mlContext, educationKeys, educationValues, true, ("EducationKeyType", "Education")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new ValueMappingEstimator< [](start = 27, length = 26)
use mlContext #ByDesign
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to add an api to the extensions catalog to allow for the treatValuesAsKeyTypes. I dont want to add that in this PR as I am trying to scope this to documentation changes. I have a github issue tracking it and update this sample once its fixed
In reply to: 252890986 [](ancestors = 252890986)
}; | ||
|
||
// Constructs the ValueMappingEstimator making the ML.net pipeline | ||
var pipeline = new ValueMappingEstimator<string, int>(mlContext, educationKeys, educationValues, ("EducationFeature", "Education")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new ValueMappingEstimator [](start = 27, length = 25)
mlContext #ByDesign
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same deal here -- need to add an api to support vector types. I have a github issue to fix.
In reply to: 252891208 [](ancestors = 252891208)
IDataView transformedData = pipeline.Fit(trainData).Transform(trainData); | ||
|
||
// Getting the resulting data as an IEnumerable of SampleInfertDataWithIntArray. This will contain the newly created column EducationCategory | ||
IEnumerable<SampleInfertDataWithIntArray> featuresColumn = mlContext.CreateEnumerable<SampleInfertDataWithIntArray>(transformedData, reuseRowObject: false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IEnumerable<SampleInfertDataWithIntArray [](start = 12, length = 40)
nit;: var #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yael's feedback was to keep this explicit for demo purposes. What would you prefer?
In reply to: 252891247 [](ancestors = 252891247)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
namespace Microsoft.ML.Samples.Dynamic | ||
{ | ||
public class ValueMappingStringToArrayExample |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ValueMappingStringToArrayExample [](start = 17, length = 32)
this is not referenced anywhere. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RIght - this is just another example that is available to show how this can be used. There is another example that is not used either. Does they all need to be referenced?
In reply to: 252891445 [](ancestors = 252891445)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should remove them. The purpose of this project is to populate the API reference website.
the machinelearning-samples repo is the one where we can put more samples, if needed.
If we keep them here, unreferenced, it will be hard to see what is used for what, overtime.
In reply to: 252938208 [](ancestors = 252938208,252891445)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to add multiple links? I initially had these in one file but moved out into separate files to not have to update line number reference.
In reply to: 252938844 [](ancestors = 252938844,252938208,252891445)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added multiple links so they are all referenced now.
In reply to: 252939988 [](ancestors = 252939988,252938844,252938208,252891445)
|
||
namespace Microsoft.ML.Samples.Dynamic | ||
{ | ||
public class ValueMappingFloatToStringExample |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ValueMappingFloatToStringExample [](start = 17, length = 32)
this is not referenced from anything. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so I added a see cref in the summary tag but not sure how that will render. In reply to: 459553687 [](ancestors = 459553687) Refers to: src/Microsoft.ML.Data/Transforms/ConversionsExtensionsCatalog.cs:139 in b3d1df2. [](commit_id = b3d1df2, deletion_comment = False) |
This provides an example that demonstrates different ways to use the
ValueMappingEstimator. This is part of the original change to add the
ValueMappingEstimator to the code base and references #754.