Key to binary samples for documentation #3211

Merged
sfilipi merged 4 commits into dotnet:master from keyToBinary on Apr 9, 2019

Conversation

sfilipi
Member

@sfilipi sfilipi commented Apr 5, 2019

Towards #1209
This adds the last batch of samples for the Conversions catalog.
The samples are KeyToVector and KeyToBinaryVector.

@sfilipi sfilipi added the documentation Related to documentation of ML.NET label Apr 5, 2019
/// This example demonstrates the use of MapKeyToVector by mapping keys to floats[] for multiple columns at once.
/// Because the ML.NET KeyType maps the missing value to zero, counting of uints starts at 1, so the values
/// converted to KeyTypes will appear skewed by one. See https://github.com/dotnet/machinelearning/issues/3072
public static void Example()
Member Author

@sfilipi sfilipi Apr 5, 2019

I have tried to add explanations because of issue #3072.
I think we should change the mapping of the missing value to 0... #WontFix

@codecov

codecov bot commented Apr 5, 2019

Codecov Report

Merging #3211 into master will increase coverage by <.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #3211      +/-   ##
==========================================
+ Coverage   72.62%   72.62%   +<.01%     
==========================================
  Files         807      807              
  Lines      145080   145080              
  Branches    16213    16213              
==========================================
+ Hits       105369   105370       +1     
  Misses      35294    35294              
+ Partials     4417     4416       -1
Flag            Coverage Δ
#Debug          72.62% <ø> (ø) ⬆️
#production     68.17% <ø> (ø) ⬆️
#test           88.92% <ø> (ø) ⬆️

Impacted Files                                             Coverage Δ
src/Microsoft.ML.Data/Data/SchemaDefinition.cs             69.87% <ø> (ø) ⬆️
...ML.Data/Transforms/ConversionsExtensionsCatalog.cs      64.07% <ø> (ø) ⬆️
src/Microsoft.ML.Transforms/ConversionsCatalog.cs          83.33% <ø> (ø) ⬆️
...soft.ML.Transforms/Text/WordEmbeddingsExtractor.cs      87.52% <0%> (-0.91%) ⬇️
...ML.Transforms/Text/StopWordsRemovingTransformer.cs      86.26% <0%> (+0.15%) ⬆️
...soft.ML.Data/DataLoadSave/Text/TextLoaderCursor.cs      84.9% <0%> (+0.2%) ⬆️
src/Microsoft.ML.Maml/MAML.cs                              26.21% <0%> (+1.45%) ⬆️

@sfilipi
Member Author

sfilipi commented Apr 5, 2019

Referencing #3072 since the previews on those samples illustrate what might be a source of confusion. #WontFix

// TransformedData obtained post-transformation.
//
// Timeframe    TimeframeVector         Category    CategoryVector
// 10           0,0,0,0,0,0,0,0,0,1     6           0,0,0,0,0
Member

@wschin wschin Apr 6, 2019

Is this correct? The first Timeframe was 9 in your declaration. If no transformer is applied to Timeframe, ML.NET should not touch its value. Is my understanding correct, or am I missing something? #Resolved

Member Author

Yeah, this is why I logged #3072 and had a discussion on Friday after scrum. It gets easier if you think of them as strings, not numbers.

The string "0" gets mapped to the first available ulong, 1. (The ulong 0 is reserved for the missing key.)
The string "1" gets mapped to the next available one: 2.
In our KeyType system, counting starts from 1; 0 is reserved.
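
The mapping described in this reply can be sketched in plain C#. This is only an illustration of the idea (keys assigned from 1 upward, with 0 kept for "missing"), not ML.NET's implementation; the class and method names `KeyMappingSketch`, `BuildKeyMap`, and `Lookup` are hypothetical, and assigning keys in order of first appearance is an assumption.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration, not ML.NET code.
public static class KeyMappingSketch
{
    // Assigns each distinct value a key in order of first appearance.
    // The first key handed out is 1, because 0 is reserved for "missing".
    public static Dictionary<string, uint> BuildKeyMap(IEnumerable<string> values)
    {
        var map = new Dictionary<string, uint>();
        foreach (var value in values)
        {
            if (!map.ContainsKey(value))
                map[value] = (uint)map.Count + 1;
        }
        return map;
    }

    // Returns the key for a value, or 0 (missing) if the value was never seen.
    public static uint Lookup(Dictionary<string, uint> map, string value)
    {
        return map.TryGetValue(value, out var key) ? key : 0u;
    }
}
```

With the input values "0", "1", "0", "2", the string "0" gets key 1, "1" gets key 2, and "2" gets key 3; looking up an unseen value such as "7" returns 0.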


In reply to: 272774353 [](ancestors = 272774353)

@rogancarr
Contributor

rogancarr commented Apr 8, 2019

    internal static KeyToBinaryVectorMappingEstimator MapKeyToBinaryVector(this TransformsCatalog.ConversionTransforms catalog,

Link to Sample? #Resolved


Refers to: src/Microsoft.ML.Transforms/ConversionsCatalog.cs:23 in 9db4dfe. [](commit_id = 9db4dfe, deletion_comment = False)


Console.WriteLine($" Timeframe TimeframeVector Category CategoryVector");
foreach (var featureRow in features)
    Console.WriteLine($"{featureRow.Timeframe}\t\t\t{string.Join(',', featureRow.TimeframeVector)}\t\t\t{featureRow.Category}\t\t{string.Join(',', featureRow.CategoryVector)}");
Contributor

@rogancarr rogancarr Apr 8, 2019

($" [](start = 33, length = 3)

For this long string, consider doing the old-style "{0} {1}", thingOne, thingTwo etc.

As it is, it's hard to read. #Pending
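
For reference, the old-style composite format suggested here might look roughly like the sketch below. The helper `RowFormatSketch.FormatRow` and its parameter names are hypothetical, and the single tab between columns is an arbitrary choice; the sample in this PR kept string interpolation.

```csharp
using System;

// Hypothetical helper showing composite formatting ("{0} {1}") for one output row.
public static class RowFormatSketch
{
    public static string FormatRow(uint timeframe, float[] timeframeVector,
        uint category, float[] categoryVector)
    {
        // Each placeholder is filled positionally, which keeps the template short
        // even when the arguments themselves are long expressions.
        return string.Format("{0}\t{1}\t{2}\t{3}",
            timeframe,
            string.Join(",", timeframeVector),
            category,
            string.Join(",", categoryVector));
    }
}
```

The template stays readable because the long `string.Join` calls move out of the string literal and into the argument list.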

Member Author

removed the tabs instead.


In reply to: 273242076 [](ancestors = 273242076)

Contributor

@rogancarr rogancarr left a comment

LGTM. Just a few small comments.


// Constructs the ML.net pipeline
var pipeline = mlContext.Transforms.Conversion.MapKeyToVector(
new[]{
Contributor

@artidoro artidoro Apr 8, 2019

new[]{ [](start = 16, length = 6)

nit: maybe keep this ("new[]{") on the previous line, or indent the next two. #Resolved

/// Marks member as <see cref="KeyDataViewType"/> and specifies <see cref="KeyDataViewType"/> cardinality.
/// Marks member as <see cref="KeyDataViewType"/>. The <paramref name="count"/> should be set to
/// one more than the maximum value for the keys (to account for missing values).
/// If the values are outside of the specified cardinality they will be mapped to the missing value representation: 0.
/// </summary>
Contributor

@artidoro artidoro Apr 8, 2019

I think that the old description was correct, especially when the underlying values are not numbers. When that's the case, I think it might make more sense to think about cardinality instead of maximum value of keys. #Resolved

Contributor

@artidoro artidoro Apr 8, 2019

Maybe what you have written could be added as a note for the special case of integer values?
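
To make the cardinality point concrete, here is a small sketch of how a key with a given count could expand into a one-hot vector. It is one reading of the doc comment and of the preview row discussed earlier in this PR (key 10 with count 10 gives a 1 in the last slot; key 6 with a 5-slot vector gives all zeros), not ML.NET's actual code; `KeyToVectorSketch.ToOneHot` is a hypothetical name.

```csharp
using System;

// Hypothetical illustration of key-to-vector expansion, not ML.NET code.
public static class KeyToVectorSketch
{
    // Keys are 1-based: key k (1 <= k <= count) sets slot k - 1.
    // Key 0 (missing) and keys above the cardinality leave the vector all zeros.
    public static float[] ToOneHot(uint key, int count)
    {
        var vector = new float[count];
        if (key >= 1 && key <= count)
            vector[key - 1] = 1f;
        return vector;
    }
}
```

Under these assumptions, `ToOneHot(10, 10)` sets only the last of ten slots, while `ToOneHot(6, 5)` and `ToOneHot(0, 5)` both return five zeros.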


In reply to: 273266746 [](ancestors = 273266746)

Contributor

@artidoro artidoro left a comment

:shipit:

@sfilipi
Member Author

sfilipi commented Apr 9, 2019

    internal static KeyToBinaryVectorMappingEstimator MapKeyToBinaryVector(this TransformsCatalog.ConversionTransforms catalog,

It is internal, so it does not have one yet.


In reply to: 481010536 [](ancestors = 481010536)


Refers to: src/Microsoft.ML.Transforms/ConversionsCatalog.cs:23 in 9db4dfe. [](commit_id = 9db4dfe, deletion_comment = False)

@sfilipi sfilipi merged commit ad99bc7 into dotnet:master Apr 9, 2019
@sfilipi sfilipi deleted the keyToBinary branch April 9, 2019 20:17
sfilipi added a commit to sfilipi/machinelearning-1 that referenced this pull request Apr 9, 2019
* adding samples for KeyToVector and KeyToBinaryVector

* typo

* Adding pointers to the KeyType.

* PR review comments
@ghost ghost locked as resolved and limited conversation to collaborators Mar 23, 2022
Labels
documentation Related to documentation of ML.NET
4 participants